A StringTokenizer Function in PL/SQL

In this post, I will show two simple ways to create a StringTokenizer function in PL/SQL. One way is to use regular expressions using the functions REGEXP_SUBSTR and REGEXP_INSTR provided by Oracle. The Functions REGEXP_SUBSTR and REGEXP_INSTR are merely an extension of their function INSTR and SUBSTR in Oracle. The STRINGTOKENIZER_REGEX function, which you can see below, it behaves like the class StringTokenizer in Java, that’s by removing all the tokens that are NULL. The result will be a DBMS_SQL.varchar2_table with all valid tokens. Let’s see, now, the code:

CREATE OR REPLACE
FUNCTION STRINGTOKENIZER_REGEX
    (p_string IN LONG, p_separators IN VARCHAR2)
RETURN DBMS_SQL.varchar2_table
IS
    l_token_tbl DBMS_SQL.varchar2_table;
    pattern     VARCHAR2(250);
    param_clob  CLOB;
BEGIN
    param_clob := TO_CLOB(p_string);
    pattern := '[^(' || p_separators || ')]+' ;
        SELECT   REGEXP_SUBSTR (param_clob, pattern, 1, LEVEL) token
          BULK   COLLECT
          INTO   l_token_tbl
          FROM   DUAL
         WHERE   REGEXP_SUBSTR (param_clob, pattern, 1, LEVEL) IS NOT NULL
    CONNECT BY   REGEXP_INSTR (param_clob, pattern, 1, LEVEL) > 0;
    RETURN l_token_tbl;
EXCEPTION
    WHEN OTHERS THEN
        dbms_output.put_line('STRINGTOKENIZER_REGEX - Error '||TO_CHAR(SQLCODE)||': '||SQLERRM);
        dbms_output.put_line(DBMS_UTILITY.format_error_backtrace);
        RAISE;
END STRINGTOKENIZER_REGEX;
/

You can do the same thing, given above, without using regular expressions. In fact, you can use directly INSTR and SUBSTR functions of Oracle. In this case, the PL/SQL STRINGTOKENIZER function, see below, always returns a DBMS_SQL.varchar2_table but with also the tokens equals to NULL. See the code below:

CREATE OR REPLACE
FUNCTION STRINGTOKENIZER
    (p_string IN LONG, p_separators IN VARCHAR2)
RETURN DBMS_SQL.varchar2_table
IS
    l_token_tbl DBMS_SQL.varchar2_table;
    l_start NUMBER;
    l_pos NUMBER;
    l_index NUMBER;
BEGIN
    l_start := 1;
    l_pos :=1;
    l_index:=1;
    WHILE TRUE LOOP
        l_pos := INSTR (p_string ,p_separators ,l_start);
        IF l_start > LENGTH(p_string) THEN
        EXIT;
        END IF;
        IF l_pos = 0 THEN
        l_token_tbl(l_index):=Substr(p_string ,l_start, LENGTH(p_string)+1-l_start);
        EXIT;
        END IF;
        l_token_tbl(l_index):=Substr(p_string ,l_start, l_pos-l_start);
        l_start:=l_pos+LENGTH(p_separators);
        l_index:=l_index+1;
    END LOOP;
    RETURN l_token_tbl;
EXCEPTION
    WHEN OTHERS THEN
        dbms_output.put_line('STRINGTOKENIZER - Error '||TO_CHAR(SQLCODE)||': '||SQLERRM);
        dbms_output.put_line(DBMS_UTILITY.format_error_backtrace);
        RAISE;
END STRINGTOKENIZER;
/

Now we write a PL/SQL script to test our function:

DECLARE
ret DBMS_SQL.VARCHAR2_TABLE;
BEGIN
-- Now call the stored program
  ret := STRINGTOKENIZER('aaa|bbb|ccc|ddd|eeee|','|');
-- Output the results
IF ret IS NOT NULL THEN
 IF ret.count > 0 THEN
   FOR i IN ret.first..ret.last LOOP
     IF ret.exists(i) THEN
       null; -- type of data not known
dbms_output.put_line(SubStr('ret('||TO_CHAR(i)||') = '||ret(i),1,255));
      END IF;
    END LOOP;
  END IF;
 END IF;
EXCEPTION
WHEN OTHERS THEN
  dbms_output.put_line(SubStr('Error '||TO_CHAR(SQLCODE)||': '||SQLERRM, 1, 255));
  dbms_output.put_line(dbms_utility.format_error_backtrace);
RAISE;
END;

The above code tests only the function STRINGTOKENIZER, but it’s easy to modify. Finally, both functions seem to work perfectly. The only difference, as already described, is regarding the output of token NULL. However, there is another difference, that is on performance. In fact, the use of regular expressions functions on very long strings is slower than using directly strings function.

0 comments ↓

There are no comments yet...Kick things off by filling out the form below.

Leave a Comment