Using Oracle10g's Regexp_substr function in place of the traditional Substr function
By John Adolph Palinski
Oracle has added a new function similar to the Substr function in it new 10g database version. The function allows you to search and extract string characters using POSIX and Unicode notation. This function is called Regexp_substr and extracts any characters from a target string that match the notation parameter. The function has two parameters: the target character string and the notation describing the candidate characters. The following is a syntax template of the function:
Regexp_substr(target string, evaluation notation)
Historically, developers have used the Substr function to extract characters. For example, assume a developer wants to extract the year from a date. Dates are normally displayed in the "DD-MON-YY" format. The Substr function has three parameters: the target expression, the beginning location of the extraction, and the number of characters to be extracted. In order to extract the proper characters the developer must identify the numeric location of the characters. The following is the Substr setting to extract the year from the previous date format:
Substr(sysdate, 8, 2)
The previous setting is adequate for the default date format. It will extract the year. What happens if the format shows the century and is in the "DD-MON-YYYY" format? In this format the century is displayed in the in position 8, not the year. The Substr function will now display the century unless the position parameter is changed. This would result in an incorrect value. The Regexp_substr function can overcome the changing format problem. The function can be set up to always extract the proper characters, regardless of their position. The Regexp_substr function allows the developer to base extraction on anchor points, sequences of digits, or a host of other descriptions. If it can be described using the POSIX and Unicode notation it can be described.
Listing 1 contains a Select statement that uses the Substr and Regexp_substr functions to extract characters from a date. The Substr function is used to extract two characters beginning at position 8 from two different formatted dates. The first Substr function returned the year and the second function returned the century. The Regexp_substr function is then used against the same formatted dates. Rather than using the position, the function uses the end anchor tag to extract the last two characters. In both cases, the year was extracted.
Listing 1 - Extracting the year using the Substr and Regexp_substr functions
SQL> Select substr('08-APR-51', 8, 2), substr('08-APR-1951', 8, 2),
2 regexp_substr('08-APR-51', '..$'), regexp_substr('08-APR-1951', '..$')
3 from dual;
SU SU RE RE
-- -- -- --
51 19 51 51
SQL>
In Listing 1, POSIX notation was used to specify that the last two characters of the target string are to be extracted. The notation consists of two periods (..) and a dollar sign ($). The dollar sign is an end of string anchor notation. It signifies the end of the string. A period is a notation for a wild-card character. It tells the function to accept any character in this position. In effect the '..$' notation used in the Regexp_substr function applies to the last two characters of the string and only to the last two characters. The function extracts any set of characters that match the notation, thus any two characters at the end of the target string will match. Since the year is always the last two characters in a date, the year will be extracted regardless of the length.
The POSIX and Unicode notation has a wealth of tests and symbols. You can test for alpha, digit, or special characters. You can also test for repeating characters and even for alternate characters. Listing 2 is a final example of the function's use to extract the last name from a full name expression. The function uses the following notation: '^[[:alpha]]*'. The notation has means the following:
- ^ Beginning anchor tag. Tells the function to begin evaluating with the first character in the string.
- [[:alpha:]] Character class representing alpha characters (i.e. A, B, C)
- * Repeat indefinitely symbol
The notation means to select or extract alpha characters beginning with the first character in the string. Continue extracting alpha characters until a non-alpha character is reached. If you look at Listing 2, you can see that the target expression has a comma following the last name. The Regexp_substr extracts characters until it reaches the comma. The result is the last name.
Listing 2 - Using the Regexp_substr function to extract the first set of alpha characters
SQL> select full_name,
2 regexp_substr(full_name, '^[[:alpha:]]*') "Last Name"
3 from full_names;
FULL_NAME Last Name
------------------------------------------ ------------------------------------------
WASHINGTON, GEORGE WASHINGTON
ADAMS, JOHN ADAMS
ROSS, BETSY ROSS
JEFFERSON, THOMAS JEFFERSON
..
BUCHANAN, JAMES BUCHANAN
LINCOLN, ABRAHAM LINCOLN
JOHNSON, ANDREW JOHNSON
20 rows selected.
SQL>
I hope that you can see the power of this new Oracle10g function. You can find a list of the POSIX and Unicode characters in my article "Using Oracle10g's Regexp_like evaluation operator in place of the traditional Like operator". It was published on May 4, 2004 on the SQL Server Worldwide User's Group web site (http://www.sswug.org/see/18302).
John Adolph Palinski is a former adjunct faculty member at the University of Nebraska at Omaha and Iowa Western Community College. He is the author of the "Oracle Database Construction Kit", Que, 1997, the "Oracle SQL and PL/SQL Handbook: A Guide for Data Administrators, Developers, and Business Analysts", Addison Wesley Longman, 2002, and "Oracle 9i Developer: Developing Web Applications With Forms Builder", Course Technology, 2003. He has also written numerous articles for periodicals such as Oracle Internals and Oracle Professional. He provides Oracle training and consulting services. One of his passions is to help customers set up logical business models to improve information efficiency and effectiveness. He may be contacted at www.oracle-trainer.com.