Oracle - Regular Expressions

Regular expressions allow you to perform powerful context searches in variable-length strings. They provide a a powerful set of pattern matching capabilities by combining the following

Character Classes	are groups of possible characters at a point in the search
Collation Classes	are sets of characters and are treated like a range
Metacharacters	are operators that specify search algorithms
Metasequences	are operators created by two metacharacters or literals
Literals	are characters, character sets and words

Character Classes

Character classes are groups or ranges of possible characters, they are traditionally delimited by square brackets [], You use a dash "-" to specifiy a range i.e [1-9]. You can use the posix character class inside the brackets [:alpha:] see below for more examples

[:alnum:]	All alphanumeric characters
[:alpha:]	All alphabetic characters
[:cntrl:]	All non-printable control characters
[:digit:]	All numeric digits
[:graph:]	all [:digit:], [:lower:], [:punct:] and [:upper:] portable character class
[:lower:]	All lower alphabetic characters
[:print:]	All printable characters
[:punct:]	All punctuation characters
[:space:]	All nonprinting space characters
[:upper:]	All upper alphabetic characters
[:xdigit:]	All hexidecimal characters

Now for an example

Character class example

DECLARE
  counter       NUMBER := 1;
  source_string VARCHAR2(12) := 'A1';
  pattern1      VARCHAR2(12) := '[[:alpha:]]';       -- use the posix character class
  pattern2      VARCHAR2(12) := '[[:alnum:]]';
BEGIN
  -- Compare using standard character class ranges.
  FOR i IN 1..LENGTH(source_string) 
  LOOP
    IF REGEXP_INSTR(SUBSTR(source_string,counter,i),pattern1) = i 
    THEN
      dbms_output.put(REGEXP_SUBSTR(
                        SUBSTR(source_string,counter,i),pattern1));
    ELSE
      dbms_output.put_line(REGEXP_SUBSTR(
                             SUBSTR(source_string,counter,i),pattern2));
    END IF;
    counter := counter + 1;
  END LOOP;
END;
/

Collation Classes

This class is new and is designed to allow you to collate languages that require a collating element. You define a collation class by using [..]

Collation class example

[a-[.ch.]]

Note: allows you to find element that is between an a or a ch

Metacharacters

A metacharacter provides some mechanics for performing pattern matching, lots of programming and scripting languages use the same syntax.

Metacharacter	Name	Type	Description
()	parentheses	Delimiter	act as a constraint on the scope of comparsion
{m}	exact	Interval	matches exactly m occurences of the preceding subexpression or character
{m, }	at least	Interval	matches at least m occurences of the preceding subexpression or character
{m,n}	between	Interval	matches exactly m occurences but no more than n of the preceding subexpression or character
\|	OR	Logical	acts as a logical OR operator
.	dot	Matching	matches any one character
^	caret	Matching	matches the beginning of a line
$	dollar	Matching	matches the end of a line
[^]	caret	Negation	negates when used inside square brackets (like the NOT operator)
-	dash	Range	specifies a range when inside square brackets
?	question mark	Repetition	makes the proceding character optional
*	asterisk	Repetition	matches any instance of zero to many characters
+	plus	Repetition	matches at least once or many times the preceding character.

The best way to learn these is to google regular expression on the web, as i said before many programming languages and scripting languages use metacharacters.

Metasequences

Those of you who have used Perl before will under the metasequences, they are a character combined with a backslash to represent a pattern or characters.

Metasequence	Name	Type	Description
\n	backreference	Posix	matches the nth preceding subexpression
\d	digit	Perl	matches any digit
\D	nondigit	Perl	matches any non-digit
\w	word character	Perl	matches any word characters
\W	nonword character	Perl	matches any non-word characters
\s	whitespace character	Perl	matches any whitespace character
\S	nonwhitespace character	Perl	matches any non-whitespace character
\A	beginning of a string	Perl	matches a beginning of a string
\Z	end of a sting	Perl	matches the end of a string
\z	end of a string	Perl	matches the end of a string

Literals

Literals values are simply string literals, they may consist of one or many characters.

Oracle 11g Regular Expression

Oracle 11g has implemented a number of regular expression functions, so far there are 5 in total

Regular Expression Functions
REGEXP_COUNT	lets you count the number of times a specific pattern is found in a string
REGEXP_INSTR	lets you find a position index value in a string
REGEXP_LIKE	lets you find a regular expression match inside a string, it's like the LIKE operator
REGEXP_REPLACE	lets you find and replace a substring inside of a string
REGEXP_SUBSTR	lets you find a substring inside a string
Match Type Flags
i	sets the search to case-insensitive matching, overriding the nls_sort parameter
c	sets the search to case-sensitive matching, overriding the nls_sort parameter
n	Enables the dot (.) to truly match any character, including the newline
m	Enables a search to recognise multiple lines inside a string
x	sets the search to ignore any whitespace characters
Examples
REGEXP_COUNT	DECLARE mystring varchar2(24) := 'abcdefABCDEFabcdefABCDEF'; mycount number; BEGIN -- match a patterm - should result in 2 select regexp_count(mystring, 'abcdef') into mycount from dual; dbms_output.put_line('Total Matches: ' \|\| mycount); -- match a single pattern starting from position 7 - should result in 1 select regexp_count(mystring, 'abcdef',7) into mycount from dual; dbms_output.put_line('Total Matches: ' \|\| mycount); -- match a pattern but ignore case - should result in 4 select regexp_count(mystring, 'abcdef',1,'i') into mycount from dual; dbms_output.put_line('Total Matches: ' \|\| mycount); END; /
REGEXP_INSTR	create table employees ( emp_id number, emp_name varchar2(30) ); insert into employees values (1, 'Paul Valle'); select regexp_instr(emp_name,'Valle',1,1,0,'i') as Begin regexp_instr(emp_name,'Valle',1,1,1,'i') - 1 as End from employees
REGEXP_LIKE	create table employees ( emp_id number, emp_name varchar2(30) ); insert into employees values (1, 'Paul Valle'); select emp_id from employees where regexp_like(emp_name, 'Valle');
REGEXP_REPLACE	create table employees ( emp_id number, emp_name varchar2(30) ); insert into employees values (1, 'Paul Valle'); select regexp_replace(emp_name, 'Paul', 'Lorraine',1) from employees;
REGEXP_SUBSTR	create table employees ( emp_id number, emp_name varchar2(30) ); insert into employees values (1, 'Paul Valle'); select regexp_substr(emp_name, 'V[:alpha:]+e',1) from employees; -- using a character class