Oracle Regular Expression

Regular expressions allow you to perform powerful context searches in variable-length strings. They provide a a powerful set of pattern matching capabilities by combining the following

Character Classes are groups of possible characters at a point in the search
Collation Classes are sets of characters and are treated like a range
Metacharacters are operators that specify search algorithms
Metasequences are operators created by two metacharacters or literals
Literals are characters, character sets and words

Character Classes

Character classes are groups or ranges of possible characters, they are traditionally delimited by square brackets [], You use a dash "-" to specifiy a range i.e [1-9]. You can use the posix character class inside the brackets [:alpha:] see below for more examples

[:alnum:]
All alphanumeric characters
[:alpha:]
All alphabetic characters
[:cntrl:]
All non-printable control characters
[:digit:]
All numeric digits
[:graph:]
all [:digit:], [:lower:], [:punct:] and [:upper:] portable character class
[:lower:]
All lower alphabetic characters
[:print:]
All printable characters
[:punct:]
All punctuation characters
[:space:]
All nonprinting space characters
[:upper:]
All upper alphabetic characters
[:xdigit:]
All hexidecimal characters

Now for an example

Character class example
DECLARE
  counter       NUMBER := 1;
  source_string VARCHAR2(12) := 'A1';
  pattern1      VARCHAR2(12) := '[[:alpha:]]';       -- use the posix character class
  pattern2      VARCHAR2(12) := '[[:alnum:]]';
BEGIN
  -- Compare using standard character class ranges.
  FOR i IN 1..LENGTH(source_string) 
  LOOP IF REGEXP_INSTR(SUBSTR(source_string,counter,i),pattern1) = i
    THEN dbms_output.put(REGEXP_SUBSTR( SUBSTR(source_string,counter,i),pattern1)); ELSE dbms_output.put_line(REGEXP_SUBSTR( SUBSTR(source_string,counter,i),pattern2)); END IF; counter := counter + 1; END LOOP; END; /

Collation Classes

This class is new and is designed to allow you to collate languages that require a collating element. You define a collation class by using [..]

Collation class example

[a-[.ch.]]

Note: allows you to find element that is between an a or a ch

Metacharacters

A metacharacter provides some mechanics for performing pattern matching, lots of programming and scripting languages use the same syntax.

Metacharacter
Name
Type
Description
()
parentheses
Delimiter
act as a constraint on the scope of comparsion
{m}
exact
Interval
matches exactly m occurences of the preceding subexpression or character
{m, }
at least
Interval
matches at least m occurences of the preceding subexpression or character
{m,n}
between
Interval
matches exactly m occurences but no more than n of the preceding subexpression or character
|
OR
Logical
acts as a logical OR operator
.
dot
Matching
matches any one character
^
caret
Matching
matches the beginning of a line
$
dollar
Matching
matches the end of a line
[^]
caret
Negation
negates when used inside square brackets (like the NOT operator)
-
dash
Range
specifies a range when inside square brackets
?
question mark
Repetition
makes the proceding character optional
*
asterisk
Repetition
matches any instance of zero to many characters
+
plus
Repetition
matches at least once or many times the preceding character.

The best way to learn these is to google regular expression on the web, as i said before many programming languages and scripting languages use metacharacters.

Metasequences

Those of you who have used Perl before will under the metasequences, they are a character combined with a backslash to represent a pattern or characters.

Metasequence
Name
Type
Description
\n
backreference
Posix
matches the nth preceding subexpression
\d
digit
Perl
matches any digit
\D
nondigit
Perl
matches any non-digit
\w
word character
Perl
matches any word characters
\W
nonword character
Perl
matches any non-word characters
\s
whitespace character
Perl
matches any whitespace character
\S
nonwhitespace character
Perl
matches any non-whitespace character
\A
beginning of a string
Perl
matches a beginning of a string
\Z
end of a sting
Perl
matches the end of a string
\z
end of a string
Perl
matches the end of a string

Literals

Literals values are simply string literals, they may consist of one or many characters.

Oracle 11g Regular Expression

Oracle 11g has implemented a number of regular expression functions, so far there are 5 in total

Regular Expression Functions
REGEXP_COUNT lets you count the number of times a specific pattern is found in a string
REGEXP_INSTR lets you find a position index value in a string
REGEXP_LIKE lets you find a regular expression match inside a string, it's like the LIKE operator
REGEXP_REPLACE lets you find and replace a substring inside of a string
REGEXP_SUBSTR lets you find a substring inside a string
Match Type Flags
i
sets the search to case-insensitive matching, overriding the nls_sort parameter
c
sets the search to case-sensitive matching, overriding the nls_sort parameter
n
Enables the dot (.) to truly match any character, including the newline
m
Enables a search to recognise multiple lines inside a string
x
sets the search to ignore any whitespace characters
Examples
REGEXP_COUNT DECLARE
   mystring varchar2(24) := 'abcdefABCDEFabcdefABCDEF';
   mycount number;
BEGIN
   -- match a patterm - should result in 2
   select regexp_count(mystring, 'abcdef') into mycount from dual;
   dbms_output.put_line('Total Matches: ' || mycount);

   -- match a single pattern starting from position 7 - should result in 1
   select regexp_count(mystring, 'abcdef',7) into mycount from dual;
   dbms_output.put_line('Total Matches: ' || mycount);

   -- match a pattern but ignore case - should result in 4
   select regexp_count(mystring, 'abcdef',1,'i') into mycount from dual;
   dbms_output.put_line('Total Matches: ' || mycount);
END;
/

REGEXP_INSTR

create table employees (
   emp_id number,
   emp_name varchar2(30)
);

insert into employees values (1, 'Paul Valle');

select regexp_instr(emp_name,'Valle',1,1,0,'i') as Begin
    regexp_instr(emp_name,'Valle',1,1,1,'i') - 1 as End
from employees

REGEXP_LIKE

create table employees (
   emp_id number,
   emp_name varchar2(30)
);

insert into employees values (1, 'Paul Valle');

select emp_id from employees where regexp_like(emp_name, 'Valle');

REGEXP_REPLACE

create table employees (
   emp_id number,
   emp_name varchar2(30)
);

insert into employees values (1, 'Paul Valle');

select regexp_replace(emp_name, 'Paul', 'Lorraine',1) from employees;

REGEXP_SUBSTR

create table employees (
   emp_id number,
   emp_name varchar2(30)
);

insert into employees values (1, 'Paul Valle');

select regexp_substr(emp_name, 'V[:alpha:]+e',1) from employees;   -- using a character class