Oracle Regular Expression
Regular expressions allow you to perform powerful context searches in variable-length strings. They provide a a powerful set of pattern matching capabilities by combining the following
Character Classes | are groups of possible characters at a point in the search |
Collation Classes | are sets of characters and are treated like a range |
Metacharacters | are operators that specify search algorithms |
Metasequences | are operators created by two metacharacters or literals |
Literals | are characters, character sets and words |
Character Classes
Character classes are groups or ranges of possible characters, they are traditionally delimited by square brackets [], You use a dash "-" to specifiy a range i.e [1-9]. You can use the posix character class inside the brackets [:alpha:] see below for more examples
[:alnum:] |
All alphanumeric characters |
[:alpha:] |
All alphabetic characters |
[:cntrl:] |
All non-printable control characters |
[:digit:] |
All numeric digits |
[:graph:] |
all [:digit:], [:lower:], [:punct:] and [:upper:] portable character class |
[:lower:] |
All lower alphabetic characters |
[:print:] |
All printable characters |
[:punct:] |
All punctuation characters |
[:space:] |
All nonprinting space characters |
[:upper:] |
All upper alphabetic characters |
[:xdigit:] |
All hexidecimal characters |
Now for an example
Character class example | DECLARE counter NUMBER := 1; source_string VARCHAR2(12) := 'A1'; pattern1 VARCHAR2(12) := '[[:alpha:]]'; -- use the posix character class pattern2 VARCHAR2(12) := '[[:alnum:]]'; BEGIN -- Compare using standard character class ranges. FOR i IN 1..LENGTH(source_string) |
Collation Classes
This class is new and is designed to allow you to collate languages that require a collating element. You define a collation class by using [..]
Collation class example | [a-[.ch.]] Note: allows you to find element that is between an a or a ch |
Metacharacters
A metacharacter provides some mechanics for performing pattern matching, lots of programming and scripting languages use the same syntax.
Metacharacter | Name |
Type |
Description |
() |
parentheses |
Delimiter |
act as a constraint on the scope of comparsion |
{m} |
exact |
Interval |
matches exactly m occurences of the preceding subexpression or character |
{m, } |
at least |
Interval |
matches at least m occurences of the preceding subexpression or character |
{m,n} |
between |
Interval |
matches exactly m occurences but no more than n of the preceding subexpression or character |
| |
OR |
Logical |
acts as a logical OR operator |
. |
dot |
Matching |
matches any one character |
^ |
caret |
Matching |
matches the beginning of a line |
$ |
dollar |
Matching |
matches the end of a line |
[^] |
caret |
Negation |
negates when used inside square brackets (like the NOT operator) |
- |
dash |
Range |
specifies a range when inside square brackets |
? |
question mark |
Repetition |
makes the proceding character optional |
* |
asterisk |
Repetition |
matches any instance of zero to many characters |
+ |
plus |
Repetition |
matches at least once or many times the preceding character. |
The best way to learn these is to google regular expression on the web, as i said before many programming languages and scripting languages use metacharacters.
Metasequences
Those of you who have used Perl before will under the metasequences, they are a character combined with a backslash to represent a pattern or characters.
Metasequence | Name |
Type |
Description |
\n |
backreference |
Posix |
matches the nth preceding subexpression |
\d |
digit |
Perl |
matches any digit |
\D |
nondigit |
Perl |
matches any non-digit |
\w |
word character |
Perl |
matches any word characters |
\W |
nonword character |
Perl |
matches any non-word characters |
\s |
whitespace character |
Perl |
matches any whitespace character |
\S |
nonwhitespace character |
Perl |
matches any non-whitespace character |
\A |
beginning of a string |
Perl |
matches a beginning of a string |
\Z |
end of a sting |
Perl |
matches the end of a string |
\z |
end of a string |
Perl |
matches the end of a string |
Literals
Literals values are simply string literals, they may consist of one or many characters.
Oracle 11g Regular Expression
Oracle 11g has implemented a number of regular expression functions, so far there are 5 in total
Regular Expression Functions |
|
REGEXP_COUNT | lets you count the number of times a specific pattern is found in a string |
REGEXP_INSTR | lets you find a position index value in a string |
REGEXP_LIKE | lets you find a regular expression match inside a string, it's like the LIKE operator |
REGEXP_REPLACE | lets you find and replace a substring inside of a string |
REGEXP_SUBSTR | lets you find a substring inside a string |
Match Type Flags |
|
i |
sets the search to case-insensitive matching, overriding the nls_sort parameter |
c |
sets the search to case-sensitive matching, overriding the nls_sort parameter |
n |
Enables the dot (.) to truly match any character, including the newline |
m |
Enables a search to recognise multiple lines inside a string |
x |
sets the search to ignore any whitespace characters |
Examples |
|
REGEXP_COUNT | DECLARE mystring varchar2(24) := 'abcdefABCDEFabcdefABCDEF'; mycount number; BEGIN -- match a patterm - should result in 2 select regexp_count(mystring, 'abcdef') into mycount from dual; dbms_output.put_line('Total Matches: ' || mycount); -- match a single pattern starting from position 7 - should result in 1 -- match a pattern but ignore case - should result in 4 |
REGEXP_INSTR | create table employees ( select regexp_instr(emp_name,'Valle',1,1,0,'i') as Begin |
REGEXP_LIKE | create table employees ( select emp_id from employees where regexp_like(emp_name, 'Valle'); |
REGEXP_REPLACE | create table employees ( select regexp_replace(emp_name, 'Paul', 'Lorraine',1) from employees; |
REGEXP_SUBSTR | create table employees ( select regexp_substr(emp_name, 'V[:alpha:]+e',1) from employees; -- using a character class |