Like many other programming languages, Java supports regular expressions, and the syntax is very similar across languages. The table below lists the special characters used in regular expressions.
Special Characters |
|
| . character | matches any single character except the newline character; the special combination .* is greedy and tries to match as much as possible |
| + character | matches one or more occurrences of the preceding character |
| [ ] character | lets you define a pattern that matches any one of a group of alternatives; you can also use ranges such as [0-9] or [a-zA-Z] |
| * character | match zero or more occurrences of the preceding character |
| ? character | match zero or one occurrence of the preceding character |
| Pattern anchor | there are a number of pattern anchors: match at the beginning of a string (^ or \A), match at the end of a string ($ or \Z), match on a word boundary (\b) and match inside a word (\B, the opposite of \b) |
| Escape sequence | to include a character that is normally treated as a special character, precede it with a backslash; you can also use \Q to have everything that follows treated as normal characters until \E is seen. In a Java string literal each backslash must itself be doubled, for example "\\+" |
| Excluding | you can exclude characters by placing ^ first inside square brackets, as in [^abc] |
| Character-Range escape sequences | there are special character range escape sequences such as any digit (\d), anything other than a digit (\D) |
| Specified number of occurrences | you can define how many occurrences you want to match using {<minimum>,<maximum>} |
| specify choice | the special character | (pipe) enables you to specify two or more alternatives to choose from when matching a pattern |
| Portion reuse | sometimes you want to store what has been matched; you can do this with parentheses (). The text matched by the first set is stored in group 1, written \1 inside the pattern and $1 in a replacement string (or read with Matcher.group(1)), the second set is \2 or $2, and so on |
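| Different delimiter | unlike Perl, Java has no pattern delimiters: a pattern is an ordinary string passed to Pattern.compile(), so there is no delimiter to change |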
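The escape-sequence and portion-reuse rows are easier to follow with a small Java sketch. The class and method names below (RegexSyntaxSketch, hasRepeatedWord, swapNames, isStarsAndPluses) are made up for illustration; only the java.util.regex calls and String.replaceAll come from the standard library. Note how every backslash is doubled inside the Java string literals.

```java
import java.util.regex.Pattern;

// Illustrative sketch only: the class and method names are hypothetical.
final class RegexSyntaxSketch {

    // True if the input contains an immediately repeated word, e.g. "is is".
    // Regex: \b(\w+)\s+\1\b  where \1 refers back to the text captured by group 1.
    static boolean hasRepeatedWord(String input) {
        Pattern pat = Pattern.compile("\\b(\\w+)\\s+\\1\\b");
        return pat.matcher(input).find();
    }

    // Rewrites "first last" as "last, first" using $1 and $2 in the replacement string.
    static String swapNames(String input) {
        return input.replaceAll("(\\w+) (\\w+)", "$2, $1");
    }

    // \Q...\E quotes its contents, so * and + are matched literally.
    static boolean isStarsAndPluses(String input) {
        return Pattern.matches("\\Q**++\\E", input);
    }

    public static void main(String[] args) {
        System.out.println(hasRepeatedWord("this is is a test")); // true
        System.out.println(swapNames("Ada Lovelace"));            // Lovelace, Ada
        System.out.println(isStarsAndPluses("**++"));             // true
    }
}
```

The leading \b in the repeated-word pattern keeps the "is" inside "this" from being paired with the following "is".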
Special Characters Examples |
|
| . character | d.f # could match words like def, dif, duf
d.*f # could match words like deaf, deef, def, dzzf, etc |
| + character | de+f # could match words like def, deef, deeef, deeeef, etc
 + # a space followed by +: matches a run of one or more spaces, such as the gaps between words |
| [ ] character | d[eE]f # match words def or dEf
d[a-z]f # match words like daf, def, dzf, dsf, etc |
| * character | de*f # match words like df, def, deef, deeef, etc |
| ? character | de?f # match only the words df and def (not deef, because ? allows at most one occurrence) |
| Pattern anchors | ^hello # match only if line starts with hello
\Bdef # matches the def in abcdef (opposite of \b) |
| Escape sequence | \+salary # will match the word +salary, the + (plus) is treated as a normal character because of the \
\Q**++\E # will match **++ |
| Excluding | d[^eE]f # 1st character is d, 2nd character is anything other than e or E, last character is f |
| Character-Range escape sequences | \d # match any single digit
\d+ # match one or more digits |
| Specified number of occurrences | de{3}f # match only deeef, the {3} means exactly three of the preceding e
de{1,3}f # match only def, deef and deeef (minimum = 1, maximum = 3 occurrences) |
| specify choice | def|ghi # match either def or ghi |
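One way to check the quantifier examples above is to run each pattern against a handful of candidate words. The following is a minimal sketch; the class name QuantifierCheck and the helper matchesWhole are hypothetical, while Pattern.matches is the standard library call that tests whether the entire input matches.

```java
import java.util.regex.Pattern;

// Illustrative sketch only: the class and method names are hypothetical.
final class QuantifierCheck {

    // True when the whole word (not just part of it) matches the regular expression.
    static boolean matchesWhole(String regex, String word) {
        return Pattern.matches(regex, word);
    }

    public static void main(String[] args) {
        String[] words = {"df", "def", "deef", "deeef", "deeeef"};
        String[] patterns = {"de*f", "de+f", "de?f", "de{3}f", "de{1,3}f"};
        for (String p : patterns) {
            StringBuilder line = new StringBuilder(p + " matches:");
            for (String w : words) {
                if (matchesWhole(p, w)) {
                    line.append(' ').append(w);
                }
            }
            System.out.println(line);
        }
    }
}
```

With these inputs, de{3}f accepts only deeef (exactly three e's) and de{1,3}f accepts def, deef and deeef, while de?f accepts only df and def.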
Regular expressions in Java are not difficult to learn, and the simplest way to learn them is to work through some examples.
| Basic Example | // Needs: import java.util.regex.Matcher; import java.util.regex.Pattern;
Pattern pat;
Matcher mat;
boolean found;
pat = Pattern.compile("Java");
mat = pat.matcher("Java");
found = mat.matches(); // true only if the entire input matches the pattern
System.out.println("Testing Java against Java.");
if(found) System.out.println("Matches");
else System.out.println("No Match");
System.out.println();
System.out.println("Testing Java against Java 8.");
mat = pat.matcher("Java 8"); // create a new matcher
found = mat.matches(); // false: the trailing " 8" is not covered by the pattern
if(found) System.out.println("Matches");
else System.out.println("No Match"); |
| Using a quantifier | Pattern pat = Pattern.compile("W+");
Matcher mat = pat.matcher("W WW WWW");
while(mat.find())
System.out.println("Match: " + mat.group()); |
| Using a wildcard and quantifier | Pattern pat = Pattern.compile("e.+d");
Matcher mat = pat.matcher("extend cup end table");
while(mat.find())
System.out.println("Match: " + mat.group()); |
| Using the ? quantifier | // Use reluctant matching behavior.
Pattern pat = Pattern.compile("e.+?d");
Matcher mat = pat.matcher("extend cup end table");
while(mat.find())
System.out.println("Match: " + mat.group()); |
| Using replaceAll() | String str = "Jon Jonathan Frank Ken Todd";
Pattern pat = Pattern.compile("Jon.*? ");
Matcher mat = pat.matcher(str);
System.out.println("Original sequence: " + str);
str = mat.replaceAll("Eric ");
System.out.println("Modified sequence: " + str); |
| Using split() | // Split the input on spaces, commas, periods and exclamation marks.
Pattern pat = Pattern.compile("[ ,.!]");
String[] strs = pat.split("one two,alpha9 12!done.");
for(int i=0; i < strs.length; i++)
System.out.println("Next token: " + strs[i]); |
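To compare the greedy and reluctant examples side by side, the find() loop can be wrapped in a small helper that returns all matches as a list. This is only a sketch: FindAllSketch and findAll are hypothetical names, and the inputs are the ones used in the examples above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative sketch only: the class and method names are hypothetical.
final class FindAllSketch {

    // Collects every non-overlapping match of regex in input, in order of appearance.
    static List<String> findAll(String regex, String input) {
        List<String> matches = new ArrayList<>();
        Matcher mat = Pattern.compile(regex).matcher(input);
        while (mat.find()) {
            matches.add(mat.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        String text = "extend cup end table";
        System.out.println(findAll("e.+d", text));      // greedy:    [extend cup end]
        System.out.println(findAll("e.+?d", text));     // reluctant: [extend, end]
        System.out.println(findAll("W+", "W WW WWW"));  // [W, WW, WWW]
    }
}
```

The greedy pattern runs from the first e to the last d in the input, whereas the reluctant form stops at the first d it can reach, so it finds two shorter matches.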