Java Regular Expression

A Regular Expression in Java defines a search pattern for texts (i.e. strings).

Whenever we try searching for some pattern in a text, we can use this “search pattern” to describe what we are looking for.

The search pattern can be a single character, or a more complex pattern.

Regular expressions can be used to search, validate, edit, and otherwise manipulate text.

A regular expression is also called a regex, which is simply an abbreviation of the term.

In Java, regular expression support is provided by the java.util.regex package, commonly referred to as the Java Regex API.

 

Regex API:–

The Regex API consists of one interface and three classes in the java.util.regex package:

  1. MatchResult interface – contains query methods used to determine the results of a match against a Regex pattern
  2. Matcher class – used to perform match operations on text using Regex patterns
  3. Pattern class – used to create Regex patterns
  4. PatternSyntaxException class – used to indicate syntax error in a Regex pattern

Matcher and Pattern are the core classes of Java Regex API.

The members of the Regex API are described one by one below.

 

1. MatchResult interface

This interface indicates the result of a match operation. Its signature is as follows:

public interface MatchResult

The interface basically contains query methods used to determine the results of a match against a regular expression. The methods of this interface will be seen in the implementing Matcher class.
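As a minimal sketch of how MatchResult can be used on its own, the example below stores a snapshot of each match via Matcher.toMatchResult(). The class name MatchResultSketch, the helper snapshots() and the sample inputs are hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.MatchResult;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MatchResultSketch {

    // Hypothetical helper: collects a snapshot of every match of regex in text.
    static List<MatchResult> snapshots(String regex, String text) {
        List<MatchResult> results = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(text);
        while (m.find()) {
            // toMatchResult() copies the current match state, so the
            // snapshot stays valid after the matcher moves on
            results.add(m.toMatchResult());
        }
        return results;
    }

    public static void main(String[] args) {
        for (MatchResult r : snapshots("\\d+", "a12b345")) {
            System.out.println(r.group() + " from " + r.start() + " to " + r.end());
        }
    }
}
```

A snapshot is useful when match details have to be kept after the Matcher has advanced to the next occurrence.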

 

2. Matcher class

This class implements the MatchResult interface. It is the regex engine that performs match operations on a text or string by interpreting a Pattern instance. A Matcher instance is obtained from a Pattern instance by calling its matcher() method. Once created, a Matcher instance can be used to perform different kinds of match operations against the text multiple times.

Its signature is as follows:

public final class Matcher extends Object implements MatchResult

The important methods of this class are given below:

  • boolean matches() – It tests whether the entire input text matches the pattern.
  • boolean find() – It searches the text for the next sub-sequence that matches the pattern; calling it repeatedly finds successive occurrences.
  • boolean find (int start) – It resets the matcher and searches for the next occurrence of the pattern in the text, starting from the given index.
  • int start() – It returns the start index of the match most recently found, for example by find().
  • int end() – It returns the end index of the match most recently found, i.e. the index of the character just after the last matching character.
  • String group() – It returns the sub-sequence matched by the most recent match.
  • int groupCount() – It returns the number of capturing groups in the pattern. Group 0, which stands for the whole match, is not included in this count.
  • String replaceAll (String replacement) – It replaces every sub-sequence of the input sequence that matches the pattern with the given replacement string.
  • String replaceFirst (String replacement) – It replaces the first sub-sequence of the input sequence that matches the pattern with the given replacement string.

NOTE: Since the Matcher class implements the MatchResult interface, query methods such as start(), end(), group() and groupCount() are common to both.
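The difference between matches(), find() and groupCount() is easy to mix up, so here is a small sketch. The class name MatcherSketch, the year-month pattern and the helper methods are assumptions made for illustration only.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MatcherSketch {

    // Hypothetical pattern: a 4-digit year, a hyphen and a 2-digit month
    static final Pattern YEAR_MONTH = Pattern.compile("(\\d{4})-(\\d{2})");

    // matches() succeeds only if the whole input fits the pattern
    static boolean wholeInputMatches(String text) {
        return YEAR_MONTH.matcher(text).matches();
    }

    // find() locates the first occurrence anywhere in the input;
    // group(2) then returns the text captured by the second group
    static String firstMonth(String text) {
        Matcher m = YEAR_MONTH.matcher(text);
        return m.find() ? m.group(2) : null;
    }

    public static void main(String[] args) {
        System.out.println(wholeInputMatches("2020-05"));       // true
        System.out.println(wholeInputMatches("on 2020-05"));    // false
        System.out.println(firstMonth("on 2020-05 and 2021-11")); // 05
        // groupCount() reports the capturing groups in the pattern, not the matches
        System.out.println(YEAR_MONTH.matcher("").groupCount()); // 2
    }
}
```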

 

3. Pattern class

It represents the compiled version of a regular expression. It is used to create a pattern for the regex engine. After compilation, its instance can be used to create a Matcher object that can match texts or strings against the regular expression.

Its signature is as follows:

public final class Pattern extends Object implements Serializable

The important methods of this class are given below:

  • static Pattern compile (String regex) – It is used to compile the given regular expression into a pattern.
  • static Pattern compile (String regex, int flags) – It is used to compile the given regular expression into a pattern with the given flags.
  • Matcher matcher (CharSequence input) – It is used to create a matcher that will match the given input against this pattern.
  • static boolean matches (String regex, CharSequence input) – It compiles the given regular expression and attempts to match the entire input against it.
  • String pattern() – It is used to return the regular expression from which this pattern was compiled.
  • String[] split (CharSequence input) – It is used to split the given input sequence around matches of this pattern.
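The following sketch exercises several of the Pattern methods listed above. The class name PatternSketch, the helper splitOnCommas() and the sample strings are hypothetical and only serve as an illustration.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class PatternSketch {

    // Hypothetical helper: splits a comma-separated string,
    // ignoring whitespace around each comma
    static String[] splitOnCommas(String csv) {
        Pattern p = Pattern.compile("\\s*,\\s*");
        return p.split(csv);
    }

    public static void main(String[] args) {
        // compile() with a flag: the pattern ignores letter case
        Pattern p = Pattern.compile("java", Pattern.CASE_INSENSITIVE);
        System.out.println(p.pattern());                      // java
        System.out.println(p.matcher("JAVA").matches());      // true

        // static matches() compiles and matches in one step (no flags)
        System.out.println(Pattern.matches("java", "JAVA")); // false

        System.out.println(Arrays.toString(splitOnCommas("a , b,c"))); // [a, b, c]
    }
}
```

When the same regular expression is applied many times, compiling it once and reusing the Pattern instance avoids recompiling it on every call, which is what the static matches() method does.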

 

4. PatternSyntaxException class

It is an unchecked exception that is thrown to indicate a syntax error in a Regex pattern. Its signature is as follows:

public class PatternSyntaxException extends IllegalArgumentException

The methods of this exception class are given below:

  • String getDescription() – It provides the description of the error.
  • int getIndex() – It provides the error index.
  • String getPattern() – It provides the erroneous regular expression pattern.
  • String getMessage() – It returns a multi-line string containing the description of the syntax error and its index, the erroneous regular expression pattern, and a visual indication of the error index within the pattern.
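As a rough sketch, the methods above can be used to report an invalid pattern instead of letting the exception propagate. The class name SyntaxErrorSketch and the helper describe() are hypothetical; the exact description text comes from the JDK and is therefore not shown here.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class SyntaxErrorSketch {

    // Hypothetical helper: returns "valid" or a one-line error report
    static String describe(String regex) {
        try {
            Pattern.compile(regex);
            return "valid";
        } catch (PatternSyntaxException e) {
            // description, error index and offending pattern, on one line
            return e.getDescription() + " at index " + e.getIndex()
                    + " in " + e.getPattern();
        }
    }

    public static void main(String[] args) {
        System.out.println(describe("[a-z]+")); // valid
        System.out.println(describe("[a-z"));   // unclosed character class
        System.out.println(describe("(abc"));   // unclosed group
    }
}
```

Because the exception is unchecked, the try/catch is optional; it is mainly worthwhile when the pattern comes from user input or a configuration file.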

 

A Simple Regular Expression:–

See the following example of Java Regex for quick reference. The program below searches the sample text for every occurrence of “Sachin” or “Lara”. This is done by using the Pattern and Matcher classes together.

RegularExpTest.java

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegularExpTest {

    public static void main(String[] args) {
        // sample text
        String text = "Undoubtedly, Sachin and Lara were the best batsmen in 1990s";
        // regex pattern
        String pattern = "Sachin|Lara";
        Pattern p = Pattern.compile(pattern); // creating instance of Pattern
        Matcher m = p.matcher(text); // creating instance of Matcher

        // each successful find() moves on to the next occurrence
        while (m.find()) {
            System.out.print("Start index: " + m.start());
            System.out.print(" End index: " + m.end() + " ");
            System.out.println(" - " + m.group());
        }
    }
}

Output:

Start index: 13 End index: 19 - Sachin
Start index: 24 End index: 28 - Lara

 

 

Rules for Defining Regular Expressions:–

Regular expressions are built from character classes, meta characters and quantifiers. Together they give Regex its expressive power. These rules are given below.

 

Regex Character Classes

Square brackets define a character class, which matches any single character from the given set or range:

Expression       Description
[abc]            Find any character between the brackets (simple class)
[^abc]           Find any character not between the brackets (negation)
[0-9]            Find any character between the brackets (any digit)
[^0-9]           Find any character not between the brackets (any non-digit)
[a-zA-Z]         Find a through z or A through Z, inclusive (range)
[a-d[m-p]]       Find a through d, or m through p: [a-dm-p] (union)
[a-z&&[def]]     Find d, e, or f (intersection)
[a-z&&[^bc]]     Find a through z, except for b and c: [ad-z] (subtraction)
[a-z&&[^m-p]]    Find a through z, and not m through p: [a-lq-z] (subtraction)
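The union, intersection and subtraction classes are the least familiar entries, so the sketch below tests each of them against single characters. The class name CharClassSketch and the helper matchesClass() are hypothetical.

```java
import java.util.regex.Pattern;

public class CharClassSketch {

    // Hypothetical helper: true if s is exactly one character from the class
    static boolean matchesClass(String characterClass, String s) {
        return Pattern.matches(characterClass, s);
    }

    public static void main(String[] args) {
        // union: a-d or m-p
        System.out.println(matchesClass("[a-d[m-p]]", "n"));    // true
        System.out.println(matchesClass("[a-d[m-p]]", "f"));    // false
        // intersection: only d, e or f
        System.out.println(matchesClass("[a-z&&[def]]", "e"));  // true
        System.out.println(matchesClass("[a-z&&[def]]", "a"));  // false
        // subtraction: a-z without b and c
        System.out.println(matchesClass("[a-z&&[^bc]]", "b"));  // false
        System.out.println(matchesClass("[a-z&&[^bc]]", "d"));  // true
        // subtraction: a-z without m-p
        System.out.println(matchesClass("[a-z&&[^m-p]]", "n")); // false
        System.out.println(matchesClass("[a-z&&[^m-p]]", "q")); // true
    }
}
```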

 

Regex Meta characters

Meta characters are characters with a special meaning:

Meta character   Description
.                Find any character (may or may not match line terminators)
\d               Find any digit, short for [0-9]
\D               Find any non-digit, short for [^0-9]
\s               Find any whitespace character, short for [ \t\n\x0B\f\r]
\S               Find any non-whitespace character, short for [^\s]
\w               Find any word character, short for [a-zA-Z_0-9]
\W               Find any non-word character, short for [^\w]
\b               Find a word boundary
\B               Find a non-word boundary
^                Find the beginning of a line (the beginning of the input unless multi-line mode is on)
$                Find the end of a line (the end of the input unless multi-line mode is on)
X|Y              Find X or Y
XY               Find X followed by Y

NOTE: To use meta characters as ordinary characters in regular expressions we have to precede the meta character with a backslash (\).
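A short sketch of word boundaries and of escaping a meta character follows. The class name MetaCharSketch and the helper indexOfWord() are hypothetical. Pattern.quote() is the standard library method that escapes a whole string, which is an alternative to adding backslashes by hand.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MetaCharSketch {

    // Hypothetical helper: index of the first whole-word occurrence of word, or -1
    static int indexOfWord(String word, String text) {
        // \b on both sides rejects matches inside longer words;
        // Pattern.quote() escapes any meta characters contained in word
        Matcher m = Pattern.compile("\\b" + Pattern.quote(word) + "\\b").matcher(text);
        return m.find() ? m.start() : -1;
    }

    public static void main(String[] args) {
        System.out.println(indexOfWord("cat", "concat cat"));    // 7
        // an unescaped dot matches any character
        System.out.println(Pattern.matches("a.c", "abc"));       // true
        // an escaped dot matches only a literal dot
        System.out.println(Pattern.matches("a\\.c", "abc"));     // false
        System.out.println(Pattern.matches("a\\.c", "a.c"));     // true
        // digit, whitespace, word character
        System.out.println(Pattern.matches("\\d\\s\\w", "7 x")); // true
    }
}
```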

 

Regex Quantifiers

These are the quantifiers used in Java Regex.

Quantifier       Description
X?               Matches zero or one occurrence of X
X+               Matches one or more occurrences of X
X*               Matches zero or more occurrences of X
X{n}             Matches exactly n occurrences of X
X{n,m}           Matches at least n but not more than m occurrences of X
X{n,}            Matches at least n occurrences of X
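Each quantifier can be checked with a one-line test, as in the sketch below. The class name QuantifierSketch, the helper fits() and the sample strings are hypothetical. Pattern.matches() is used, so the whole text has to satisfy the quantified expression.

```java
import java.util.regex.Pattern;

public class QuantifierSketch {

    // Hypothetical helper: true if the whole text matches the regex
    static boolean fits(String regex, String text) {
        return Pattern.matches(regex, text);
    }

    public static void main(String[] args) {
        System.out.println(fits("colou?r", "color"));   // true  (zero u)
        System.out.println(fits("colou?r", "colour"));  // true  (one u)
        System.out.println(fits("colou?r", "colouur")); // false (two u)
        System.out.println(fits("a+", ""));             // false (needs at least one a)
        System.out.println(fits("a*", ""));             // true  (zero a is allowed)
        System.out.println(fits("a{3}", "aaa"));        // true
        System.out.println(fits("\\d{2,4}", "123"));    // true
        System.out.println(fits("\\d{2,4}", "12345"));  // false (more than 4 digits)
        System.out.println(fits("a{2,}", "a"));         // false (fewer than 2)
    }
}
```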

 
Specifying modes inside the Regex

Mode modifiers can be specified at the start of the regex. To specify multiple modes, simply put them together as in (?ismx).

  • (?i) makes the regex case insensitive.

  • (?s) for “single line mode” makes the dot match all characters, including line breaks.

  • (?m) for “multi-line mode” makes the caret and dollar match at the start and end of each line in the subject string.

  • (?x) for “comments mode” permits whitespace and comments inside the regex.
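The effect of each mode can be seen by running the same regex with and without the modifier, as in this sketch. The class name ModeSketch and the helper found() are hypothetical. The last line shows the equivalent flag constant, Pattern.MULTILINE, passed to compile() instead of an inline modifier.

```java
import java.util.regex.Pattern;

public class ModeSketch {

    // Hypothetical helper: true if the regex occurs anywhere in the text
    static boolean found(String regex, String text) {
        return Pattern.compile(regex).matcher(text).find();
    }

    public static void main(String[] args) {
        // (?i): case-insensitive matching
        System.out.println(Pattern.matches("java", "JAVA"));      // false
        System.out.println(Pattern.matches("(?i)java", "JAVA"));  // true

        // (?s): the dot also matches line breaks
        System.out.println(Pattern.matches("a.b", "a\nb"));       // false
        System.out.println(Pattern.matches("(?s)a.b", "a\nb"));   // true

        // (?m): ^ and $ match at the start and end of every line
        System.out.println(found("^b$", "a\nb\nc"));              // false
        System.out.println(found("(?m)^b$", "a\nb\nc"));          // true

        // the same mode selected through a flag constant
        System.out.println(Pattern.compile("^b$", Pattern.MULTILINE)
                .matcher("a\nb\nc").find());                      // true
    }
}
```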

 
Backslashes in Regex

The backslash \ is an escape character in Java string literals, so a single backslash has to be written as \\ in source code. If we want to define \w, for example, we must write \\w in the Java string that holds our regex. The backslash is also the escape character of the regular expression language itself, so a regex that matches one literal backslash is \\, and in Java source code it has to be typed as \\\\.
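The two levels of escaping can be checked with the sketch below. The class name BackslashSketch, the helper splitWindowsPath() and the sample path are hypothetical.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class BackslashSketch {

    // Hypothetical helper: splits a path on literal backslashes.
    // The regex \\ (one literal backslash) is typed as "\\\\" in Java source.
    static String[] splitWindowsPath(String path) {
        return path.split("\\\\");
    }

    public static void main(String[] args) {
        // "\\w" holds two characters: a backslash and the letter w
        System.out.println("\\w".length());                   // 2
        System.out.println(Pattern.matches("\\w+", "word_1")); // true

        // "\\\\" holds two backslashes, which the regex engine
        // reads as one escaped, literal backslash
        System.out.println("\\\\".length());                  // 2
        System.out.println(Pattern.matches("\\\\", "\\"));     // true

        System.out.println(Arrays.toString(
                splitWindowsPath("C:\\temp\\file.txt")));     // [C:, temp, file.txt]
    }
}
```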

 

 

Regex Coding Examples

The following coding examples illustrate the basic concepts of Regex in Java.

RegularExpTest01.java

// This program searches for Regex patterns in the sample text
import java.util.regex.Pattern;

public class RegularExpTest01 {

    public static void main(String[] args) {
        // sample text
        String text = "Hi from TechGuruSpeaks! Welcome to techguruspeaks.com";

        // Search for the pattern "TechGuruSpeaks" in the sample text
        String pattern1 = ".*TechGuruSpeaks.*";
        boolean result1 = Pattern.matches(pattern1, text);
        System.out.println("Result1: " + result1); // true

        // Search for the pattern "techGuruSpeaks" in the sample text
        // (matching is case sensitive, so this spelling is not found)
        String pattern2 = ".*techGuruSpeaks.*";
        boolean result2 = Pattern.matches(pattern2, text);
        System.out.println("Result2: " + result2); // false

        // Search for the pattern "techguruspeaks" in the sample text
        String pattern3 = ".*techguruspeaks.*";
        boolean result3 = Pattern.matches(pattern3, text);
        System.out.println("Result3: " + result3); // true
    }
}

Output:

Result1: true
Result2: false
Result3: true

 

RegularExpTest02.java

// This program also searches for Regex patterns in the sample text
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegularExpTest02 {

    public static void main(String[] args) {
        // sample text
        String text = "www.techguruspeaks.com is for techies";

        // creates a Pattern instance
        Pattern p = Pattern.compile(".*tech.*"); // Regex pattern 1

        // creates a Matcher instance
        Matcher m = p.matcher(text);

        // matches() checks if the whole text matches with a pattern or not
        boolean result = m.matches();
        System.out.println("Result: " + result); // true

        // creates another Pattern instance
        p = Pattern.compile("tech"); // Regex pattern 2

        // creates another Matcher instance
        m = p.matcher(text);

        // find() is used to discover multiple occurrences of a pattern in text;
        // end() is exclusive, so the last matched index is end() - 1
        while (m.find()) {
            System.out.println("Pattern found from " + m.start() + " to " + (m.end() - 1));
        }
    }
}

Output:

Result: true
Pattern found from 4 to 7
Pattern found from 30 to 33

 

RegularExpTest03.java

// This program performs split and replace operations
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegularExpTest03 {

    public static void main(String[] args) {
        // sample text
        String text = "My SITE is techguruspeaks.com. This SiTe is for all techies.";

        // expression to be compiled
        String regex = "site";

        // using split() method of Pattern class
        Pattern p = Pattern.compile(regex, Pattern.CASE_INSENSITIVE);
        String[] array = p.split(text);
        System.out.println("Split operation:~");
        System.out.println("================");
        for (String arr : array) {
            System.out.print(arr);
        }
        System.out.println("\nNumber of split strings: " + array.length);

        // using replaceFirst() and replaceAll() methods of Matcher class
        p = Pattern.compile("1*2");
        Matcher m = p.matcher("11234512678");
        System.out.println("\nReplace operation:~");
        System.out.println("===================");
        System.out.println("Using replaceAll: " + m.replaceAll("#"));
        System.out.println("Using replaceFirst: " + m.replaceFirst("#"));
    }
}

Output:

Split operation:~
================
My  is techguruspeaks.com. This  is for all techies.
Number of split strings: 3

Replace operation:~
===================
Using replaceAll: #345#678
Using replaceFirst: #34512678

 

RegularExpTest04.java

import java.util.regex.Pattern;

public class RegularExpTest04 {

    public static void main(String[] args) {
        // It returns true if text matches exactly "techguru"
        System.out.println("Result01 : " + Pattern.matches("techguru", "Techguru")); // false

        // It returns true if text matches exactly "techguru" or "Techguru"
        System.out.println("Result02 : " + Pattern.matches("[Tt]echguru", "techguru")); // true
        System.out.println("Result03 : " + Pattern.matches("[Tt]echguru", "Techguru")); // true

        // It returns true if text matches exactly "site" or "Site" or "kite" or "Kite"
        System.out.println("Result04 : " + Pattern.matches("[sS]ite|[kK]ite", "Site")); // true
        System.out.println("Result05 : " + Pattern.matches("[sS]ite|[kK]ite", "kite")); // true

        // It returns true if the text contains "guru" at any place
        System.out.println("Result06 : " + Pattern.matches(".*guru.*", "techguru")); // true

        // It returns true if the text does not begin with a digit
        System.out.println("Result07 : " + Pattern.matches("^[^\\d].*", "789xyz")); // false
        System.out.println("Result08 : " + Pattern.matches("^[^\\d].*", "xyz789")); // true

        // It returns true if the text consists of exactly three letters
        System.out.println("Result09 : " + Pattern.matches("[a-zA-Z][a-zA-Z][a-zA-Z]", "xYz")); // true
        System.out.println("Result10 : " + Pattern.matches("[a-zA-Z][a-zA-Z][a-zA-Z]", "pQR")); // true
        System.out.println("Result11 : " + Pattern.matches("[a-zA-Z][a-zA-Z][a-zA-Z]", "bpKm")); // false

        // It returns true if the text consists only of non-digits (0 or more)
        System.out.println("Result12 : " + Pattern.matches("\\D*", "pqrst")); // true
        System.out.println("Result13 : " + Pattern.matches("\\D*", "pqrst789")); // false

        // Boundary matchers example: ^ denotes start and $ denotes end of a line
        System.out.println("Result14 : " + Pattern.matches("^It$", "It is from TechGuru")); // false
        System.out.println("Result15 : " + Pattern.matches("^It$", "It")); // true
        System.out.println("Result16 : " + Pattern.matches("^It$", "Is It from TechGuru")); // false

        // Capturing groups treat multiple characters as a single unit. The
        // backreferences \\1 and \\2 must repeat the text captured by groups
        // 1 and 2, as in "a3a3" or "ABB2B2AB"
        System.out.println("Result17 : " + Pattern.matches("(\\w\\d)\\1", "a3a3")); // true
        System.out.println("Result18 : " + Pattern.matches("(\\w\\d)\\1", "a2b2")); // false
        System.out.println("Result19 : " + Pattern.matches("(AB)(B\\d)\\2\\1", "ABB2B2AB")); // true
        System.out.println("Result20 : " + Pattern.matches("(AB)(B\\d)\\2\\1", "ABB2B3AB")); // false
    }
}

Output:

Result01 : false
Result02 : true
Result03 : true
Result04 : true
Result05 : true
Result06 : true
Result07 : false
Result08 : true
Result09 : true
Result10 : true
Result11 : false
Result12 : true
Result13 : false
Result14 : false
Result15 : true
Result16 : false
Result17 : true
Result18 : false
Result19 : true
Result20 : false

 

RegularExpTest05.java

import java.util.regex.Pattern;

public class RegularExpTest05 {

    public static void main(String[] args) {
        // Test 1
        boolean result1 = Pattern.matches("[0-9]{7}", "1234567"); // true
        System.out.println("Result1: " + result1);

        // Test 2
        boolean result2 = Pattern.matches("[0-9]{5,7}", "123456"); // true
        System.out.println("Result2: " + result2);

        // Test 3
        boolean result3 = Pattern.matches("[a-zA-Z0-9]{9}", "sys123tem"); // true
        System.out.println("Result3: " + result3);

        // Test 4
        boolean result4 = Pattern.matches("[a-zA-Z0-9]{9}", "system1234"); // false
        System.out.println("Result4: " + result4);

        // Test 5
        boolean result5 = Pattern.matches("[a-zA-Z0-9]{10}", "system2020"); // true
        System.out.println("Result5: " + result5);

        // Test 6
        boolean result6 = Pattern.matches("[a-zA-Z0-9]{10}", "system#786"); // false
        System.out.println("Result6: " + result6);

        // Pattern for 10-digit phone number starting with 7, 8 or 9
        String pattern = "[789]{1}\\d{9}";

        // Test 7
        String text = "9650384569";
        System.out.println("Result7: " + Pattern.matches(pattern, text)); // true

        // Test 8
        text = "3650384569";
        System.out.println("Result8: " + Pattern.matches(pattern, text)); // false

        // Test 9
        text = "7850354569";
        System.out.println("Result9: " + Pattern.matches(pattern, text)); // true
    }
}

Output:

Result1: true
Result2: true
Result3: true
Result4: false
Result5: true
Result6: false
Result7: true
Result8: false
Result9: true