
Regular Expression in Java


Java provides the java.util.regex package for pattern matching with regular expressions. A regular expression is a sequence of characters, written in a specialized syntax, that describes a pattern used to match or find other strings or sets of strings. Regular expressions can be used to search, edit, or manipulate text.
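For quick one-off tasks, java.lang.String also has regex-based methods (matches, replaceAll, split). These compile a Pattern on every call. The sketch below shows one search, one edit and one manipulation. The class name StringRegexDemo, the method names and the sample inputs are illustrative only.

```java
import java.util.Arrays;

public class StringRegexDemo {

	// Search: true only if the WHOLE string looks like yyyy-mm-dd
	public static boolean looksLikeDate(String s) {
		return s.matches("\\d{4}-\\d{2}-\\d{2}");
	}

	// Edit: replace every digit with '#'
	public static String maskDigits(String s) {
		return s.replaceAll("\\d", "#");
	}

	// Manipulate: split on commas, discarding whitespace around them
	public static String[] splitCsv(String s) {
		return s.split("\\s*,\\s*");
	}

	public static void main(String[] args) {
		System.out.println(looksLikeDate("2024-01-15"));          // true
		System.out.println(maskDigits("card 1234-5678"));          // card ####-####
		System.out.println(Arrays.toString(splitCsv("a, b ,c")));  // [a, b, c]
	}
}
```

Because String.matches recompiles its pattern on each call, code that applies the same regex repeatedly usually compiles it once with Pattern.compile.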

The java.util.regex package primarily consists of the following three classes:

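  • Pattern: a compiled representation of a regular expression
  • Matcher: the engine that performs match operations on a character sequence using a Pattern
  • PatternSyntaxException: an unchecked exception that signals a syntax error in a regular-expression pattern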
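The three classes usually appear together. You compile a Pattern once, create a Matcher for each input, and catch PatternSyntaxException when the regex comes from an untrusted source such as user input. The following is a minimal sketch of that flow. RegexClassesDemo, firstMatch and isValidRegex are hypothetical names chosen for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class RegexClassesDemo {

	// Pattern: compile the regex; Matcher: apply it to one input.
	// Returns the first matching substring, or null if there is none.
	public static String firstMatch(String regex, String input) {
		Pattern pattern = Pattern.compile(regex);
		Matcher matcher = pattern.matcher(input);
		return matcher.find() ? matcher.group() : null;
	}

	// Pattern.compile throws the unchecked PatternSyntaxException for malformed syntax
	public static boolean isValidRegex(String regex) {
		try {
			Pattern.compile(regex);
			return true;
		} catch (PatternSyntaxException e) {
			System.out.println("Invalid regex \"" + e.getPattern() + "\": "
					+ e.getDescription() + " near index " + e.getIndex());
			return false;
		}
	}

	public static void main(String[] args) {
		System.out.println(firstMatch("[0-9]+", "order 42 shipped")); // 42
		System.out.println(isValidRegex("[0-9+"));                     // prints the error, then false
	}
}
```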

Regular Expression Syntax:

^		Matches beginning of input; with the MULTILINE flag (?m), beginning of each line.
$		Matches end of input (or just before a final line terminator); with MULTILINE, end of each line.
.		Matches any single character except a line terminator. The DOTALL flag (?s) allows it to match line terminators as well.
[...]		Matches any single character in brackets.
[^...]		Matches any single character not in brackets.
\A		Matches beginning of input.
\z		Matches end of input.
\Z		Matches end of input, or just before a final line terminator.
re*		Matches 0 or more occurrences of preceding expression.
re+		Matches 1 or more occurrences of preceding expression.
re?		Matches 0 or 1 occurrence of preceding expression.
re{n}		Matches exactly n occurrences of preceding expression.
re{n,}		Matches n or more occurrences of preceding expression.
re{n,m}		Matches at least n and at most m occurrences of preceding expression.
a|b		Matches either a or b.
(re)		Groups regular expressions and remembers (captures) matched text.
(?:re)		Groups regular expressions without remembering matched text.
(?>re)		Atomic group: matches re independently; once it has matched, the engine does not backtrack into it.
\w		Matches word characters. Equivalent to [a-zA-Z_0-9].
\W		Matches nonword characters.
\s		Matches whitespace. Equivalent to [ \t\n\x0B\f\r].
\S		Matches nonwhitespace.
\d		Matches digits. Equivalent to [0-9].
\D		Matches nondigits.
\G		Matches point where last match finished.
\1, \2, ...	Back-reference to the text matched by capture group 1, 2, ...
\b		Matches a word boundary.
\B		Matches nonword boundaries.
\n, \r, \t, etc.	Matches newline, carriage return, tab, etc.
\Q		Escape (quote) all characters up to \E
\E		Ends quoting begun with \Q
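The sketch below checks several entries from the table against small inputs. It assumes a hypothetical helper, count, that simply counts find() hits. Note that every backslash in the table must be doubled inside a Java string literal, so the regex \d is written "\\d".

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SyntaxDemo {

	// Counts how many times regex matches in input, using the given flags (0 = none)
	public static int count(String regex, int flags, String input) {
		Matcher m = Pattern.compile(regex, flags).matcher(input);
		int n = 0;
		while (m.find()) {
			n++;
		}
		return n;
	}

	public static void main(String[] args) {
		String text = "first line\nsecond line";

		// "\\d" in Java source is the regex \d
		System.out.println(count("\\d", 0, "a1b22"));                        // 3

		// ^ is beginning of input by default; MULTILINE makes it beginning of each line
		System.out.println(count("^\\w+", 0, text));                          // 1
		System.out.println(count("^\\w+", Pattern.MULTILINE, text));          // 2

		// . does not match a line terminator unless DOTALL is set
		System.out.println(count("line.second", 0, text));                    // 0
		System.out.println(count("line.second", Pattern.DOTALL, text));       // 1

		// re{n,m} is greedy: "22", "333", and "444" from "4444"
		System.out.println(count("\\d{2,3}", 0, "1 22 333 4444"));            // 3

		// (re) plus back-reference \1 finds a doubled word ("the the")
		System.out.println(count("\\b(\\w+) \\1\\b", 0, "it is the the end")); // 1

		// \Q...\E quotes metacharacters, so + is literal here
		System.out.println(count("\\Q1+1\\E", 0, "1+1=2, 11=11"));            // 1
	}
}
```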

Matcher Class:
Here is a list of useful Matcher methods (all are instance methods except quoteReplacement, which is static):

public int start()
Returns the start index of the previous match.

public int start(int group)
Returns the start index of the subsequence captured by the given group during the previous match operation.

public int end()
Returns the offset after the last character matched.

public int end(int group)
Returns the offset after the last character of the subsequence captured by the given group during the previous match operation.

public boolean lookingAt() 
Attempts to match the input sequence, starting at the beginning of the region, against the pattern.

public boolean find() 
Attempts to find the next subsequence of the input sequence that matches the pattern.

public boolean find(int start)
Resets this matcher and then attempts to find the next subsequence of the input sequence that matches the pattern, starting at the specified index.

public boolean matches() 
Attempts to match the entire region against the pattern.

public Matcher appendReplacement(StringBuffer sb, String replacement)
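Appends to sb the input text between the previous append position and the start of the current match, followed by the replacement string (a non-terminal append-and-replace step).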

public StringBuffer appendTail(StringBuffer sb)
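Appends the rest of the input, from the current append position to the end, to sb (a terminal append-and-replace step). It is typically called once, after the last appendReplacement call.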

public String replaceAll(String replacement) 
Replaces every subsequence of the input sequence that matches the pattern with the given replacement string.

public String replaceFirst(String replacement)
Replaces the first subsequence of the input sequence that matches the pattern with the given replacement string.

public static String quoteReplacement(String s)
Returns a literal replacement String for the specified String. Backslashes and dollar signs in s are escaped. As a result, when the returned string is passed to appendReplacement, replaceAll, or replaceFirst, it is inserted literally rather than interpreted as group references.
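The three matching methods differ only in how much of the input must match. matches() needs the entire region, lookingAt() needs a match at the beginning, and find() accepts a match anywhere. The minimal sketch below contrasts them and reads group positions with start(int) and end(int). The class MatcherMethodsDemo, its KEY_VALUE pattern, and the helper valueSpan are illustrative assumptions.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MatcherMethodsDemo {

	// Group 1 = key (word characters), group 2 = value (digits)
	public static final Pattern KEY_VALUE = Pattern.compile("(\\w+)=(\\d+)");

	// Returns {start, end} of the value group if the input BEGINS with key=value, else null
	public static int[] valueSpan(String input) {
		Matcher m = KEY_VALUE.matcher(input);
		return m.lookingAt() ? new int[] {m.start(2), m.end(2)} : null;
	}

	public static void main(String[] args) {
		Matcher m = KEY_VALUE.matcher("size=10 rest");
		System.out.println(m.matches());   // false: the entire input is not key=value
		System.out.println(m.lookingAt()); // true: a match starts at the beginning
		System.out.println(m.group(1) + " " + m.start(1) + "-" + m.end(1)); // size 0-4

		System.out.println(Arrays.toString(valueSpan("size=10 rest"))); // [5, 7]
		System.out.println(Arrays.toString(valueSpan("x size=10")));    // null

		// find(int) resets the matcher, then searches from index 3 onward
		Matcher m2 = KEY_VALUE.matcher("a=1 b=2");
		if (m2.find(3)) {
			System.out.println(m2.group() + " at " + m2.start()); // b=2 at 4
		}
	}
}
```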

Extract numbers from a String

package com.jkoder;

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexExtract {
	public static void main(String[] args) {
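		// \d+ matches one or more consecutive digits
		Pattern p = Pattern.compile("\\d+");
		Matcher m = p.matcher("hello world2 345 is again908908");
		// each call to find() advances to the next match
		while (m.find()) {
			// group() returns the text of the current match
			System.out.println(Integer.parseInt(m.group()));
		}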
	}
}

Output

2
345
908908
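On Java 9 or later, Matcher.results() returns the matches as a Stream<MatchResult>, which can replace the explicit while loop. The sketch below makes two deliberate variations on the example above. It uses "-?\\d+" so that a leading minus sign is kept, and it parses into long so that values beyond the int range (up to Long.MAX_VALUE) do not throw. The class and method names are hypothetical.

```java
import java.util.List;
import java.util.regex.MatchResult;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class RegexExtractAll {

	// Optional leading minus sign, then one or more digits
	private static final Pattern NUMBER = Pattern.compile("-?\\d+");

	// Requires Java 9+ for Matcher.results()
	public static List<Long> extractNumbers(String input) {
		return NUMBER.matcher(input)
				.results()                // one MatchResult per successful find()
				.map(MatchResult::group)  // matched text
				.map(Long::parseLong)     // long accepts values beyond the int range
				.collect(Collectors.toList());
	}

	public static void main(String[] args) {
		System.out.println(extractNumbers("hello world2 345 is again908908")); // [2, 345, 908908]
		System.out.println(extractNumbers("temp -5 to 12"));                  // [-5, 12]
	}
}
```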

Count the number of times a given word appears in the input String

package com.jkoder;

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexCount {
	public static void main(String[] args) {
		// \b marks a word boundary, so "world" inside "worldwide" would not be counted
		String REGEX = "\\bworld\\b";
		String INPUT = "hello world hello world, my world";
		Pattern p = Pattern.compile(REGEX);
		Matcher m = p.matcher(INPUT); // get a matcher object
		int count = 0;

		while (m.find()) {
			count++;
			// start() is the index of the first matched character; end() is one past the last
			System.out.println("Match " + count + " starts at index " + m.start() + " and ends before index " + m.end());
		}
		System.out.println("Total matches: " + count);
	}
}

Output

Match 1 starts at index 6 and ends before index 11
Match 2 starts at index 18 and ends before index 23
Match 3 starts at index 28 and ends before index 33
Total matches: 3
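The loop above counts a single fixed, case-sensitive word. Here is a minimal sketch of a reusable variant, assuming you want case-insensitive counting of an arbitrary word. Pattern.quote keeps characters such as '.' or '+' in the word literal, and Pattern.CASE_INSENSITIVE ignores (ASCII) case. The sketch assumes the word begins and ends with word characters, because \b behaves differently next to punctuation. WordCounter and countWord are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WordCounter {

	// Counts whole-word, case-insensitive occurrences of word in input
	public static int countWord(String input, String word) {
		Pattern p = Pattern.compile("\\b" + Pattern.quote(word) + "\\b", Pattern.CASE_INSENSITIVE);
		Matcher m = p.matcher(input);
		int count = 0;
		while (m.find()) {
			count++;
		}
		return count;
	}

	public static void main(String[] args) {
		System.out.println(countWord("hello world hello World, my WORLD", "world")); // 3
		System.out.println(countWord("worldwide world", "world"));                   // 1
	}
}
```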

Replace all matching words in the input String

package com.jkoder;

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexReplace {
	private static final String REGEX = "world";
	private static final String INPUT = "hello world, hello world, hello world.";
	private static final String REPLACE = "universe";

	public static void main(String[] args) {
		Pattern p = Pattern.compile(REGEX);
		Matcher m = p.matcher(INPUT);
		// replaceAll returns a new String; INPUT itself is unchanged
		String result = m.replaceAll(REPLACE);
		System.out.println(result);
	}
}

Output

hello universe, hello universe, hello universe.
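replaceAll substitutes the same text for every match. When each replacement must be computed, the appendReplacement/appendTail pair from the method list can build the result step by step. Replacement strings also treat $ and \ specially: "$1" refers to capture group 1, so literal dollar signs need Matcher.quoteReplacement. The sketch below illustrates all three cases. The class name, helper names, and the "#n" numbering scheme are assumptions made for the example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexReplaceDemo {

	private static final Pattern WORD = Pattern.compile("world");

	// Replaces each match with a computed value: universe#1, universe#2, ...
	public static String numberEachMatch(String input) {
		Matcher m = WORD.matcher(input);
		StringBuffer sb = new StringBuffer();
		int n = 0;
		while (m.find()) {
			n++;
			// copies the text since the previous match, then the replacement
			m.appendReplacement(sb, "universe#" + n);
		}
		m.appendTail(sb); // copies the text after the last match
		return sb.toString();
	}

	// Unquoted, "$5" would be read as a reference to group 5 and fail at runtime
	public static String replaceWithPrice(String input) {
		return WORD.matcher(input).replaceAll(Matcher.quoteReplacement("$5"));
	}

	// $1 and $2 in the replacement refer to capture groups 1 and 2
	public static String swapPairs(String input) {
		return Pattern.compile("(\\w+)=(\\w+)").matcher(input).replaceAll("$2=$1");
	}

	public static void main(String[] args) {
		// hello universe#1, hello universe#2, hello universe#3.
		System.out.println(numberEachMatch("hello world, hello world, hello world."));
		System.out.println(replaceWithPrice("world tour")); // $5 tour
		System.out.println(swapPairs("a=b c=d"));           // b=a d=c
	}
}
```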