Java Regular Expressions and the Pattern and Matcher Classes in Detail

周学义

2023-12-01

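I. Regular Expressions in Detail

A regular expression is a text pattern made up of ordinary characters (such as letters) and special characters (known as metacharacters). The pattern describes one or more strings to match when searching text, and so serves as a matching template. Regular expressions exist specifically for working with strings and can greatly simplify complex string operations.

1. Symbol definitions

(1) Basic syntax symbols

Symbol	Meaning	Example	Explanation	Matches
\	Escape character	\*	The literal character "*"	*
[ ]	List of accepted characters	[efgh]	Any single character among e, f, g, h	e, f, g, h
[^]	List of rejected characters	[^abc]	Any single character other than a, b, c, including digits and special symbols	m, q, 5, *
|	Matches the expression before or after the "|"	ab|cd	ab or cd	ab, cd
( )	Groups a subexpression	(abc)	Treats the string abc as one group	abc
-	Range hyphen (inside [ ])	[A-Z]	Any single uppercase letter	Uppercase letters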

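(2) Quantifiers

A quantifier specifies how many times the preceding element may repeat. The table below lists the common quantifiers, together with the two anchors ^ and $:

Symbol	Meaning	Example	Explanation
*	The preceding element repeats 0 or more times	(abc)*	A string made up only of any number of repetitions of abc, including the empty string
+	The preceding element repeats 1 or more times	m+(abc)*	A string that starts with at least one m, followed by any number of repetitions of abc
?	The preceding element repeats 0 or 1 time	m+abc?	A string that starts with at least one m, followed by ab or abc
{n}	The preceding element repeats exactly n times	[abcd]{3}	A string of length 3 made up of the letters a, b, c, d
{n,}	The preceding element repeats at least n times	[abcd]{3,}	A string of length 3 or more made up of the letters a, b, c, d
{n,m}	The preceding element repeats at least n and at most m times	[abcd]{3,5}	A string of length 3 to 5 made up of the letters a, b, c, d
^	Anchors the match at the start of the input	^[0-9]+[a-z]*	A string that starts with at least one digit, followed by any number of lowercase letters
$	Anchors the match at the end of the input	^[0-9]\-[a-z]+$	A string that starts with one digit, followed by a hyphen "-", and ends with at least one lowercase letter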

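(3) Predefined character classes

A predefined character class is a shorthand for a set of characters. It matches any single character that belongs to the set.

Some of the predefined character classes:

Symbol	Meaning	Example	Explanation
.	Matches any character except a line terminator such as \n	a..b	A string of length 4 that starts with a, ends with b, and has any 2 characters in between
\d	Matches a single digit, equivalent to [0-9]	\d{3}(\d)?	A string of 3 or 4 digits
\D	Matches a single non-digit character, equivalent to [^0-9]	\D(\d)*	A single non-digit character followed by any number of digits
\w	Matches a single word character (digit, letter, or underscore), equivalent to [a-zA-Z0-9_]	\d{3}\w{4}	A string of length 7 that starts with 3 digits, followed by 4 word characters
\W	Matches a single non-word character, equivalent to [^a-zA-Z0-9_]	\W+\d{2}	A string that starts with at least one non-word character and ends with 2 digits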
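
As a quick check of the table above, the following sketch runs a few of the example patterns through Pattern.matches. The class name CharClassDemo and the helper fullMatch are made up for illustration; the backslashes are doubled only because the patterns are written as Java string literals.

```java
import java.util.regex.Pattern;

class CharClassDemo {
    // Hypothetical helper: true only if the whole input matches the regex.
    static boolean fullMatch(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(fullMatch("a..b", "axyb"));             // true
        System.out.println(fullMatch("\\d{3}(\\d)?", "1234"));     // true
        System.out.println(fullMatch("\\w", "_"));                 // true: the underscore is a word character
        System.out.println(fullMatch("\\d{3}\\w{4}", "123ab_9"));  // true
        System.out.println(fullMatch(".", "\n"));                  // false: no DOTALL flag
    }
}
```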

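(4) Grouping constructs

Common grouping constructs:

Construct	Description
( )	Unnamed capturing group. Captures the matched substring. Capture number zero is the text matched by the entire pattern; the other captures are numbered automatically from 1 in the order of their opening parentheses. A non-capturing group is written (?:...).
(?<name>...)	Named capturing group. Captures the matched substring into a group that can be referenced by name as well as by number. In Java the name must begin with a letter and may contain only ASCII letters and digits. The single-quote form (?'name'...) comes from .NET and is not supported by java.util.regex.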
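
The sketch below shows a named capturing group in use. The date pattern, the class name NamedGroupDemo, and the method reorder are assumptions made for this illustration; only Pattern, Matcher, and group(String) come from java.util.regex (named groups require Java 7 or later).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NamedGroupDemo {
    // Hypothetical date pattern used only for illustration.
    private static final Pattern DATE =
            Pattern.compile("(?<year>\\d{4})-(?<month>\\d{2})-(?<day>\\d{2})");

    // Returns "day/month/year", or null if the input is not a yyyy-MM-dd string.
    static String reorder(String input) {
        Matcher m = DATE.matcher(input);
        if (!m.matches()) {
            return null;
        }
        // Named groups can be read by name; they are still numbered as well,
        // so group(1) is the same capture as group("year").
        return m.group("day") + "/" + m.group("month") + "/" + m.group(1);
    }

    public static void main(String[] args) {
        System.out.println(reorder("2023-12-01")); // 01/12/2023
    }
}
```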

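(5) Escaping characters

A problem arises when you want to find a metacharacter itself, such as . or *: you cannot write it directly, because it would be interpreted with its special meaning. In that case you use \ to cancel the special meaning, so you write \. and \*. To find \ itself, you likewise write \\.
For example, deerchao\.net matches deerchao.net, and C:\\Windows matches C:\Windows. Note that in a Java string literal each backslash must itself be doubled: in "https://github\\.com/[\\w\\-]", the "\\." matches ".".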
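
A minimal sketch of the two usual ways to match metacharacters literally in Java: escaping them by hand, or letting Pattern.quote do it. The class and method names are invented for the example.

```java
import java.util.regex.Pattern;

class EscapeDemo {
    static boolean matchesHost(String input) {
        // "\\." in Java source is the two-character regex \. which matches a literal dot.
        return Pattern.matches("deerchao\\.net", input);
    }

    static boolean matchesLiteral(String literal, String input) {
        // Pattern.quote turns the whole text into a literal pattern.
        return Pattern.matches(Pattern.quote(literal), input);
    }

    public static void main(String[] args) {
        System.out.println(matchesHost("deerchao.net"));       // true
        System.out.println(matchesHost("deerchaoXnet"));       // false: the dot is literal
        System.out.println(matchesLiteral("1+1=2", "1+1=2"));  // true
        // Four backslashes in source give the regex \\ which matches one backslash.
        System.out.println(Pattern.matches("C:\\\\Windows", "C:\\Windows")); // true
    }
}
```

Pattern.quote is usually the safer choice when the literal text comes from user input, since nothing in it can be misread as a metacharacter.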

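2. Examples of common regular expressions

Non-negative integer: "^\d+$"
Positive integer: "^[0-9]*[1-9][0-9]*$"
Non-positive integer: "^((-\d+)|(0+))$"
Integer: "^-?\d+$"
English letters only: "^[A-Za-z]+$"
English letters and digits: "^[A-Za-z0-9]+$"
Letters, digits, and underscores: "^\w+$"
E-mail address: "^[\w-]+(\.[\w-]+)*@[\w-]+(\.[\w-]+)+$"
URL: "^[a-zA-Z]+://(\w+(-\w+)*)(\.(\w+(-\w+)*))*(\?\S*)?$"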

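Common Java regular expressions (number and string handling)
Matching specific numbers
^[1-9]\d*$    // positive integers
^-[1-9]\d*$    // negative integers
^(0|-?[1-9]\d*)$    // integers
^([1-9]\d*|0)$    // non-negative integers (positive integers and 0)
^(-[1-9]\d*|0)$    // non-positive integers (negative integers and 0)
^([1-9]\d*\.\d*|0\.\d*[1-9]\d*)$    // positive floating-point numbers
^-([1-9]\d*\.\d*|0\.\d*[1-9]\d*)$    // negative floating-point numbers
^-?([1-9]\d*\.\d*|0\.\d*[1-9]\d*|0?\.0+|0)$    // floating-point numbers
^([1-9]\d*\.\d*|0\.\d*[1-9]\d*|0?\.0+|0)$    // non-negative floating-point numbers (positive floats and 0)
^(-([1-9]\d*\.\d*|0\.\d*[1-9]\d*)|0?\.0+|0)$    // non-positive floating-point numbers (negative floats and 0)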

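Matching specific strings
^[A-Za-z]+$    // a string made up of the 26 English letters
^[A-Z]+$    // a string made up of the 26 uppercase English letters
^[a-z]+$    // a string made up of the 26 lowercase English letters
^[A-Za-z0-9]+$    // a string made up of digits and the 26 English letters
^\w+$    // a string made up of digits, the 26 English letters, or underscores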

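Validate an e-mail address: "^\w+([-+.]\w+)*@\w+([-.]\w+)*\.\w+([-.]\w+)*$"
Validate an Internet URL: "^http://([\w-]+\.)+[\w-]+(/[\w./?%&=-]*)?$"
Validate a telephone number: "^(\(\d{3,4}\)|\d{3,4}-)?\d{7,8}$"
Validate an ID card number (15 or 18 digits): "^(\d{15}|\d{18})$"
Validate the 12 months of a year: "^(0?[1-9]|1[0-2])$"; valid forms are "01" to "09" and "1" to "12"
Validate the 31 days of a month: "^((0?[1-9])|((1|2)[0-9])|30|31)$"
Match a Chinese character: [\u4e00-\u9fa5]
Match a double-byte character (including Chinese characters): [^\x00-\xff]
Match a blank line: \n[\s| ]*\r
Match an HTML tag: /<(.*)>.*<\/\1>|<(.*) \/>/
Match leading and trailing whitespace: (^\s*)|(\s*$)
Match an e-mail address anywhere in the text: \w+([-+.]\w+)*@\w+([-.]\w+)*\.\w+([-.]\w+)*
Match a URL anywhere in the text: http://([\w-]+\.)+[\w-]+(/[\w ./?%&=-]*)?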

These common patterns can save a great deal of effort when searching, updating, and replacing text in large volumes of data.
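
To show how such patterns are typically put to work, here is a small sketch that wraps three of them in precompiled constants. The class CommonPatterns and its method names are hypothetical; the regular expressions themselves are taken from the lists above.

```java
import java.util.regex.Pattern;

class CommonPatterns {
    // Patterns from the lists above, written as Java string literals.
    static final Pattern INTEGER = Pattern.compile("^-?\\d+$");
    static final Pattern MONTH = Pattern.compile("^(0?[1-9]|1[0-2])$");
    static final Pattern CHINESE_CHAR = Pattern.compile("[\\u4e00-\\u9fa5]");

    static boolean isInteger(String s) {
        return INTEGER.matcher(s).matches();
    }

    static boolean isMonth(String s) {
        return MONTH.matcher(s).matches();
    }

    // Uses find() rather than matches(): one Chinese character anywhere is enough.
    static boolean containsChinese(String s) {
        return CHINESE_CHAR.matcher(s).find();
    }

    public static void main(String[] args) {
        System.out.println(isInteger("-42"));  // true
        System.out.println(isInteger("4.2"));  // false
        System.out.println(isMonth("09"));     // true
        System.out.println(isMonth("13"));     // false
        System.out.println(containsChinese("abc\u4e2d")); // true
    }
}
```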

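3. Validation patterns used with RegularExpressionValidator

RegularExpressionValidator is a validation control from ASP.NET Web Forms rather than part of Java. The patterns commonly used with it are listed below; they work equally well with java.util.regex:

Digits only: "^[0-9]*$"
Exactly n digits: "^\d{n}$"
At least n digits: "^\d{n,}$"
Between m and n digits: "^\d{m,n}$"
Zero, or a number that does not start with zero: "^(0|[1-9][0-9]*)$"
A positive real number with two decimal places: "^[0-9]+(\.[0-9]{2})?$"
A positive real number with 1 to 3 decimal places: "^[0-9]+(\.[0-9]{1,3})?$"
A non-zero positive integer: "^\+?[1-9][0-9]*$"
A non-zero negative integer: "^-[1-9][0-9]*$"
Exactly 3 characters: "^.{3}$"
A string made up of the 26 English letters: "^[A-Za-z]+$"
A string made up of the 26 uppercase English letters: "^[A-Z]+$"
A string made up of the 26 lowercase English letters: "^[a-z]+$"
A string made up of digits and the 26 English letters: "^[A-Za-z0-9]+$"
A string made up of digits, the 26 English letters, or underscores: "^\w+$"

Validate a user password: "^[a-zA-Z]\w{5,17}$"; a valid password starts with a letter, is 6 to 18 characters long, and contains only letters, digits, and underscores.
Chinese characters only: "^[\u4e00-\u9fa5]{0,}$"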

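4. Summary of basic regular expression syntax

1. Single characters: each matches one character
A: matches the letter A;
\\: matches the backslash character "\";
\t: matches the tab character;
\n: matches the newline character;

2. Character sets: match any single character from the set
[abc]: the letter a, b, or c;
[^abc]: any character other than a, b, and c;
[a-zA-Z]: any one letter;
[0-9]: any one digit;

3. Boundary matchers
^: the start of the input;
$: the end of the input;

4. Shorthand classes: each shorthand also stands for a single character
.: any one character;
\d: any one digit, equivalent to "[0-9]";
\D: any one non-digit, equivalent to "[^0-9]";
\w: any one letter, digit, or underscore, equivalent to "[a-zA-Z0-9_]";
\W: any one character that is not a letter, digit, or underscore, equivalent to "[^a-zA-Z0-9_]";
\s: any one whitespace character, such as a space, \n, or \t;
\S: any one non-whitespace character;

5. Quantities: everything above matches a single character; matching several requires a quantifier.
?: the preceding expression occurs 0 or 1 time;
*: the preceding expression occurs 0, 1, or more times;
+: the preceding expression occurs 1 or more times;
{n}: the preceding expression occurs exactly n times;
{n,}: the preceding expression occurs n or more times;
{n,m}: the preceding expression occurs n to m times.

6. Combining expressions: concatenation, alternation, and grouping
AB: expression A immediately followed by expression B;
A|B: either expression A or expression B;
(expression): combines several subexpressions into one unit that is treated as a group.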

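II. The Pattern Class in Detail

Pattern lives in the java.util.regex package and is the compiled representation of a regular expression. Instances of this class are immutable and are safe for use by multiple concurrent threads.

1. Obtaining a Pattern instance

The Pattern constructor is private, so a Pattern cannot be created with new.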

private Pattern(String p, int f) {
    pattern = p;
    flags = f;

    // to use UNICODE_CASE if UNICODE_CHARACTER_CLASS present
    if ((flags & UNICODE_CHARACTER_CLASS) != 0)
        flags |= UNICODE_CASE;

    // Reset group index count
    capturingGroupCount = 1;
    localCount = 0;

    if (pattern.length() > 0) {
        compile();
    } else {
        root = new Start(lastAccept);
        matchRoot = lastAccept;
    }
}

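To obtain a Pattern instance, use one of its static compile methods, which compile the given regular expression into a pattern, optionally with the given flags.
Parameters: regex - the expression to compile
flags - match flags, a bit mask that may include CASE_INSENSITIVE, MULTILINE, DOTALL, UNICODE_CASE, CANON_EQ, UNIX_LINES, LITERAL, UNICODE_CHARACTER_CLASS, and COMMENTS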


public static Pattern compile(String regex) {
    return new Pattern(regex, 0);
}
public static Pattern compile(String regex, int flags) {
    return new Pattern(regex, flags);
}
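
Because flags is a bit mask, several flags are combined with the | operator. The sketch below counts the lines of a text that begin with a given word, ignoring case; the class name FlagsDemo and the method countLinesStartingWith are assumptions made for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FlagsDemo {
    static int countLinesStartingWith(String word, String text) {
        // MULTILINE lets ^ match at the start of every line,
        // CASE_INSENSITIVE makes the comparison ignore ASCII case.
        Pattern p = Pattern.compile("^" + Pattern.quote(word),
                Pattern.CASE_INSENSITIVE | Pattern.MULTILINE);
        Matcher m = p.matcher(text);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String log = "Error: a\ninfo: b\nERROR: c";
        System.out.println(countLinesStartingWith("error", log)); // 2
    }
}
```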

(1) Example

Pattern p = Pattern.compile("\\d+");
Matcher m = p.matcher("22bb23");
m.matches(); // false: bb cannot be matched by \d+, so the whole-string match fails
Matcher m2 = p.matcher("2223");
m2.matches(); // true: \d+ matches the entire string

m = p.matcher("22bb23");
m.find(); // true
m2 = p.matcher("aa2223");
m2.find(); // true
Matcher m3 = p.matcher("aa2223bb");
m3.find(); // true
Matcher m4 = p.matcher("aabb");
m4.find(); // false

m = p.matcher("22bb23");
m.lookingAt(); // true: \d+ matches the leading 22
m2 = p.matcher("aa2223");
m2.lookingAt(); // false: \d+ cannot match the leading aa

m = p.matcher("aaa2223bb");
m.find(); // matches 2223
m.start(); // 3
m.end(); // 7, the index just past 2223
m.group(); // 2223

// A pattern with two capturing groups
Pattern p2 = Pattern.compile("([a-z]+)(\\d+)");
Matcher mg = p2.matcher("aaa2223bb");
mg.find();   // matches aaa2223
mg.groupCount();   // 2, because the pattern has 2 groups
mg.start(1);   // 0, the index at which the text captured by group 1 starts
mg.start(2);   // 3
mg.end(1);   // 3, the index just past the last character captured by group 1
mg.end(2);   // 7
mg.group(1);   // aaa, the text captured by group 1
mg.group(2);   // 2223, the text captured by group 2

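2. Groups and capturing

Capturing groups are numbered by counting their opening parentheses from left to right.

In the expression ((A)(B(C))), for example, there are four such groups:
1 ((A)(B(C)))
2 (A)
3 (B(C))
4 (C)

Group zero always stands for the entire expression.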
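
The numbering can be verified with a short sketch that prints every group of a successful match. GroupNumberingDemo and its groups helper are hypothetical names introduced here.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupNumberingDemo {
    // Returns group 0 through groupCount() for a full match, or an empty array.
    static String[] groups(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.matches()) {
            return new String[0];
        }
        String[] result = new String[m.groupCount() + 1];
        for (int i = 0; i <= m.groupCount(); i++) {
            result[i] = m.group(i);
        }
        return result;
    }

    public static void main(String[] args) {
        // Index 0 is the whole match; 1 to 4 follow the opening parentheses.
        System.out.println(Arrays.toString(groups("((A)(B(C)))", "ABC")));
        // [ABC, ABC, A, BC, C]
    }
}
```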

3. The int flags() method

Returns the match flags that were specified when this Pattern was compiled.

Pattern p = Pattern.compile("a+", Pattern.CASE_INSENSITIVE);
System.out.println(p.flags());// 2

4. The String pattern() method

Returns the regular expression from which this Pattern object was compiled.

Pattern p = Pattern.compile("\\d+");
System.out.println(p.toString()); // prints \d+
System.out.println(p.pattern()); // prints \d+

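5. The String[] split(CharSequence input) method

input: the character sequence to be split.
return: the array of strings computed by splitting the input around matches of this pattern.
This method splits the target string around matches of the regular expression held by the Pattern. It works as if by invoking the two-argument split method with the given input sequence and a limit argument of zero. Trailing empty strings are therefore not included in the resulting array.

Pattern p = Pattern.compile("\\d+");
String[] str = p.split("My QQ is:456456My phone is:0532214My email is:aaa@aaa.com");

Result:

str[0]="My QQ is:" str[1]="My phone is:" str[2]="My email is:aaa@aaa.com"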

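6. The String[] split(CharSequence input, int limit) method

input: the character sequence to be split.
limit: the result threshold, as described below.
return: the array of strings computed by splitting the input around matches of this pattern.

The limit parameter controls the number of times the pattern is applied and therefore affects the length of the resulting array.
1. If limit is greater than zero, the pattern is applied at most limit - 1 times, the array's length is no greater than limit, and the array's last entry contains all input beyond the last matched delimiter.
2. If limit is negative, the pattern is applied as many times as possible and the array can have any length; trailing empty strings are kept.
3. If limit is zero, the pattern is applied as many times as possible, the array can have any length, and trailing empty strings are discarded.

The array returned by this method contains each substring of the input sequence that is terminated by another subsequence matching this pattern or by the end of the input sequence. The substrings in the array are in the order in which they occur in the input. If this pattern does not match any subsequence of the input, the resulting array has just one element, namely the input sequence in string form.

Pattern p = Pattern.compile("[/]+");
String[] result = p.split("Kevin has seen《LEON》several times,because it is a good film./ 凯文已经看过《这个杀手不太冷》几次了，因为它是一部好电影。/名词:凯文。"
	, 2);

Result:

Kevin has seen《LEON》several times,because it is a good film.
 凯文已经看过《这个杀手不太冷》几次了，因为它是一部好电影。/名词:凯文。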

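(1) Example

public static void main(String[] args) {
    String[] arr = null;
    CharSequence input = "boo:and:foo";
    Pattern p = Pattern.compile("o");
    arr = p.split(input, -2);
    System.out.println(printArr(arr)); // {"b","",":and:f","",""}, 5 elements
    arr = p.split(input, 2);
    System.out.println(printArr(arr)); // {"b","o:and:foo"}, 2 elements
    arr = p.split(input, 7);
    System.out.println(printArr(arr)); // {"b","",":and:f","",""}, 5 elements
    arr = p.split(input, 0);
    System.out.println(printArr(arr)); // {"b","",":and:f"}, 3 elements
}

// Formats a String array for printing
public static String printArr(String[] arr) {
    int length = arr.length;
    StringBuilder sb = new StringBuilder();
    sb.append("{");
    for (int i = 0; i < length; i++) {
        sb.append("\"").append(arr[i]).append("\"");
        if (i != length - 1) sb.append(",");
    }
    sb.append("}").append(", " + length + " elements");
    return sb.toString();
}

1. With limit=-2, the pattern is applied as many times as possible and the array can have any length. The pattern is applied 4 times, so the array has length 5: {"b","",":and:f","",""}.
2. With limit=2, the pattern is applied at most once and the array's length is no greater than 2; the second element contains all input beyond the last matched delimiter. The pattern is applied once, so the array has length 2: {"b","o:and:foo"}.
3. With limit=7, the pattern is applied at most 6 times and the array's length is no greater than 7. The pattern is applied 4 times, so the array has length 5: {"b","",":and:f","",""}.
4. With limit=0, the pattern is applied as many times as possible, the array can have any length, and trailing empty strings are discarded. The pattern is applied 4 times and the two trailing empty strings are dropped, so the array has length 3: {"b","",":and:f"}.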

7. The Pattern.matches(String regex, CharSequence input) method

This static method is a convenience for quick matching. It suits cases where the pattern is used only once and must match the entire input.

Pattern.matches("\\d+","2223"); // true
Pattern.matches("\\d+","2223aa"); // false: the whole input must match, and aa does not
Pattern.matches("\\d+","22bb23"); // false: the whole input must match, and bb does not
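
Since Pattern.matches compiles the expression on every call, code that applies the same expression many times is usually better off compiling it once and reusing the Pattern, which is immutable and thread-safe. A minimal sketch, with ReuseDemo and countFullMatches as made-up names:

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

class ReuseDemo {
    // Compiled once; each call below only creates a lightweight Matcher.
    private static final Pattern DIGITS = Pattern.compile("\\d+");

    static long countFullMatches(List<String> inputs) {
        long count = 0;
        for (String s : inputs) {
            if (DIGITS.matcher(s).matches()) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        List<String> inputs = Arrays.asList("2223", "2223aa", "22bb23");
        System.out.println(countFullMatches(inputs)); // 1
    }
}
```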

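8. The Matcher matcher(CharSequence input) method

Returns a Matcher object. The Pattern class on its own supports only simple matching operations; for more powerful and convenient matching, Pattern and Matcher are used together. The Matcher class adds support for capturing groups and for matching a regular expression repeatedly against the same input.

The Matcher constructor is not public, so a Matcher cannot be created directly; an instance can only be obtained through Pattern.matcher(CharSequence input).

Pattern.matches(String regex, CharSequence input) is equivalent to Pattern.compile(regex).matcher(input).matches().

Pattern p = Pattern.compile("\\d+");
Matcher m = p.matcher("22bb23");
m.pattern(); // returns p, that is, the Pattern object that created this Matcher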

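III. The Matcher Class in Detail

A Matcher is a state machine that checks a string for matches against the pattern described by a Pattern object. Instances of this class are not safe for use by multiple concurrent threads.

A Matcher instance performs match operations on a target string based on an existing pattern (that is, the regular expression compiled into a given Pattern). All input to a Matcher is supplied through the CharSequence interface, so that matching can be performed on data from a wide variety of sources.

1. Obtaining a Matcher instance

The Matcher constructor is not public (it is package-private), so a Matcher cannot be created directly; an instance can only be obtained through Pattern.matcher(CharSequence input).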

Matcher(Pattern parent, CharSequence text) {
    this.parentPattern = parent;
    this.text = text;

    // Allocate state storage
    int parentGroupCount = Math.max(parent.capturingGroupCount, 10);
    groups = new int[parentGroupCount * 2];
    locals = new int[parent.localCount];

    // Put fields into initial states
    reset();
}

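2. The String toString() method

Returns a string representation of the matcher, including its pattern, its region, and the last match.

Pattern p = Pattern.compile("(\\w+)%(\\d+)");
Matcher m = p.matcher("ab%12-cd%34");
System.out.println(m.toString());

Output:

java.util.regex.Matcher[pattern=(\w+)%(\d+) region=0,11 lastmatch=]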

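3. The Matcher reset() method

reset() discards the matcher's state: it resets the internal fields first, last, oldLast, lastAppendPosition, from, and to, and reinitializes the groups and locals arrays. After a reset, the next find() starts again from the beginning of the input.

Pattern p = Pattern.compile("(\\w+)%(\\d+)");
Matcher m = p.matcher("ab%12-cd%34");
if (m.find()) {
    System.out.println("start index: " + m.start()); // start index: 0
    System.out.println("group():" + m.group()); // group():ab%12
}
if (m.find()) {
    System.out.println("start index: " + m.start()); // start index: 6
    System.out.println("group():" + m.group()); // group():cd%34
}
m.reset(); // back to the initial state
if (m.find()) {
    System.out.println("start index: " + m.start()); // start index: 0
    System.out.println("group():" + m.group()); // group():ab%12
}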

4. The Matcher reset(CharSequence input) method

Resets this matcher and gives it a new input sequence to match against.

Pattern p = Pattern.compile("(\\w+)%(\\d+)");
Matcher m = p.matcher("ab%12-cd%34");
m.reset("ef%56-gh%78");
while (m.find()) {
    System.out.println("group():" + m.group());
}

Output:

group():ef%56
group():gh%78

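5. The Pattern pattern() method

pattern() returns parentPattern, the pattern interpreted by this matcher, that is, the Pattern object passed to the constructor.

6. The int groupCount() method

Returns the number of capturing groups in this matcher's pattern. By convention, group zero denotes the entire pattern and is not included in this count.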

Pattern p = Pattern.compile("(\\w+)%(\\d+)");
Matcher m = p.matcher("ab%12-cd%34");
System.out.println(m.groupCount());// 2

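7. String group()

Returns the input subsequence matched by the previous match. group() simply calls group(int group) with the argument 0; group zero denotes the entire pattern.

8. String group(int group)

Returns the input subsequence captured by the given group during the previous match.

9. int start()

Returns the index in the target string of the first character of the current match. start() returns the matcher's internal state first.

10. int start(int group)

Returns the index in the target string of the first character of the subsequence captured by the given group in the current match.

11. int end()

Returns the index in the target string just past the last character of the current match. end() returns the matcher's internal state last.

12. int end(int group)

Returns the index in the target string just past the last character of the subsequence captured by the given group in the current match.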

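13. boolean find()

Looks for the next substring of the target string that matches the pattern. If the match succeeds, more information is available through the start, end, and group methods.

find() is a partial match: it searches from the current position for a matching substring and, once one is found, advances the position from which the next search begins.

find() starts at the beginning of the matcher's region or, if a previous invocation succeeded and the matcher has not been reset since, at the first character not matched by the previous match.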

(1) Example

// Find and replace; expression, text, and str stand for the caller's regex, input, and replacement
Pattern p = Pattern.compile(expression); // the regular expression
Matcher m = p.matcher(text); // the string to operate on
StringBuffer sb = new StringBuffer();
int i = 0;
while (m.find()) {
    m.appendReplacement(sb, str);
    i++; // number of occurrences
}
m.appendTail(sb); // append the text that follows the last match
String s = sb.toString();

// Find and read out the matches
Pattern p = Pattern.compile(expression); // the regular expression
Matcher m = p.matcher(text); // the string to operate on
while (m.find()) {
    m.start();
    m.end();
    m.group(1); // requires the expression to contain at least one capturing group
}

// Match strings
Pattern p = Pattern.compile("\\d+");
Matcher m = p.matcher("22bb23");
m.find(); // true
Matcher m2 = p.matcher("aa2223");
m2.find(); // true
Matcher m3 = p.matcher("aa2223bb");
m3.find(); // true
Matcher m4 = p.matcher("aabb");
m4.find(); // false

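14. boolean find(int start)

Resets this matcher and then looks for the next substring that matches the pattern, starting at the specified index. If the match succeeds, more information is available through the start, end, and group methods.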
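
A short sketch of how the start index changes the result. FindFromDemo and firstNumberFrom are names invented for the illustration; note that find(int) throws IndexOutOfBoundsException if the index is negative or greater than the input length.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FindFromDemo {
    // Returns the first run of digits found at or after index start, or null.
    static String firstNumberFrom(String input, int start) {
        Matcher m = Pattern.compile("\\d+").matcher(input);
        return m.find(start) ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(firstNumberFrom("22bb23", 0)); // 22
        System.out.println(firstNumberFrom("22bb23", 1)); // 2
        System.out.println(firstNumberFrom("22bb23", 2)); // 23
        System.out.println(firstNumberFrom("aabb", 0));   // null
    }
}
```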

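15. int regionStart()

Reports the start index of this matcher's region. regionStart() returns the matcher's internal state from.

16. int regionEnd()

Reports the end index (exclusive) of this matcher's region. regionEnd() returns the matcher's internal state to.

17. Matcher region(int start, int end)

Sets the limits of this matcher's region. The matcher is reset, and the region is then set to begin at the index given by start and to end at the index given by end (the character at index end is excluded).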
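
The three region methods can be seen together in the sketch below, which restricts the search to part of the input and lists what find() returns there. RegionDemo and describe are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegionDemo {
    // Lists the digit runs found between start (inclusive) and end (exclusive).
    static String describe(String input, int start, int end) {
        Matcher m = Pattern.compile("\\d+").matcher(input);
        m.region(start, end);
        StringBuilder sb = new StringBuilder();
        sb.append(m.regionStart()).append("..").append(m.regionEnd()).append(':');
        while (m.find()) {
            sb.append(' ').append(m.group());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String input = "123-34345-234-00";
        System.out.println(describe(input, 4, 13)); // 4..13: 34345 234
        System.out.println(describe(input, 0, 5));  // 0..5: 123 3
    }
}
```

In the second call the region ends in the middle of 34345, so only its first digit is visible to the matcher.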

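18. boolean lookingAt()

Attempts to match the target string against the pattern, starting at the beginning of the region. It returns true only if a match exists and that match begins at the first character of the target string.

lookingAt() is a partial match: it always starts at the first character and, unlike matches(), does not require the entire input to be matched. It makes a single attempt and does not search any further, whether that attempt succeeds or fails.

(1) Example

Pattern p = Pattern.compile("\\d+");
Matcher m = p.matcher("22bb23");
m.lookingAt(); // true: \d+ matches the leading 22
Matcher m2 = p.matcher("aa2223");
m2.lookingAt(); // false: \d+ cannot match the leading aa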

public static void main(String[] args) {
    String string = "123-34345-234-00";
    Pattern pattern = Pattern.compile("\\d{3,5}");
    Matcher matcher = pattern.matcher(string);

    boolean result1 = matcher.matches();
    System.out.println("result1=" + result1);

    // Reset the matcher.
    matcher.reset();

    // Try to find the next subsequence of the input that matches the pattern.
    boolean result2 = matcher.find();
    System.out.println("result2=" + result2);
    boolean result3 = matcher.find();
    System.out.println("result3=" + result3);
    boolean result4 = matcher.find();
    System.out.println("result4=" + result4);
    boolean result5 = matcher.find();
    System.out.println("result5=" + result5);

    // Try to match the input against the pattern, starting at the beginning of the region.
    boolean result6 = matcher.lookingAt();
    System.out.println("result6=" + result6);
    boolean result7 = matcher.lookingAt();
    System.out.println("result7=" + result7);
    boolean result8 = matcher.lookingAt();
    System.out.println("result8=" + result8);
}
/*
Output:

result1=false
result2=true
result3=true
result4=true
result5=false
result6=true
result7=true
result8=true
*/

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MatcherTest {
    public static void main(String[] args){
        Pattern pattern = Pattern.compile("\\d{3,5}");
        String charSequence = "123-34345-234-00";
        Matcher matcher = pattern.matcher(charSequence);

        // matches() fails, but the leading "123" in charSequence did match the pattern,
        // so the matcher's position has moved and the next find() locates its match at index 4
        print(matcher.matches());
        // Check where the next match is found
        matcher.find();
        print(matcher.start());

        // Use reset() to reset the match position
        matcher.reset();

        // First find(): the result, the matched text, and its start index
        print(matcher.find());
        print(matcher.group()+" - "+matcher.start());
        // Second find(): the result, the matched text, and its start index
        print(matcher.find());
        print(matcher.group()+" - "+matcher.start());

        // First lookingAt(): the result, the matched text, and its start index
        print(matcher.lookingAt());
        print(matcher.group()+" - "+matcher.start());

        // Second lookingAt(): the result, the matched text, and its start index
        print(matcher.lookingAt());
        print(matcher.group()+" - "+matcher.start());
    }
    public static void print(Object o){
        System.out.println(o);
    }
}

/*
Output:

false
4
true
123 - 0
true
34345 - 4
true
123 - 0
true
123 - 0
*/

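19. boolean matches()

Matches the entire string against the pattern; it returns true only if the whole string matches.

Pattern.matches(String regex, CharSequence input) is equivalent to Pattern.compile(regex).matcher(input).matches().

(1) Example

// expression and str stand for the caller's regex and input
Pattern p = Pattern.compile(expression); // the regular expression
Matcher m = p.matcher(str); // the string to operate on
boolean b = m.matches(); // whether the whole string matches
System.out.println(b);

Pattern p = Pattern.compile("\\d+");
Matcher m = p.matcher("22bb23");
m.matches(); // false: bb cannot be matched by \d+, so the whole-string match fails
Matcher m2 = p.matcher("2223");
m2.matches(); // true: \d+ matches the entire string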

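20. Matcher appendReplacement(StringBuffer sb, String replacement)

Appends to the StringBuffer the text between the end of the previous match and the start of the current match, followed by the replacement string in place of the current match. It returns the Matcher itself.

21. StringBuffer appendTail(StringBuffer sb)

Appends to the StringBuffer the text that remains after the last match.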
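
The two methods are normally used as a pair inside a find() loop, which allows the replacement to be computed from each match. The sketch below doubles every number in a string; AppendReplacementDemo and doubleNumbers are names made up for the example, and Matcher.quoteReplacement is used so that the replacement is always taken literally.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AppendReplacementDemo {
    static String doubleNumbers(String text) {
        Matcher m = Pattern.compile("\\d+").matcher(text);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            int doubled = Integer.parseInt(m.group()) * 2;
            // Copies the text before the match, then the computed replacement.
            m.appendReplacement(sb, Matcher.quoteReplacement(String.valueOf(doubled)));
        }
        // Copies whatever follows the last match.
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(doubleNumbers("a1b22c")); // a2b44c
    }
}
```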

22. String replaceAll(String replacement)

Replaces every substring that matches the pattern with the given replacement string.

(1) Example

// Replace text (all occurrences)
Pattern pattern = Pattern.compile("regex");
Matcher matcher = pattern.matcher("regex Hello World,regex Hello World");
// Replace everything that matches the pattern
System.out.println(matcher.replaceAll("Java")); // Java Hello World,Java Hello World

// Strip HTML tags
Pattern pattern = Pattern.compile("<.+?>", Pattern.DOTALL);
Matcher matcher = pattern.matcher("<a href=\"index.html\">Home</a>");
String string = matcher.replaceAll("");
System.out.println(string); // Home

// Replace the {n} placeholders in a string
String str = "Java's history so far runs from {0} to {1}";
String[][] object = {
    new String[] {
        "\\{0\\}",
        "1995"
    },
    new String[] {
        "\\{1\\}",
        "2007"
    }
};
System.out.println(replace(str, object)); // Java's history so far runs from 1995 to 2007

// Helper used above: applies each {regex, replacement} pair in turn
public static String replace(final String sourceString, Object[] object) {
    String temp = sourceString;
    for (int i = 0; i < object.length; i++) {
        String[] result = (String[]) object[i];
        Pattern pattern = Pattern.compile(result[0]);
        Matcher matcher = pattern.matcher(temp);
        temp = matcher.replaceAll(result[1]);
    }
    return temp;
}
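
One detail worth knowing about the replacement string: $n refers to capturing group n, and a literal $ or \ must be escaped, for instance with Matcher.quoteReplacement. A minimal sketch, in which the class ReplaceAllGroupsDemo and both method names are assumptions:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ReplaceAllGroupsDemo {
    // Rewrites yyyy-MM-dd dates as dd/MM/yyyy using group references.
    static String swapDate(String text) {
        return Pattern.compile("(\\d{4})-(\\d{2})-(\\d{2})")
                .matcher(text)
                .replaceAll("$3/$2/$1");
    }

    // Replaces every match with the given text taken literally.
    static String replaceWithLiteral(String text, String regex, String literal) {
        return Pattern.compile(regex)
                .matcher(text)
                .replaceAll(Matcher.quoteReplacement(literal));
    }

    public static void main(String[] args) {
        System.out.println(swapDate("due 2023-12-01"));                // due 01/12/2023
        System.out.println(replaceWithLiteral("cost: X", "X", "$5"));  // cost: $5
    }
}
```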

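23. String replaceFirst(String replacement)

Replaces the first substring that matches the pattern with the given replacement string.

(1) Example

// Replace text (first occurrence only)
Pattern pattern = Pattern.compile("regex");
Matcher matcher = pattern.matcher("regex Hello World,regex Hello World");
// Replace the first substring that matches the pattern
System.out.println(matcher.replaceFirst("Java")); // Java Hello World,regex Hello World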

24. Matcher usePattern(Pattern newPattern)

Changes the Pattern that this Matcher uses to find matches. The matcher's position in the input is kept, while the group information from the last match is lost.

(1) Example

public static void main(String[] args) {
    Pattern p = Pattern.compile("[a-z]+");
    Matcher m = p.matcher("111aaa222");
    System.out.println(piPei(m)); // (pattern [a-z]+): match:aaa;start:3;end:6;
    m.usePattern(Pattern.compile("\\d+"));
    System.out.println(piPei(m)); // (pattern \d+): match:222;start:6;end:9;
}

// Collects every remaining match of the matcher into a readable string
public static String piPei(Matcher m) {
    StringBuffer s = new StringBuffer();
    while (m.find()) {
        s.append("match:" + m.group() + ";");
        s.append("start:" + m.start() + ";");
        s.append("end:" + m.end() + ";");
    }
    if (s.length() == 0) {
        s.append("no match found!");
    }
    s.insert(0, "(pattern " + m.pattern().pattern() + "): ");
    return s.toString();
}

// The same calls with a reset() after usePattern(), so the new pattern is applied from the start
Pattern p = Pattern.compile("[a-z]+");
Matcher m = p.matcher("111aaa222");
System.out.println(piPei(m)); // (pattern [a-z]+): match:aaa;start:3;end:6;
m.usePattern(Pattern.compile("\\d+"));
m.reset();
System.out.println(piPei(m)); // (pattern \d+): match:111;start:0;end:3;match:222;start:6;end:9;

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MatcherTest {
    public static void main(String[] args) throws Exception {
        // Create a Pattern object by compiling the simple regular expression "Kelvin"
        Pattern p = Pattern.compile("Kelvin");
        // Use the Pattern's matcher() method to create a Matcher object
        Matcher m = p.matcher("Kelvin Li and Kelvin Chan are both working in Kelvin Chen's KelvinSoftShop company");
        StringBuffer sb = new StringBuffer();
        int i = 0;
        // Use find() to look for the first match
        boolean result = m.find();
        // Loop over every Kelvin in the sentence, replace it, and append the result to sb
        while (result) {
            i++;
            m.appendReplacement(sb, "Kevin");
            System.out.println("After match " + i + ", sb contains: " + sb);
            // Look for the next match
            result = m.find();
        }
        // Finally call appendTail() to append the text remaining after the last match
        m.appendTail(sb);
        System.out.println("After m.appendTail(sb), sb finally contains: " + sb.toString());
    }
}

/*
Output:

After match 1, sb contains: Kevin
After match 2, sb contains: Kevin Li and Kevin
After match 3, sb contains: Kevin Li and Kevin Chan are both working in Kevin
After match 4, sb contains: Kevin Li and Kevin Chan are both working in Kevin Chen's Kevin
After m.appendTail(sb), sb finally contains: Kevin Li and Kevin Chan are both working in Kevin Chen's KevinSoftShop company
*/

25. Examples

Pattern p = Pattern.compile("\\d+");
Matcher m = p.matcher("My QQ is:456456 My phone is:0532214 My email is:aaa123@aaa.com");
while (m.find()) {
     System.out.println(m.group());
}
/*
Output:
456456
0532214
123
*/

Pattern p = Pattern.compile("\\d+");
Matcher m = p.matcher("aaa2223bb");
m.find(); // matches 2223
m.start(); // 3
m.end(); // 7, the index just past 2223
m.group(); // 2223

Matcher m2 = p.matcher("2223bb");
m2.lookingAt();   // matches 2223
m2.start();   // 0: lookingAt() only matches at the beginning, so start() always returns 0 after it
m2.end();   // 4
m2.group();   // 2223

Matcher m3 = Pattern.compile("\\w+").matcher("2223bb");
m3.matches();   // matches the entire string
m3.start();   // 0
m3.end();   // 6, because matches() has to match the whole string
m3.group();   // 2223bb

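26. E-mail validation

The examples below use Matcher to check whether the characters in an e-mail address entered by the user are legal.

// Check whether the string is an e-mail address
String str = "ceponline@yahoo.com.cn";
Pattern pattern = Pattern.compile("[\\w\\.\\-]+@([\\w\\-]+\\.)+[\\w\\-]+", Pattern.CASE_INSENSITIVE);
Matcher matcher = pattern.matcher(str);
System.out.println(matcher.matches()); // true

import java.util.regex.*;

public class Email {
    public static void main(String[] args) throws Exception {
        String input = args[0];
        // Check whether the address starts with the illegal character "." or "@"
        Pattern p = Pattern.compile("^\\.|^@");
        Matcher m = p.matcher(input);
        if (m.find()) {
            System.out.println("An e-mail address must not start with '.' or '@'");
        }
        // Check whether the address starts with "www."
        p = Pattern.compile("^www\\.");
        m = p.matcher(input);
        if (m.find()) {
            System.out.println("An e-mail address must not start with 'www.'");
        }
        // Check whether the address contains illegal characters
        p = Pattern.compile("[^A-Za-z0-9.@_\\-~#]+");
        m = p.matcher(input);
        StringBuffer sb = new StringBuffer();
        boolean result = m.find();
        boolean deletedIllegalChars = false;
        while (result) {
            // Remember that an illegal character was found
            deletedIllegalChars = true;
            // Drop illegal characters such as colons and double quotes while copying into sb
            m.appendReplacement(sb, "");
            result = m.find();
        }
        m.appendTail(sb);
        input = sb.toString();
        if (deletedIllegalChars) {
            System.out.println("The e-mail address contains illegal characters such as colons or commas; please correct it");
            System.out.println("Your input was: " + args[0]);
            System.out.println("A corrected address would look like: " + input);
        }
    }
}
/*
Program output:

If we run: java Email www.kevin@163.net
the output is: An e-mail address must not start with 'www.'
If the address entered is @kevin@163.net
the output is: An e-mail address must not start with '.' or '@'
If the input is: cgjmail#$%@163.net
the output is:
The e-mail address contains illegal characters such as colons or commas; please correct it
Your input was: cgjmail#$%@163.net
A corrected address would look like: cgjmail#@163.net
*/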

References

http://www.51gjie.com/java/758.html
