3 字符类
如果你曾看过 Pattern 类的说明,会看到一些构建正则表达式的概述。在这一节中你会发现下面的一些表达式:
[abc] | a, b 或 c(简单类) |
[^abc] | 除 a, b 或 c 之外的任意字符(取反) |
[a-zA-Z] | a 到 z,或 A 到 Z,包括(范围) |
[a-d[m-p]] | a 到 d,或 m 到 p:[a-dm-p] (并集) |
[a-z&&[def]] | d,e 或 f(交集) |
[a-z&&[^bc]] | 除 b 和 c 之外的 a 到 z 字符:[ad-z] (差集) |
[a-z&&[^m-p]] | a 到 z,并且不包括 m 到 p:[a-lq-z] (差集) |
左边列指定正则表达式构造,右边列描述每个构造的匹配的条件。
注意:“字符类(character class)”这个词中的“类(class)”指的并不是一个 .class 文件。在正则表达式的语义中,字符类是放在方括号里的字符集,指定了一些字符中的一个能被给定的字符串所匹配。
3.1 简单类(Simple Classes)
字符类最基本的格式是把一些字符放在一对方括号内。例如:正则表达式[bcr]at
会匹配“bat”、“cat”或者“rat”,这是由于其定义了一个字符类(接受“b”、“c”或“r”中的一个字符)作为它的首字符。
Enter your regex: [bcr]at Enter input string to search: bat I found the text "bat" starting at index 0 and ending at index 3. Enter your regex: [bcr]at Enter input string to search: cat I found the text "cat" starting at index 0 and ending at index 3. Enter your regex: [bcr]at Enter input string to search: rat I found the text "rat" starting at index 0 and ending at index 3. Enter your regex: [bcr]at Enter input string to search: hat No match found.
在上面的例子中,在第一个字符匹配字符类中所定义字符中的一个时,整个匹配就是成功的。
3.1.1 否定
要匹配除那些列表之外所有的字符时,可以在字符类的开始处加上^
元字符,这种就被称为否定(negation)。
Enter your regex: [^bcr]at Enter input string to search: bat No match found. Enter your regex: [^bcr]at Enter input string to search: cat No match found. Enter your regex: [^bcr]at Enter input string to search: rat No match found. Enter your regex: [^bcr]at Enter input string to search: hat I found the text "hat" starting at index 0 and ending at index 3.
在输入的字符串中的第一个字符不包含在字符类中所定义字符中的一个时,匹配是成功的。
3.1.2 范围
有时会想要定义一个包含值范围的字符类,诸如,“a 到 h”的字母或者是“1 到 5”的数字。指定一个范围,只要在被匹配的首字符和末字符间插入-
元字符,比如:[1-5]
或者是[a-h]
。也可以在类里每个的边上放置不同的范围来提高匹配的可能性,例如:[a-zA-Z]
将会匹配 a 到 z(小写字母)或者 A 到 Z(大写字母)中的任何一个字符。
下面是一些范围和否定的例子:
Enter your regex: [a-c] Enter input string to search: a I found the text "a" starting at index 0 and ending at index 1. Enter your regex: [a-c] Enter input string to search: b I found the text "b" starting at index 0 and ending at index 1. Enter your regex: [a-c] Enter input string to search: c I found the text "c" starting at index 0 and ending at index 1. Enter your regex: [a-c] Enter input string to search: d No match found. Enter your regex: foo[1-5] Enter input string to search: foo1 I found the text "foo1" starting at index 0 and ending at index 4. Enter your regex: foo[1-5] Enter input string to search: foo5 I found the text "foo5" starting at index 0 and ending at index 4. Enter your regex: foo[1-5] Enter input string to search: foo6 No match found. Enter your regex: foo[^1-5] Enter input string to search: foo1 No match found. Enter your regex: foo[^1-5] Enter input string to search: foo6 I found the text "foo6" starting at index 0 and ending at index 4.
3.1.3 并集
可以使用并集(union)来建一个由两个或两个以上字符类所组成的单字符类。构建一个并集,只要在一个字符类的边上嵌套另外一个,比如:[0-4[6-8]]
,这种奇特方式构建的并集字符类,可以匹配 0,1,2,3,4,6,7,8 这几个数字。
Enter your regex: [0-4[6-8]] Enter input string to search: 0 I found the text "0" starting at index 0 and ending at index 1. Enter your regex: [0-4[6-8]] Enter input string to search: 5 No match found. Enter your regex: [0-4[6-8]] Enter input string to search: 6 I found the text "6" starting at index 0 and ending at index 1. Enter your regex: [0-4[6-8]] Enter input string to search: 8 I found the text "8" starting at index 0 and ending at index 1. Enter your regex: [0-4[6-8]] Enter input string to search: 9 No match found.
3.1.4 交集
建一个仅仅匹配自身嵌套类中公共部分字符的字符类时,可以像[0-9&&[345]]
中那样使用&&
。这种方式构建出来的交集(intersection)简单字符类,仅仅以匹配两个字符类中的 3,4,5 共有部分。
Enter your regex: [0-9&&[345]] Enter input string to search: 3 I found the text "3" starting at index 0 and ending at index 1. Enter your regex: [0-9&&[345]] Enter input string to search: 4 I found the text "4" starting at index 0 and ending at index 1. Enter your regex: [0-9&&[345]] Enter input string to search: 5 I found the text "5" starting at index 0 and ending at index 1. Enter your regex: [0-9&&[345]] Enter input string to search: 2 No match found. Enter your regex: [0-9&&[345]] Enter input string to search: 6 No match found.
下面演示两个范围交集的例子:
Enter your regex: [2-8&&[4-6]] Enter input string to search: 3 No match found. Enter your regex: [2-8&&[4-6]] Enter input string to search: 4 I found the text "4" starting at index 0 and ending at index 1. Enter your regex: [2-8&&[4-6]] Enter input string to search: 5 I found the text "5" starting at index 0 and ending at index 1. Enter your regex: [2-8&&[4-6]] Enter input string to search: 6 I found the text "6" starting at index 0 and ending at index 1. Enter your regex: [2-8&&[4-6]] Enter input string to search: 7 No match found.
3.1.5 差集
最后,可以使用差集(subtraction)来否定一个或多个嵌套的字符类,比如:[0-9&&[^345]]
,这个是构建一个匹配除 3,4,5 之外所有 0 到 9 间数字的简单字符类。
Enter your regex: [0-9&&[^345]] Enter input string to search: 2 I found the text "2" starting at index 0 and ending at index 1. Enter your regex: [0-9&&[^345]] Enter input string to search: 3 No match found. Enter your regex: [0-9&&[^345]] Enter input string to search: 4 No match found. Enter your regex: [0-9&&[^345]] Enter input string to search: 5 No match found. Enter your regex: [0-9&&[^345]] Enter input string to search: 6 I found the text "6" starting at index 0 and ending at index 1. Enter your regex: [0-9&&[^345]] Enter input string to search: 9 I found the text "9" starting at index 0 and ending at index 1.
到此为止,已经涵盖了如何建立字符类的部分。在继续下一节之前,可以试着回想一下那张字符类表。