4 Predefined Character Classes

小牛编辑

2023-12-01

The Pattern API contains a number of useful predefined character classes, which offer convenient shorthands for commonly used regular expressions.

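Predefined character classes
`.`	Any character (may or may not match line terminators)
`\d`	A digit: `[0-9]`
`\D`	A non-digit: `[^0-9]`
`\s`	A whitespace character: `[ \t\n\x0B\f\r]`
`\S`	A non-whitespace character: `[^\s]`
`\w`	A word character: `[a-zA-Z_0-9]`
`\W`	A non-word character: `[^\w]`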

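In the table above, each construct in the left-hand column is shorthand for the character class in the right-hand column. For example, \d means a range of digits (0-9), and \w means a word character (any lowercase letter, any uppercase letter, the underscore character, or any digit). Use the predefined classes whenever possible: they make your code easier to read and eliminate errors introduced by malformed character classes.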
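The shorthand relationship in the table can be checked directly. The following is a minimal sketch, not part of the original lesson: the class name `ShorthandEquivalence`, the helper `agree`, and the sample strings are all hypothetical, and only ASCII samples with the default matching flags are considered.

```java
import java.util.regex.Pattern;

final class ShorthandEquivalence {
    // Hypothetical helper: true when both patterns give the same
    // full-match result for every sample string.
    static boolean agree(String shorthand, String explicit, String[] samples) {
        Pattern a = Pattern.compile(shorthand);
        Pattern b = Pattern.compile(explicit);
        for (String s : samples) {
            if (a.matcher(s).matches() != b.matcher(s).matches()) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Illustrative single-character samples (an assumption, ASCII only).
        String[] samples = {"0", "7", "a", "Z", "_", " ", "\t", "!", "\n"};
        System.out.println(agree("\\d", "[0-9]", samples));              // true
        System.out.println(agree("\\D", "[^0-9]", samples));             // true
        System.out.println(agree("\\s", "[ \\t\\n\\x0B\\f\\r]", samples)); // true
        System.out.println(agree("\\w", "[a-zA-Z_0-9]", samples));       // true
    }
}
```

The check only covers the listed samples, so it illustrates the table rather than proving the equivalence for every possible character.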

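Constructs beginning with a backslash (\) are called escaped constructs. We previewed escaped constructs in the section on string literals, where we mentioned the use of the backslash and of \Q and \E for quotation. If you use an escaped construct within a string literal, you must precede the backslash with another backslash for the string to compile. For example: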

private final String REGEX = "\\d";        // a single digit

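In this example, \d is the regular expression; the extra backslash is required for the code to compile. The test harness, however, reads the expressions directly from the console, so the extra backslash is unnecessary there.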
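To make the double backslash concrete, here is a small sketch. The class name `EscapedConstructDemo` and the helper `firstDigitIndex` are hypothetical names chosen for illustration. It shows that the compiled string holds only two characters, a backslash and the letter d, which is exactly what the regex engine receives.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class EscapedConstructDemo {
    // In Java source the regex \d is written "\\d": the compiler
    // turns the escaped pair "\\" into a single backslash.
    static final String REGEX = "\\d";

    // Hypothetical helper: index of the first digit, or -1 if there is none.
    static int firstDigitIndex(String input) {
        Matcher m = Pattern.compile(REGEX).matcher(input);
        return m.find() ? m.start() : -1;
    }

    public static void main(String[] args) {
        System.out.println(REGEX.length());          // 2
        System.out.println(REGEX);                   // \d
        System.out.println(firstDigitIndex("ab7c")); // 2
        System.out.println(firstDigitIndex("abc"));  // -1
    }
}
```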

The following examples demonstrate the use of predefined character classes:

Enter your regex: .
Enter input string to search: @
I found the text "@" starting at index 0 and ending at index 1.

Enter your regex: .
Enter input string to search: 1
I found the text "1" starting at index 0 and ending at index 1.

Enter your regex: .
Enter input string to search: a
I found the text "a" starting at index 0 and ending at index 1.

Enter your regex: \d
Enter input string to search: 1
I found the text "1" starting at index 0 and ending at index 1.

Enter your regex: \d
Enter input string to search: a
No match found.

Enter your regex: \D
Enter input string to search: 1
No match found.

Enter your regex: \D
Enter input string to search: a
I found the text "a" starting at index 0 and ending at index 1.

Enter your regex: \s
Enter input string to search:  
I found the text " " starting at index 0 and ending at index 1.

Enter your regex: \s
Enter input string to search: a
No match found.

Enter your regex: \S
Enter input string to search:  
No match found.

Enter your regex: \S
Enter input string to search: a
I found the text "a" starting at index 0 and ending at index 1.

Enter your regex: \w
Enter input string to search: a
I found the text "a" starting at index 0 and ending at index 1.

Enter your regex: \w
Enter input string to search: !
No match found.

Enter your regex: \W
Enter input string to search: a
No match found.

Enter your regex: \W
Enter input string to search: !
I found the text "!" starting at index 0 and ending at index 1.

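In the first three examples, the regular expression is simply . (the "dot" metacharacter), which means "any character". The match therefore succeeds in all three cases (an arbitrarily chosen @ character, a digit, and a letter). The remaining examples each use a single regular expression construct from the predefined character classes table. You should be able to use that table to work out the logic behind each match: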

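\d matches a digit

\s matches a whitespace character

\w matches a word character

Alternatively, a capital letter means the opposite:

\D matches a non-digit

\S matches a non-whitespace character

\W matches a non-word character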
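The session above can also be reproduced in code rather than at the console. The sketch below is an assumption-laden stand-in for the test harness, not the harness itself: the class `PredefinedClassSearch` and the helper `search` are hypothetical, and the message wording is copied from the session output. Because the patterns are written as Java string literals here, each backslash is doubled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class PredefinedClassSearch {
    // Hypothetical helper: reports every match of regex in input,
    // using the same wording as the console session.
    static List<String> search(String regex, String input) {
        List<String> lines = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            lines.add(String.format(
                "I found the text \"%s\" starting at index %d and ending at index %d.",
                m.group(), m.start(), m.end()));
        }
        if (lines.isEmpty()) {
            lines.add("No match found.");
        }
        return lines;
    }

    public static void main(String[] args) {
        // A subset of the regex/input pairs from the session.
        String[][] cases = {
            {".", "@"}, {"\\d", "1"}, {"\\d", "a"}, {"\\D", "a"},
            {"\\s", " "}, {"\\S", " "}, {"\\w", "a"}, {"\\W", "!"}
        };
        for (String[] c : cases) {
            for (String line : search(c[0], c[1])) {
                System.out.println(c[0] + " on \"" + c[1] + "\": " + line);
            }
        }
    }
}
```

Each lowercase class and its uppercase counterpart give opposite results on the same single-character input, which mirrors the pairs listed above.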