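Regular Expression Basics
A regular expression is made up of ordinary characters and metacharacters. Ordinary characters include uppercase and lowercase letters and digits, while metacharacters have special meanings (see the help documentation for details).
A regular expression is a formula that uses a pattern to match a class of strings.
Many people avoid regular expressions because they look odd and complicated, but they are actually fairly simple to write. Once you understand them, you can compress hours of tedious, error-prone text processing into a few minutes (or even seconds).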
1. PRXMATCH(regular-expression-id | perl-regular-expression, source)
data _null_;
run;
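As a minimal sketch of how PRXMATCH can be called, the step below passes the pattern both as a literal Perl regular expression and as an id returned by PRXPARSE. The sample text and the variable names (`text`, `re`, `pos_literal`, `pos_id`, `pos_none`) are hypothetical and chosen only for illustration.

```sas
data _null_;
    /* Hypothetical sample text, not taken from the original post */
    text = 'Order 12345 shipped';

    /* Form 1: pass the Perl regular expression directly */
    pos_literal = prxmatch('/\d+/', text);

    /* Form 2: compile once with PRXPARSE, then pass the returned id */
    re = prxparse('/\d+/');
    pos_id = prxmatch(re, text);

    /* PRXMATCH returns 0 when the pattern is not found */
    pos_none = prxmatch('/xyz/', text);

    /* Expected in the log: pos_literal=7 pos_id=7 pos_none=0 */
    put pos_literal= pos_id= pos_none=;
run;
```

The return value is the position where the match starts, so it can be used directly in an `if` condition: any nonzero value means the pattern was found.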
2. PRXCHANGE(perl-regular-expression | regular-expression-id, times, source)
data _NULL_;
run;
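The role of the `times` argument is easiest to see with a small example. The sketch below assumes a hypothetical input string and compares replacing only the first match (`times = 1`) with replacing every match (`times = -1`).

```sas
data _null_;
    /* Hypothetical sample text, not taken from the original post */
    text = 'cat hat cat';

    /* times = 1: replace only the first match */
    first_only = prxchange('s/cat/dog/', 1, text);

    /* times = -1: keep replacing until the end of the source is reached */
    all_matches = prxchange('s/cat/dog/', -1, text);

    /* Expected in the log: first_only=dog hat cat all_matches=dog hat dog */
    put first_only= all_matches=;
run;
```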
3.
data _null_;
run;
Results:
4.
Remove the space in the add field that separates a single alphabetic character from a following string of one or more digits:
add=prxchange("s/(\b[A-Za-z])\s(\d+\b)/$1$2/",-1,add);
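A minimal sketch of this replacement in a complete DATA step follows, with the pattern's escapes written out as `\b`, `\s`, and `\d`. The address value is a hypothetical example. Note that `\s` matches exactly one white space character, so only a single separating space is removed.

```sas
data _null_;
    length add $40;
    /* Hypothetical address value used only for illustration */
    add = '12 Main St Apt B 204';

    /* $1 is the single letter, $2 is the digit string; the \s between them is dropped */
    add = prxchange('s/(\b[A-Za-z])\s(\d+\b)/$1$2/', -1, add);

    /* Expected in the log: add=12 Main St Apt B204 */
    put add=;
run;
```

Because the letter must be preceded by a word boundary and followed directly by white space, words such as `Apt` or `St` are left untouched.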
Insert a space between digits and letters:
bbb222ccc333
addr=prxchange('s/(\d)([A-Za-z])|([A-Za-z])(\d)/$1$3 $2$4/',-1,add);
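The alternation in this pattern can be traced with the sample value `bbb222ccc333`. In the sketch below, only one side of the `|` matches at a time, so two of the four capture groups are empty in each replacement and `$1$3 $2$4` always yields the left character, a space, and the right character. The variable lengths are assumptions added for the example.

```sas
data _null_;
    length add addr $40;
    add = 'bbb222ccc333';

    /* Left branch: digit then letter ($1, $2). Right branch: letter then digit ($3, $4). */
    addr = prxchange('s/(\d)([A-Za-z])|([A-Za-z])(\d)/$1$3 $2$4/', -1, add);

    /* Expected in the log: addr=bbb 222 ccc 333 */
    put addr=;
run;
```

Each match consumes both characters, so in a value where single letters and single digits alternate (for example `a1b`), the second boundary is not split in the same pass.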
[a-z] | specifies a range of characters that matches any character in the range. |
[^a-z] | specifies a range of characters that does not match any character in the range. |
\b | matches a word boundary (the position between a word and a space). |
\B | matches a non-word boundary. |
\d | matches a digit character that is equivalent to [0-9]. |
\D | matches a non-digit character that is equivalent to [^0-9]. |
\s | matches any white space character including space, tab, form feed, and so on, and is equivalent to [ \f\n\r\t\v]. |
\S | matches any character that is not a white space character and is equivalent to [^ \f\n\r\t\v]. |
\t | matches a tab character and is equivalent to "\x09". |
\w | matches any word character including the underscore and is equivalent to [A-Za-z0-9_]. |
\W | matches any non-word character and is equivalent to [^A-Za-z0-9_]. |
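To see how several of these metacharacters behave, the following sketch runs PRXMATCH against one hypothetical string and reports where each class first matches. The string and variable names are assumptions made for illustration; the positions in the comments follow from counting characters in `id_7 = 42`.

```sas
data _null_;
    /* Hypothetical sample text: positions are i=1 d=2 _=3 7=4 blank=5 '='=6 blank=7 4=8 2=9 */
    text = 'id_7 = 42';

    lower      = prxmatch('/[a-z]/', text);   /* 1: 'i' is in the range a-z */
    not_lower  = prxmatch('/[^a-z]/', text);  /* 3: '_' is outside the range */
    digit      = prxmatch('/\d/', text);      /* 4: first digit is '7' */
    non_digit  = prxmatch('/\D/', text);      /* 1: 'i' is not a digit */
    space      = prxmatch('/\s/', text);      /* 5: first blank */
    non_word   = prxmatch('/\W/', text);      /* 5: a blank is not a word character */
    whole_word = prxmatch('/\b42\b/', text);  /* 8: '42' bounded on both sides */

    put lower= not_lower= digit= non_digit= space= non_word= whole_word=;
run;
```

The underscore counts as a word character, which is why `\W` skips over `id_7` and first matches at the blank.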