SAS--Perl Regular Expressi…

袁良弼

2023-12-01

Regular Expression Basics

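A regular expression is made up of ordinary characters and metacharacters. Ordinary characters include uppercase and lowercase letters and digits, while metacharacters carry a special meaning (see SAS Help for the details).

A regular expression is a formula that uses a pattern to match a class of strings.

Many people avoid regular expressions because they look odd and complicated. In practice they are fairly simple to write, and once you understand them you can compress hours of tedious, error-prone text processing into a few minutes (or even a few seconds).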

1、PRXMATCH (regular-expression-id | perl-regular-expression, source)

data _null_;

position=prxmatch('/world/', 'Hello world!');

put position=;

run;
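As a small sketch of the two forms of the first argument: PRXMATCH returns the position where the match starts, or 0 when the pattern is not found, so the result can be tested directly. The variable names (re, text) and the second sample string are made up for illustration; re holds the regular-expression-id returned by PRXPARSE.

```sas
data _null_;
   length text $20;
   /* compile the pattern once; re is the regular-expression-id */
   re = prxparse('/world/');
   do text = 'Hello world!', 'Hello SAS!';
      /* position is the 1-based start of the match, or 0 if there is none */
      position = prxmatch(re, text);
      if position > 0 then put text= 'matches at ' position=;
      else put text= 'does not match';
   end;
run;
```

Passing the pattern as a string constant, as in the example above, behaves the same way; the id form is mostly convenient when the same pattern is reused across several function calls.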

2、PRXCHANGE (perl-regular-expression | regular-expression-id, times, source)

data _NULL_;

x="fejiwof'wefji'f''fe";

y=prxchange("s/'/M/",-1,x);

put y=;

run;
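The times argument controls how many matches are replaced. The sketch below reuses the same input string and contrasts a value of 1 with -1; the variable names first_only and all_matches are illustrative only.

```sas
data _null_;
   x = "fejiwof'wefji'f''fe";
   /* times = 1: replace only the first single quote */
   first_only = prxchange("s/'/M/", 1, x);
   /* times = -1: keep replacing until the end of the source is reached */
   all_matches = prxchange("s/'/M/", -1, x);
   put first_only= / all_matches=;
run;
```

With this input, first_only should be fejiwofMwefji'f''fe and all_matches should be fejiwofMwefjiMfMMfe.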

3、Insert an asterisk wherever a letter and a digit are adjacent

data _null_;

text='aaaa111 bbb222ccc333 444dd55';

y=prxchange('s/(\d)([a-z])|([a-z])(\d)/$1$3*$2$4/',-1,text);

put y;

run;

Results: aaaa*111 bbb*222*ccc*333 444*dd*55

Remove the space in the add field that separates a single alphabetic character from a string of one or more digits:

c 32 ->c32

add=prxchange("s/(\b[A-Za-z])\s(\d+\b)/$1$2/",-1,add);

Insert a space between digits and letters:

bbb222ccc333 ->bbb 222 ccc 333

addr=prxchange('s/(\d)([A-Za-z])|([A-Za-z])(\d)/$1$3 $2$4/',-1,add);
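The two address fixes above are single assignment statements, so here is a minimal self-contained DATA step that runs them on the sample values shown. The variable names add1, add2 and their lengths are assumptions made for the sketch; the two patterns are applied to separate inputs because the second one would re-insert the space that the first one removes.

```sas
data _null_;
   length add1 add2 $40;
   add1 = 'c 32';
   add2 = 'bbb222ccc333';
   /* drop the white space between a single letter and a run of digits */
   add1 = prxchange('s/(\b[A-Za-z])\s(\d+\b)/$1$2/', -1, add1);
   /* insert a space wherever a digit and a letter are adjacent; only one
      alternative matches at a time, so the unused groups are empty */
   add2 = prxchange('s/(\d)([A-Za-z])|([A-Za-z])(\d)/$1$3 $2$4/', -1, add2);
   put add1= / add2=;
run;
```

This should print add1=c32 and add2=bbb 222 ccc 333, matching the before and after values listed above.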

For detailed usage, see SAS Help:

[a-z]	specifies a range of characters that matches any character in the range: "[a-z]" matches any lowercase alphabetic character in the range "a" through "z"
[^a-z]	specifies a range of characters that does not match any character in the range: "[^a-z]" matches any character that is not in the range "a" through "z"
\b	matches a word boundary (the position between a word and a space): "er\b" matches the "er" in "never"; "er\b" does not match the "er" in "verb"
\B	matches a non-word boundary: "er\B" matches the "er" in "verb"; "er\B" does not match the "er" in "never"
\d	matches a digit character and is equivalent to [0-9].
\D	matches a non-digit character and is equivalent to [^0-9].
\s	matches any white space character including space, tab, form feed, and so on, and is equivalent to [ \f\n\r\t\v].
\S	matches any character that is not a white space character and is equivalent to [^ \f\n\r\t\v].
\t	matches a tab character and is equivalent to "\x09".
\w	matches any word character including the underscore and is equivalent to [A-Za-z0-9_].
\W	matches any non-word character and is equivalent to [^A-Za-z0-9_].
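To see a few of these metacharacters in action, the following sketch checks the word-boundary examples from the table with PRXMATCH and adds one digit search. The variable names p1 to p4 and the string 'abc123' are illustrative only.

```sas
data _null_;
   /* "er" followed by a word boundary */
   p1 = prxmatch('/er\b/', 'never');   /* "er" ends the word */
   p2 = prxmatch('/er\b/', 'verb');    /* "er" is followed by "b" */
   /* "er" not followed by a word boundary */
   p3 = prxmatch('/er\B/', 'verb');
   /* start of the first run of digits */
   p4 = prxmatch('/\d+/', 'abc123');
   put p1= p2= p3= p4=;
run;
```

The log should show p1=4 p2=0 p3=2 p4=4, where 0 means the pattern did not match.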
