
SAS--Perl Regular Expressi…

袁良弼
2023-12-01

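Regular Expression Basics

A regular expression is made up of ordinary characters and metacharacters. Ordinary characters include uppercase and lowercase letters and digits, while metacharacters carry special meanings (see SAS Help for details).

A regular expression is a formula that uses a pattern to match a class of strings.

Many people avoid regular expressions because they look odd and complicated. In practice they are fairly simple to write, and once you understand them you can compress hours of tedious, error-prone text processing into a few minutes (or even seconds).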

 

1. PRXMATCH(regular-expression-id | perl-regular-expression, source)

Returns the position at which the pattern first matches in source, or 0 if there is no match.

data _null_;
   /* 'world' starts at the 7th character of the source string */
   position=prxmatch('/world/', 'Hello world!');
   put position=;   /* writes position=7 to the log */
run;
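The first argument form in the signature, regular-expression-id, is the value returned by PRXPARSE. The following is a minimal sketch of that form, which also shows the return value of 0 for a non-match; the variable names re, hit, and miss are chosen here for illustration only.

```sas
data _null_;
   /* PRXPARSE compiles the pattern and returns a regular-expression-id */
   re   = prxparse('/world/');
   hit  = prxmatch(re, 'Hello world!');   /* position of the first match */
   miss = prxmatch(re, 'Hello there!');   /* 0 when nothing matches */
   put hit= miss=;                        /* writes hit=7 miss=0 to the log */
run;
```

Compiling the pattern once with PRXPARSE is usually preferred when the same expression is applied to many observations.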

 

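2. PRXCHANGE(perl-regular-expression | regular-expression-id, times, source)

Performs the substitution up to times times and returns the changed string; a times value of -1 replaces every match.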

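data _NULL_;
    x="fejiwof'wefji'f''fe";
    /* times=-1 replaces every single quote with M */
    y=prxchange("s/'/M/",-1,x);
    put y=;   /* writes y=fejiwofMwefjiMfMMfe to the log */
run;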
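To make the effect of the times argument concrete, here is a small sketch that runs the same substitution with times=1 and times=-1. The input string and the variable names first_only and all_hits are made up for this illustration.

```sas
data _null_;
   x = "a'b'c";
   first_only = prxchange("s/'/M/",  1, x);   /* only the first quote is replaced */
   all_hits   = prxchange("s/'/M/", -1, x);   /* every quote is replaced */
   put first_only= all_hits=;                 /* first_only=aMb'c all_hits=aMbMc */
run;
```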

 

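3. Insert an asterisk between adjacent digits and letters:

data _null_;
    text='aaaa111 bbb222ccc333 444dd55';
    /* $1,$2 capture digit-then-letter; $3,$4 capture letter-then-digit */
    y=prxchange('s/(\d)([a-z])|([a-z])(\d)/$1$3*$2$4/',-1,text);
    put y;
run;

Result: aaaa*111 bbb*222*ccc*333 444*dd*55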

 

4. Remove the white space character in the add field that separates a single alphabetic character from a string of one or more digits:

c 32 -> c32

add=prxchange("s/(\b[A-Za-z])\s(\d+\b)/$1$2/",-1,add);
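The statement above is a fragment; a minimal runnable sketch around it might look like the following. The sample address value is invented for illustration, and the pattern is written with the backslashed metacharacters \b, \s, and \d.

```sas
data _null_;
   length add $40;
   add = 'Apt c 32 Main St';   /* hypothetical sample value */
   /* \b[A-Za-z] is a one-letter word, \s one white space, \d+\b a run of digits */
   add = prxchange('s/(\b[A-Za-z])\s(\d+\b)/$1$2/', -1, add);
   put add=;                   /* writes add=Apt c32 Main St to the log */
run;
```

Only the single letter "c" qualifies: the other letters that start a word ("A", "M", "S") are followed by another letter rather than by white space.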

 

Insert a space between digits and letters:

bbb222ccc333 -> bbb 222 ccc 333

addr=prxchange('s/(\d)([A-Za-z])|([A-Za-z])(\d)/$1$3 $2$4/',-1,add);
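As a quick check of the statement above, this sketch wraps it in a DATA step using the sample value from the text. It assumes the backslashed \d form of the pattern.

```sas
data _null_;
   add  = 'bbb222ccc333';
   /* either alternative matches a digit/letter pair; the space goes between them */
   addr = prxchange('s/(\d)([A-Za-z])|([A-Za-z])(\d)/$1$3 $2$4/', -1, add);
   put addr=;   /* writes addr=bbb 222 ccc 333 to the log */
run;
```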

 

 

Detailed usage (from SAS Help):

[a-z]

specifies a range of characters that matches any character in the range:

  • "[a-z]" matches any lowercase alphabetic character in the range "a" through "z"

 

[^a-z]

specifies a range of characters that does not match any character in the range:

  • "[^a-z]" matches any character that is not in the range "a" through "z"

\b

matches a word boundary (the position between a word and a space):

  • "er\b" matches the "er" in "never"
  • "er\b" does not match the "er" in "verb"

\B

matches a non-word boundary:

  • "er\B" matches the "er" in "verb"
  • "er\B" does not match the "er" in "never"

\d

matches a digit character that is equivalent to [0-9].

\D

matches a non-digit character that is equivalent to [^0-9].

\s

matches any white space character including space, tab, form feed, and so on, and is equivalent to [\f\n\r\t\v].

\S

matches any character that is not a white space character and is equivalent to [^\f\n\r\t\v].

\t

matches a tab character and is equivalent to "\x09".

\w

matches any word character including the underscore and is equivalent to [A-Za-z0-9_].

\W

matches any non-word character and is equivalent to [^A-Za-z0-9_].
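The entries above can be tried out with PRXMATCH, which returns the position of the first match or 0. The following is a small sketch; the test strings beyond "never" and "verb" and all variable names are chosen here for illustration.

```sas
data _null_;
   lower     = prxmatch('/[a-z]/',  'ABCdef');   /* 4: first lowercase letter */
   not_lower = prxmatch('/[^a-z]/', 'abc1');     /* 4: first character outside a-z */
   b_hit     = prxmatch('/er\b/',   'never');    /* 4: "er" ends the word */
   b_miss    = prxmatch('/er\b/',   'verb');     /* 0: "er" is followed by "b" */
   nb_hit    = prxmatch('/er\B/',   'verb');     /* 2: "er" sits inside the word */
   digit     = prxmatch('/\d/',     'abc123');   /* 4: first digit */
   space     = prxmatch('/\s/',     'ab cd');    /* 3: first white space character */
   put lower= not_lower= b_hit= b_miss= nb_hit= digit= space=;
run;
```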

 

 
