In Python, how do I check whether a string contains only certain characters?

万俟靖

2023-03-14

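Question:

I need to check that a string contains only a..z, 0..9, and . (period), and no other characters.

I could iterate over each character and check whether it is a..z, 0..9, or a period, but that would be slow.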
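For reference, the character-by-character check described above could be sketched as follows. The names `ALLOWED`, `only_allowed_chars`, and `only_allowed_chars_set` are made up for this illustration, and the sketch assumes that "a..z" and "0..9" mean ASCII lowercase letters and ASCII digits only.

```python
import string

# Assumption: valid characters are ASCII lowercase letters, ASCII digits, and '.'
ALLOWED = frozenset(string.ascii_lowercase + string.digits + ".")


def only_allowed_chars(text, allowed=ALLOWED):
    """Return True if every character of text is in allowed (True for '')."""
    return all(ch in allowed for ch in text)


def only_allowed_chars_set(text, allowed=ALLOWED):
    """Same check, written as a subset test on the set of characters."""
    return set(text) <= allowed


for sample in ["abcde.1", "abcde.1#", "ABCDE.12"]:
    print(repr(sample), only_allowed_chars(sample), only_allowed_chars_set(sample))
```

Whether this is actually slower than a regular expression depends on the input, so it is worth timing both on realistic strings.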

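I am not sure how to do this with a regular expression.

Is the following correct? Can you suggest a simpler regular expression or a more efficient approach?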

# Valid chars: . a-z 0-9
def check(test_str):
    import re
    # http://docs.python.org/library/re.html
    # re.search returns None if no position in the string matches the pattern
    # pattern to search for any character other than . a-z 0-9
    pattern = r'[^\.a-z0-9]'
    if re.search(pattern, test_str):
        # A character other than . a-z 0-9 was found
        print('Invalid : %r' % (test_str,))
    else:
        # No character other than . a-z 0-9 was found
        print('Valid   : %r' % (test_str,))

check(test_str='abcde.1')
check(test_str='abcde.1#')
check(test_str='ABCDE.12')
check(test_str='_-/>"!@#12345abcde<')

'''
Output:
>>> 
Valid   : 'abcde.1'
Invalid : 'abcde.1#'
Invalid : 'ABCDE.12'
Invalid : '_-/>"!@#12345abcde<'
'''

Answer:

Final(?)

The answer, wrapped in a function, with an annotated interactive session:

>>> import re
>>> def special_match(strg, search=re.compile(r'[^a-z0-9.]').search):
...     return not bool(search(strg))
...
>>> special_match("")
True
>>> special_match("az09.")
True
>>> special_match("az09.\n")
False
# The above test case is to catch out any attempt to use re.match()
# with a `$` instead of `\Z` -- see point (6) below.
>>> special_match("az09.#")
False
>>> special_match("az09.X")
False
>>>

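Note: a comparison with re.match() appears further down in this answer. Further timings show that match() would win with much longer strings; match() seems to have a much larger overhead than search() when the final answer is True. This is puzzling (perhaps it is the cost of returning a MatchObject instead of None) and may warrant further investigation.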
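To repeat this comparison on your own interpreter, something like the following sketch may help. The helper names `via_search`, `via_match`, and `compare` are hypothetical. The match() pattern uses `*` rather than `+` so that the empty string is accepted, as it is by special_match(). No timings are quoted here because they vary by machine and Python version.

```python
import re
import timeit

# The two approaches discussed in this answer, precompiled once.
search_bad = re.compile(r'[^a-z0-9.]').search    # finds the first invalid character
match_all = re.compile(r'[a-z0-9.]*\Z').match    # requires every character to be valid


def via_search(text):
    return search_bad(text) is None


def via_match(text):
    return match_all(text) is not None


def compare(text, number=100000):
    """Return (seconds for search, seconds for match) over `number` calls."""
    t_search = timeit.timeit(lambda: via_search(text), number=number)
    t_match = timeit.timeit(lambda: via_match(text), number=number)
    return t_search, t_match


if __name__ == "__main__":
    for length in (10, 1000, 100000):
        text = "a1." * (length // 3)
        print(length, compare(text, number=1000))
```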

==== Earlier text ====

The [previously] accepted answer could use a few improvements:

(1) The presentation looks like the output of an interactive Python session:

reg=re.compile('^[a-z0-9\.]+$')
>>>reg.match('jsdlfjdsf12324..3432jsdflsdf')
True

but match() does not return True; it returns a match object or None.
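A quick way to see what match() actually returns, as a small sketch (the variable names are illustrative only):

```python
import re

reg = re.compile(r'[a-z0-9.]+\Z')

hit = reg.match('jsdlfjdsf12324..3432jsdflsdf')
miss = reg.match('ABC')

print(hit is True)    # False: the result is a match object, not the bool True
print(bool(hit))      # True: a match object is truthy
print(miss)           # None: no match
```

Wrapping the call in bool(), or comparing the result with None, gives a real True/False value.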

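(2) When used with match(), the ^ at the start of the pattern is redundant, and the pattern appears to be slightly slower than the same pattern without the ^.

(3) It should encourage the automatic, unthinking use of raw strings for any re pattern.

(4) The backslash in front of the dot/period is redundant.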
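Points (2) to (4) can be checked with a short script. This is only a sketch: the sample strings and variable names are chosen for illustration, and the word-boundary example for point (3) is an added case, not one taken from the accepted answer.

```python
import re

samples = ["abc.123", "", "abc#", "ABC", "a.b\n"]

# (2) match() is already anchored at the start, so a leading ^ changes nothing.
with_caret = re.compile(r'^[a-z0-9.]+\Z')
without_caret = re.compile(r'[a-z0-9.]+\Z')
assert all(bool(with_caret.match(s)) == bool(without_caret.match(s)) for s in samples)

# (4) Inside a character class, '.' is already literal, so '\.' and '.' are equivalent.
escaped = re.compile(r'[^a-z0-9\.]')
unescaped = re.compile(r'[^a-z0-9.]')
assert all(bool(escaped.search(s)) == bool(unescaped.search(s)) for s in samples)

# (3) Without a raw string, Python interprets some backslash sequences itself:
# '\b' is a single backspace character, while r'\b' is the regex word boundary.
assert len('\b') == 1 and len(r'\b') == 2
assert re.search(r'\bcat\b', 'a cat sat') is not None
assert re.search('\bcat\b', 'a cat sat') is None

print("all checks passed")
```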

(5) It is slower than the OP's code!

prompt>rem OP's version -- NOTE: OP used raw string!

prompt>\python26\python -mtimeit -s"t='jsdlfjdsf12324..3432jsdflsdf';import
re;reg=re.compile(r'[^a-z0-9\.]')" "not bool(reg.search(t))"
1000000 loops, best of 3: 1.43 usec per loop

prompt>rem OP's version w/o backslash

prompt>\python26\python -mtimeit -s"t='jsdlfjdsf12324..3432jsdflsdf';import
re;reg=re.compile(r'[^a-z0-9.]')" "not bool(reg.search(t))"
1000000 loops, best of 3: 1.44 usec per loop

prompt>rem cleaned-up version of accepted answer

prompt>\python26\python -mtimeit -s"t='jsdlfjdsf12324..3432jsdflsdf';import
re;reg=re.compile(r'[a-z0-9.]+\Z')" "bool(reg.match(t))"
100000 loops, best of 3: 2.07 usec per loop

prompt>rem accepted answer

prompt>\python26\python -mtimeit -s"t='jsdlfjdsf12324..3432jsdflsdf';import
re;reg=re.compile('^[a-z0-9\.]+$')" "bool(reg.match(t))"
100000 loops, best of 3: 2.08 usec per loop

(6) It can produce the wrong answer!!

>>> import re
>>> bool(re.compile('^[a-z0-9\.]+$').match('1234\n'))
True # uh-oh
>>> bool(re.compile('^[a-z0-9\.]+\Z').match('1234\n'))
False
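The difference comes from `$`, which also matches just before a newline at the very end of the string, whereas `\Z` matches only at the very end. The sketch below compares the two, and also shows fullmatch(), which is available in Python 3.4 and later and needs no end anchor at all. The helper name `classify` is made up for this example.

```python
import re

pattern_dollar = re.compile(r'[a-z0-9.]+$')
pattern_end = re.compile(r'[a-z0-9.]+\Z')
pattern_plain = re.compile(r'[a-z0-9.]+')


def classify(text):
    """Return (match with $, match with \\Z, fullmatch) as booleans."""
    return (
        bool(pattern_dollar.match(text)),
        bool(pattern_end.match(text)),
        bool(pattern_plain.fullmatch(text)),  # Python 3.4+
    )


print(classify('1234'))      # (True, True, True)
print(classify('1234\n'))    # (True, False, False): $ accepts the trailing newline
print(classify('1234\n\n'))  # (False, False, False)
```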
