Question:

Python regex to find nested parentheses outside double quotes

岳阳文
2023-03-14

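I have an input string that contains parentheses both inside and outside double quotes. The parentheses can be nested. I want to strip the parenthesized text only where it appears outside the double quotes.

I tried the regex r'\((?:[^)(]|\((?:[^)(]|\([^)(]*\))*\))*\)', but it matches everything enclosed in round brackets, whether it is inside or outside the double quotes.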

    import re
    input_string = '''"Hello World (Don't want to strip this (also not this))"  anything outside round brackets should remain as is(strip this (strip this also as it is outside double quotes))'''
    result = re.sub(r'\((?:[^)(]|\((?:[^)(]|\([^)(]*\))*\))*\)','', input_string)
    print(result)

The actual output I get is:

'"Hello World "  anything outside round brackets should remain as is'

The output I expect is:

'"Hello World (Don't want to strip this (also not this))"  anything outside round brackets should remain as is'

2 answers

墨寂弦
2023-03-14

Using the third-party regex module instead of re, you can use:

"[^"]+"(*SKIP)(*FAIL) # ignore anything between double quotes
|                     # or
\(
    (?:[^()]*|(?R))+  # match nested parentheses
\)

See a demo on regex101.com.

import regex as re

data = """"Hello World (Don't want to strip this (also not this))"  anything outside round brackets should remain as is(strip this (strip this also as it is outside double quotes))"""

rx = re.compile(r'''
    "[^"]+"(*SKIP)(*FAIL)
    |
    \(
        (?:[^()]*|(?R))+
    \)''', re.VERBOSE)

data = rx.sub("", data)
print(data)

This yields:

"Hello World (Don't want to strip this (also not this))"  anything outside round brackets should remain as is
公良运锋
2023-03-14

If your parentheses are balanced (with help from this answer):

import re
input_string = '''"Hello World (Don't want to strip this (also not this))"  anything outside round brackets should remain as is(strip this (strip this also as it is outside double quotes) xxx) Also remain this (String this)'''

def strip_parentheses(g):
    n = 1  # run at least once
    while n:
        g, n = re.subn(r'\([^()]*\)', '', g)  # remove non-nested/flat balanced parts
    return g

s = re.sub(r'".*?"|([^"]*)', lambda g: strip_parentheses(g.group(1)) if g.group(1) else g.group(), input_string)

print(s)

Prints:

"Hello World (Don't want to strip this (also not this))"  anything outside round brackets should remain as is Also remain this 

Edit: running some test cases:

import re
input_string = '''"Hello World (Don't want to strip this (also not this))"  anything outside round brackets should remain as is(strip this (strip this also as it is outside double quotes) xxx) Also remain this ((String this))'''

test_cases = ['Normal string (strip this)',
'"Normal string (dont strip this)"',
'"Normal string (dont strip this)" but (strip this)',
'"Normal string (dont strip this)" but (strip this) and (strip this)',
'"Normal string (dont strip this)" but (strip this) and (strip this) but "dont strip (this)"',
'"Normal string (dont strip this)" but ((strip this) and this) and (strip (strip this))',
'"Normal string (dont strip this)" but ((strip this) but "remain this (xxx)") ',
]

def strip_parentheses(g):
    n = 1  # run at least once
    while n:
        g, n = re.subn(r'\([^()]*\)', '', g)  # remove non-nested/flat balanced parts
    return g

def my_strip(s):
    return re.sub(r'".*?"|([^"]*)', lambda g: strip_parentheses(g.group(1)) if g.group(1) else g.group(), s)

for test in test_cases:
    print(test)
    print(my_strip(test))
    print()

Prints:

Normal string (strip this)
Normal string 

"Normal string (dont strip this)"
"Normal string (dont strip this)"

"Normal string (dont strip this)" but (strip this)
"Normal string (dont strip this)" but 

"Normal string (dont strip this)" but (strip this) and (strip this)
"Normal string (dont strip this)" but  and 

"Normal string (dont strip this)" but (strip this) and (strip this) but "dont strip (this)"
"Normal string (dont strip this)" but  and  but "dont strip (this)"

"Normal string (dont strip this)" but ((strip this) and this) and (strip (strip this))
"Normal string (dont strip this)" but  and 

"Normal string (dont strip this)" but ((strip this) but "remain this (xxx)") 
"Normal string (dont strip this)" but ( but "remain this (xxx)") 

Edit: to remove all (), even when they contain quoted strings:

import re
input_string = '''"Hello World (Don't want to strip this (also not this))"  anything outside round brackets should remain as is(strip this (strip this also as it is outside double quotes) xxx) Also remain this ((String this))'''

test_cases = ['"Normal string (dont strip this)" but (strip this) and (strip this) but "dont strip (this)"',
'"Normal string (dont strip this)" but ((strip this) and this) and (strip (strip this))',
'"Normal string (dont strip this)" but ((strip this) but "remain this (xxx)") ',
]

def strip_parentheses(g):
    n = 1  # run at least once
    while n:
        g, n = re.subn(r'\([^()]*\)', '', g)  # remove non-nested/flat balanced parts
    return g

def my_strip(s):
    s = re.sub(r'".*?"|([^"]*)', lambda g: strip_parentheses(g.group(1)) if g.group(1) else g.group(), s)
    return re.sub(r'".*?"|(\(.*\))', lambda g: '' if g.group(1) else g.group(), s)

for test in test_cases:
    print(test)
    print(my_strip(test))
    print()

Prints:

"Normal string (dont strip this)" but (strip this) and (strip this) but "dont strip (this)"
"Normal string (dont strip this)" but  and  but "dont strip (this)"

"Normal string (dont strip this)" but ((strip this) and this) and (strip (strip this))
"Normal string (dont strip this)" but  and 

"Normal string (dont strip this)" but ((strip this) but "remain this (xxx)") 
"Normal string (dont strip this)" but  