Question:

Recursive regular expression pattern

胡飞舟
2023-03-14
* [[February 1]] – ''[[Brave New World]]'', a novel by [[Aldous Huxley]], is first published.
* [[February 2]]
** A general [[World Disarmament Conference]] begins in [[Geneva]]. The principal issue at the conference is the demand made by Germany for ''gleichberechtigung'' ("equality of status" i.e. abolishing Part V of the Treaty of Versailles, which had disarmed Germany) and the French demand for ''sécurité'' ("security" i.e. maintaining Part V).
** The [[League of Nations]] again recommends negotiations between the [[Republic of China (1912–49)|Republic of China]] and Japan.
** The [[Reconstruction Finance Corporation]] begins operations in Washington, D.C.
* [[February 4]]
** The [[1932 Winter Olympics]] open in [[Lake Placid, New York]].
** Japan occupies [[Harbin]], China.
* [[February 9]] – [[Junnosuke Inoue]], prominent Japanese businessman, banker and former governor of the Bank of Japan is assassinated by right-wing extremist group the League of Blood in the [[League of Blood Incident]].
* [[February 11]] – [[Pope Pius XI]] meets [[Benito Mussolini]] in [[Vatican City]].

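I would like a regex that matches every line starting with * that is followed by any number of lines starting with **. Ideally, each line starting with ** would be placed in its own group.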

Here is the result I would like to get:

> Match 1:
>> Group 1: "* [[February 2]]"

>> Group 2: "** A general [...] Part V)."

>> Group 3: "** The [[League of Nations]] [...] and Japan."

>> Group 4: "** The [[Reconstruction Finance Corporation]] begins operations in Washington, D.C."

> Match 2: 
>> Group 1: "* [[February 4]]"

>> Group 2: "** The [[1932 Winter Olympics]] open in [[Lake Placid, New York]]."

>> Group 3: "** Japan occupies [[Harbin]], China."

(I used [...] to shorten the text.)

What it actually gives me is this:

> Match 1: "* [[February 2]]
** A general [[World Disarmament Conference]] begins in [[Geneva]]. The principal issue at the conference is the demand made by Germany for ''gleichberechtigung'' ("equality of status" i.e. abolishing Part V of the Treaty of Versailles, which had disarmed Germany) and the French demand for ''sécurité'' ("security" i.e. maintaining Part V).
** The [[League of Nations]] again recommends negotiations between the [[Republic of China (1912–49)|Republic of China]] and Japan.
** The [[Reconstruction Finance Corporation]] begins operations in Washington, D.C."
>> Group 1: "[[February 2]]"

>> Group 2: "** The [[Reconstruction Finance Corporation]] begins operations in Washington, D.C."

>> Group 3: "The [[Reconstruction Finance Corporation]] begins operations in Washington, D.C."

> Match 2: "* [[February 4]]
** The [[1932 Winter Olympics]] open in [[Lake Placid, New York]].
** Japan occupies [[Harbin]], China."
>> Group 1: "[[February 4]]"

>> Group 2: "** Japan occupies [[Harbin]], China"

>> Group 3: " Japan occupies [[Harbin]], China."

I hope I have been clear enough and that you can help me solve this. Please don't hesitate to ask for more details.

1 answer

况庆
2023-03-14

Thanks to Rawing's comment, I arrived at this solution:

First, I use the pattern /(*any)^\*{1}(.*)\n(^\*{2}(.*?)\n)+/gm to match each block of text, such as this one:

* [[February 2]]
** A general [[World Disarmament Conference]] begins in [[Geneva]]. The principal issue at the conference is the demand made by Germany for ''gleichberechtigung'' ("equality of status" i.e. abolishing Part V of the Treaty of Versailles, which had disarmed Germany) and the French demand for ''sécurité'' ("security" i.e. maintaining Part V).
** The [[League of Nations]] again recommends negotiations between the [[Republic of China (1912–49)|Republic of China]] and Japan.
** The [[Reconstruction Finance Corporation]] begins operations in Washington, D.C.

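Then, within each block, I use this pattern to get the line starting with *: /^\*{1}(.*)/g. I also use this pattern to get every line starting with **: /^\*{2}(.*)$/gm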
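The two-step approach is needed because, in most regex engines, a capturing group that repeats keeps only its last iteration, which is why Group 2 in the question holds only the final ** line. Below is a minimal Python sketch of the same idea using the standard re module. It is an illustration rather than the exact patterns above: the names BLOCK, SUBLINE and parse_blocks are hypothetical, the sample text is shortened, the (*any) newline verb is dropped because re does not support it, and a (?!\*) lookahead is added on the assumption that a header line should never itself start with **.

```python
import re

# Shortened sample data modelled on the question (an assumption for brevity).
text = """\
* [[February 1]] – ''[[Brave New World]]'' is first published.
* [[February 2]]
** A general [[World Disarmament Conference]] begins in [[Geneva]].
** The [[League of Nations]] again recommends negotiations.
** The [[Reconstruction Finance Corporation]] begins operations in Washington, D.C.
* [[February 4]]
** The [[1932 Winter Olympics]] open in [[Lake Placid, New York]].
** Japan occupies [[Harbin]], China.
* [[February 9]] – [[Junnosuke Inoue]] is assassinated.
"""

# Step 1: one "*" line (not "**") followed by one or more "**" lines.
BLOCK = re.compile(r"^\*(?!\*).*\n(?:^\*\*.*(?:\n|\Z))+", re.MULTILINE)

# Step 2: every "**" line inside an already-matched block.
SUBLINE = re.compile(r"^\*\*.*$", re.MULTILINE)


def parse_blocks(source):
    """Return a list of (header_line, [sub_lines]) tuples."""
    results = []
    for block in BLOCK.finditer(source):
        # Split the header line off from the "**" lines that follow it.
        header, _, rest = block.group(0).partition("\n")
        results.append((header, SUBLINE.findall(rest)))
    return results


for header, sub_lines in parse_blocks(text):
    print(header)
    for line in sub_lines:
        print("   ", line)
```

Each tuple corresponds to one "Match" in the desired output, with the header as Group 1 and the list entries as the remaining groups. Lines such as the February 1 entry are skipped because no ** line follows them.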
