Question:

Generating all possible matches for a regex pattern in PHP

别烨熠

2023-03-14

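There are plenty of questions on SO about how to parse a regular expression pattern and output all possible matches for that pattern. For some reason, though, every one I can find (1, 2, 3, 4, 5, 6, 7, probably more) is either for Java or for some variety of C (and just one for JavaScript), and I currently need to do this in PHP.

I've Googled to my heart's content, but whatever I do, pretty much the only thing Google gives me is links to the docs for preg_match() and pages about how to use regex, which is the opposite of what I want here.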

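My regex patterns are all very simple and guaranteed to be finite; the only syntax used is:

Character classes

So an example might be [ct]hun(k|der)(s|ed|ing)?, which matches all possible forms of the verbs chunk, thunk, chunder and thunder, for a total of 16 permutations.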

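Ideally, there would be a PHP library or tool that iterates through (finite) regex patterns and outputs all possible matches, ready to go. Does anyone know whether such a library or tool already exists?

If not, what is a good approach to writing one? This answer for JavaScript is the closest I've been able to find to something I should be able to adapt, but unfortunately I can't wrap my head around how it actually works, which makes adapting it trickier. Plus, there may well be better ways of doing it in PHP anyway. Some logical pointers on how best to break down the task would be greatly appreciated.

Edit: Since it was apparently unclear what this would look like in practice, I'm looking for something that allows this kind of input: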

$possibleMatches = parseRegexPattern('[ct]hun(k|der)(s|ed|ing)?');

Printing $possibleMatches should then give something like this (the order of the elements does not matter in my case):

Array
(
    [0] => chunk
    [1] => thunk
    [2] => chunks
    [3] => thunks
    [4] => chunked
    [5] => thunked
    [6] => chunking
    [7] => thunking
    [8] => chunder
    [9] => thunder
    [10] => chunders
    [11] => thunders
    [12] => chundered
    [13] => thundered
    [14] => chundering
    [15] => thundering
)

林建本

2023-03-14

You need to strip out the variable patterns; you can use preg_match_all to do this:

preg_match_all("/(\[\w+\]|\([\w|]+\))/", '[ct]hun(k|der)(s|ed|ing)?', $matches);

/* Regex:

/(\[\w+\]|\([\w|]+\))/
/                       : Pattern delimiter
 (                      : Start of capture group
  \[\w+\]               : Character class pattern
         |              : OR operator
          \([\w|]+\)    : Capture group pattern
                    )   : End of capture group
                     /  : Pattern delimiter

*/

You can then expand the capture groups into letters or words (depending on the type):

$array = str_split($cleanString, 1); // For a character class
$array = explode("|", $cleanString); // For a capture group
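
As a small illustration of that expansion step, the sketch below turns a single matched token into its list of options. The helper name expandToken is hypothetical and not part of the code in this answer, and it assumes tokens contain only word characters, "|", brackets, parentheses, and an optional trailing "?".

```php
<?php
// Hypothetical helper: expand one token such as "[ct]" or "(s|ed|ing)?"
// into the list of strings it can stand for.
function expandToken(string $token): array
{
    // Strip the brackets, parentheses, and optional marker.
    $clean = preg_replace('/[\[\]()?]/', '', $token);

    // A character class yields one option per character;
    // a group yields one option per "|"-separated alternative.
    $options = $token[0] === '['
        ? str_split($clean, 1)
        : explode('|', $clean);

    // A trailing "?" means the token may also be absent.
    if (substr($token, -1) === '?') {
        $options[] = '';
    }
    return $options;
}

print_r(expandToken('[ct]'));        // the characters c and t
print_r(expandToken('(s|ed|ing)?')); // s, ed, ing, and the empty string
```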

Then iterate over each $array recursively:

function printMatches($pattern, $array, $matchPattern)
{
    $currentArray = array_shift($array);

    foreach ($currentArray as $option) {
        $patternModified = preg_replace($matchPattern, $option, $pattern, 1);
        if (!count($array)) {
            echo $patternModified, PHP_EOL;
        } else {
            printMatches($patternModified, $array, $matchPattern);
        }
    }
}

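function prepOptions($matches)
{
    $possibilities = [];

    foreach ($matches as $match) {
        // Strip brackets, parentheses and the optional marker
        $cleanString = preg_replace("/[\[\]\(\)\?]/", "", $match);

        if ($match[0] === "[") {
            // Character class: one option per character
            $array = str_split($cleanString, 1);
        } elseif ($match[0] === "(") {
            // Capture group: one option per alternative
            $array = explode("|", $cleanString);
        }
        if ($match[-1] === "?") {
            // Optional: the empty string is also a possibility
            $array[] = "";
        }
        $possibilities[] = $array;
    }
    return $possibilities;
}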

$regex        = '[ct]hun(k|der)(s|ed|ing)?';
$matchPattern = "/(\[\w+\]|\([\w|]+\))\??/";

preg_match_all($matchPattern, $regex, $matches);

printMatches(
    $regex,
    prepOptions($matches[0]),
    $matchPattern
);
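
The code above echoes each expansion. If an array is wanted instead, as in the question's parseRegexPattern() call, one possible variation is sketched below. This is not the recursive preg_replace approach used in this answer: it splits the pattern with preg_split and builds the results as an iterative Cartesian product. It assumes the same limited syntax (character classes, flat groups with "|", and a trailing "?" on a class or group), and the function body is an illustration only.

```php
<?php
// Hypothetical variation: return every expansion as an array.
function parseRegexPattern(string $pattern): array
{
    // One capture group around the whole token so preg_split keeps it.
    $tokenPattern = '/((?:\[\w+\]|\([\w|]+\))\??)/';

    // Result alternates: literal text, token, literal text, token, ...
    $parts = preg_split($tokenPattern, $pattern, -1, PREG_SPLIT_DELIM_CAPTURE);

    $results = [''];
    foreach ($parts as $i => $part) {
        if ($i % 2 === 0) {
            // Even indexes are literal text with a single "option".
            $options = [$part];
        } else {
            // Odd indexes are tokens: expand them into their options.
            $clean = preg_replace('/[\[\]()?]/', '', $part);
            $options = $part[0] === '['
                ? str_split($clean, 1)
                : explode('|', $clean);
            if (substr($part, -1) === '?') {
                $options[] = '';
            }
        }

        // Append every option to every string built so far.
        $next = [];
        foreach ($results as $prefix) {
            foreach ($options as $option) {
                $next[] = $prefix . $option;
            }
        }
        $results = $next;
    }
    return $results;
}

$possibleMatches = parseRegexPattern('[ct]hun(k|der)(s|ed|ing)?');
print_r($possibleMatches);
```

Building the list iteratively avoids re-scanning the pattern with preg_replace for every option, at the cost of holding all results in memory at once.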

To expand nested groups first, you would put this before the preg_match_all call:

$regex        = 'This happen(s|ed) to (be(come)?|hav(e|ing)) test case 1?';

echo preg_replace_callback("/(\(|\|)(\w+)(?:\(([\w\|]+)\)\??)/", function($array){
    $output = explode("|", $array[3]);
    if ($array[0][-1] === "?") {
        $output[] = "";
    }
    foreach ($output as &$option) {
        $option = $array[2] . $option;
    }
    return $array[1] . implode("|", $output);
}, $regex), PHP_EOL;

Output:

This happen(s|ed) to (become|be|have|having) test case 1?

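To also match optional single characters (such as the trailing 1? above), the bulk of the change is updating the regular expression: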

$matchPattern = "/(?:(\[\w+\]|\([\w|]+\))\??|(\w\?))/";

and adding an else to the prepOptions function:

} else {
    $array = [$cleanString];
}

function printMatches($pattern, $array, $matchPattern)
{
    $currentArray = array_shift($array);

    foreach ($currentArray as $option) {
        $patternModified = preg_replace($matchPattern, $option, $pattern, 1);
        if (!count($array)) {
            echo $patternModified, PHP_EOL;
        } else {
            printMatches($patternModified, $array, $matchPattern);
        }
    }
}

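function prepOptions($matches)
{
    $possibilities = [];

    foreach ($matches as $match) {
        // Strip brackets, parentheses and the optional marker
        $cleanString = preg_replace("/[\[\]\(\)\?]/", "", $match);

        if ($match[0] === "[") {
            // Character class: one option per character
            $array = str_split($cleanString, 1);
        } elseif ($match[0] === "(") {
            // Capture group: one option per alternative
            $array = explode("|", $cleanString);
        } else {
            // Single optional character, e.g. "1?"
            $array = [$cleanString];
        }
        if ($match[-1] === "?") {
            // Optional: the empty string is also a possibility
            $array[] = "";
        }
        $possibilities[] = $array;
    }
    return $possibilities;
}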

$regex        = 'This happen(s|ed) to (be(come)?|hav(e|ing)) test case 1?';
$matchPattern = "/(?:(\[\w+\]|\([\w|]+\))\??|(\w\?))/";

$regex = preg_replace_callback("/(\(|\|)(\w+)(?:\(([\w\|]+)\)\??)/", function($array){
    $output = explode("|", $array[3]);
    if ($array[0][-1] === "?") {
        $output[] = "";
    }
    foreach ($output as &$option) {
        $option = $array[2] . $option;
    }
    return $array[1] . implode("|", $output);
}, $regex);


preg_match_all($matchPattern, $regex, $matches);

printMatches(
    $regex,
    prepOptions($matches[0]),
    $matchPattern
);

Output:

This happens to become test case 1
This happens to become test case 
This happens to be test case 1
This happens to be test case 
This happens to have test case 1
This happens to have test case 
This happens to having test case 1
This happens to having test case 
This happened to become test case 1
This happened to become test case 
This happened to be test case 1
This happened to be test case 
This happened to have test case 1
This happened to have test case 
This happened to having test case 1
This happened to having test case
