当前位置: 首页 > 面试题库 >

查找字符串数组的公共前缀

叶景龙
2023-03-14
问题内容

我有一个像这样的数组:

$sports = array(
'Softball - Counties',
'Softball - Eastern',
'Softball - North Harbour',
'Softball - South',
'Softball - Western'
);

我想找到字符串的最长公共前缀。在这种情况下,'Softball - '

我以为我会遵循这个程序

$i = 1;

// loop to the length of the first string
while ($i < strlen($sports[0]) {

  // grab the left most part up to i in length
  $match = substr($sports[0], 0, $i);

  // loop through all the values in array, and compare if they match
  foreach ($sports as $sport) {

     if ($match != substr($sport, 0, $i) {
         // didn't match, return the part that did match
         return substr($sport, 0, $i-1);
     }

  } // foreach

   // increase string length
   $i++;
} // while

// if you got to here, then all of them must be identical

问题

  1. 是否有内置函数或更简单的方法?

  2. 对于我的5行数组来说可能还不错,但是如果我要做几千行数组,那么将会有很多开销,所以我必须使用起始值进行移动计算$i,例如$i=字符串的一半,如果它失败,然后$i/2直到它起作用,然后再递增$i1直到我们成功。这样我们就可以进行最少的比较以获得结果。

是否已经有解决此类问题的公式/算法?


问题答案:

我在代码中实现了@diogoriba算法,结果如下:

  • 查找前两个字符串的公共前缀,然后将其与从第3个字符串开始的所有后续字符串进行比较,如果没有发现任何共同之处,则对公共字符串进行修整,在前缀中存在更多共同点而不是不同之处的情况下会获胜。
  • 但是bumperbox的原始算法(错误修正除外)会获胜,因为字符串的前缀共同点要少于不同点。 在代码注释中有详细信息!

我实现了另一个想法:

首先检查数组中最短的字符串,并将其用于比较而不是简单地比较第一个字符串。 在代码中,这是通过自定义编写函数arrayStrLenMin()实现的。

  • 可以显着降低迭代次数,但是函数arrayStrLenMin()本身可能会(或多或少)引起迭代。
  • 仅从数组中第一个字符串的长度开始似乎很笨拙,但如果arrayStrLenMin()需要多次迭代,则可能会很有效。

以尽可能少的迭代次数获取数组中字符串的最大公共前缀(PHP)

代码+广泛测试+备注:

function arrayStrLenMin ($arr, $strictMode = false, $forLoop = false) {
    $errArrZeroLength = -1; // Return value for error: Array is empty
    $errOtherType = -2;     // Return value for error: Found other type (than string in array)
    $errStrNone = -3;       // Return value for error: No strings found (in array)

    $arrLength = count($arr);
    if ($arrLength <= 0 ) { return $errArrZeroLength; }
    $cur = 0;

    foreach ($arr as $key => $val) {
        if (is_string($val)) {
            $min = strlen($val);
            $strFirstFound = $key;
            // echo("Key\tLength / Notification / Error\n");
            // echo("$key\tFound first string member at key with length: $min!\n");
            break;
        }
        else if ($strictMode) { return $errOtherType; } // At least 1 type other than string was found.
    }
    if (! isset($min)) { return $errStrNone; } // No string was found in array.

    // SpeedRatio of foreach/for is approximately 2/1 as dicussed at:
    // http://juliusbeckmann.de/blog/php-foreach-vs-while-vs-for-the-loop-battle.html

    // If $strFirstFound is found within the first 1/SpeedRatio (=0.5) of the array, "foreach" is faster!

    if (! $forLoop) {
        foreach ($arr as $key => $val) {
            if (is_string($val)) {
                $cur = strlen($val);
                // echo("$key\t$cur\n");
                if ($cur == 0) { return $cur; } // 0 is the shortest possible string, so we can abort here.
                if ($cur < $min) { $min = $cur; }
            }
        // else { echo("$key\tNo string!\n"); }
        }
    }

    // If $strFirstFound is found after the first 1/SpeedRatio (=0.5) of the array, "for" is faster!

    else {
        for ($i = $strFirstFound + 1; $i < $arrLength; $i++) {
            if (is_string($arr[$i])) {
                $cur = strlen($arr[$i]);
                // echo("$i\t$cur\n");
                if ($cur == 0) { return $cur; } // 0 is the shortest possible string, so we can abort here.
                if ($cur < $min) { $min = $cur; }
            }
            // else { echo("$i\tNo string!\n"); }
        }
    }

    return $min;
}

function strCommonPrefixByStr($arr, $strFindShortestFirst = false) {
    $arrLength = count($arr);
    if ($arrLength < 2) { return false; }

    // Determine loop length
    /// Find shortest string in array: Can bring down iterations dramatically, but the function arrayStrLenMin() itself can cause ( more or less) iterations.
    if ($strFindShortestFirst) { $end = arrayStrLenMin($arr, true); }
    /// Simply start with length of first string in array: Seems quite clumsy, but may turn out effective, if arrayStrLenMin() needs many iterations.
    else { $end = strlen($arr[0]); }

    for ($i = 1; $i <= $end + 1; $i++) {
        // Grab the part from 0 up to $i
        $commonStrMax = substr($arr[0], 0, $i);
        echo("Match: $i\t$commonStrMax\n");
        // Loop through all the values in array, and compare if they match
        foreach ($arr as $key => $str) {
            echo("  Str: $key\t$str\n");
            // Didn't match, return the part that did match
            if ($commonStrMax != substr($str, 0, $i)) {
                    return substr($commonStrMax, 0, $i-1);
            }
        }
    }
    // Special case: No mismatch (hence no return) happened until loop end!
    return $commonStrMax; // Thus entire first common string is the common prefix!
}

function strCommonPrefixByChar($arr, $strFindShortestFirst = false) {
    $arrLength = count($arr);
    if ($arrLength < 2) { return false; }

    // Determine loop length
    /// Find shortest string in array: Can bring down iterations dramatically, but the function arrayStrLenMin() itself can cause ( more or less) iterations.
    if ($strFindShortestFirst) { $end = arrayStrLenMin($arr, true); }
    /// Simply start with length of first string in array: Seems quite clumsy, but may turn out effective, if arrayStrLenMin() needs many iterations.
    else { $end = strlen($arr[0]); }

    for ($i = 0 ; $i <= $end + 1; $i++) {
        // Grab char $i
        $char = substr($arr[0], $i, 1);
        echo("Match: $i\t"); echo(str_pad($char, $i+1, " ", STR_PAD_LEFT)); echo("\n");
        // Loop through all the values in array, and compare if they match
        foreach ($arr as $key => $str) {
            echo("  Str: $key\t$str\n");
            // Didn't match, return the part that did match
            if ($char != $str[$i]) { // Same functionality as ($char != substr($str, $i, 1)). Same efficiency?
                    return substr($arr[0], 0, $i);
            }
        }
    }
    // Special case: No mismatch (hence no return) happened until loop end!
    return substr($arr[0], 0, $end); // Thus entire first common string is the common prefix!
}


function strCommonPrefixByNeighbour($arr) {
    $arrLength = count($arr);
    if ($arrLength < 2) { return false; }

    /// Get the common string prefix of the first 2 strings
    $strCommonMax = strCommonPrefixByChar(array($arr[0], $arr[1]));
    if ($strCommonMax === false) { return false; }
    if ($strCommonMax == "") { return ""; }
    $strCommonMaxLength = strlen($strCommonMax);

    /// Now start looping from the 3rd string
    echo("-----\n");
    for ($i = 2; ($i < $arrLength) && ($strCommonMaxLength >= 1); $i++ ) {
        echo("  STR: $i\t{$arr[$i]}\n");

        /// Compare the maximum common string with the next neighbour

        /*
        //// Compare by char: Method unsuitable!

        // Iterate from string end to string beginning
        for ($ii = $strCommonMaxLength - 1; $ii >= 0; $ii--) {
            echo("Match: $ii\t"); echo(str_pad($arr[$i][$ii], $ii+1, " ", STR_PAD_LEFT)); echo("\n");
            // If you find the first mismatch from the end, break.
            if ($arr[$i][$ii] != $strCommonMax[$ii]) {
                $strCommonMaxLength = $ii - 1; break;
                // BUT!!! We may falsely assume that the string from the first mismatch until the begining match! This new string neighbour string is completely "unexplored land", there might be differing chars closer to the beginning. This method is not suitable. Better use string comparison than char comparison.
            }
        }
        */

        //// Compare by string

        for ($ii = $strCommonMaxLength; $ii > 0; $ii--) {
            echo("MATCH: $ii\t$strCommonMax\n");
            if (substr($arr[$i],0,$ii) == $strCommonMax) {
                break;
            }
            else {
                $strCommonMax = substr($strCommonMax,0,$ii - 1);
                $strCommonMaxLength--;
            }
        }
    }
    return substr($arr[0], 0, $strCommonMaxLength);
}





// Tests for finding the common prefix

/// Scenarios

$filesLeastInCommon = array (
"/Vol/1/aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa/a/1",
"/Vol/2/aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa/a/2",
"/Vol/1/aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa/b/1",
"/Vol/1/aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa/b/2",
"/Vol/2/aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa/b/c/1",
"/Vol/2/aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa/a/1",
);

$filesLessInCommon = array (
"/Vol/1/aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa/a/1",
"/Vol/1/aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa/a/2",
"/Vol/1/aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa/b/1",
"/Vol/1/aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa/b/2",
"/Vol/2/aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa/b/c/1",
"/Vol/2/aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa/a/1",
);

$filesMoreInCommon = array (
"/Voluuuuuuuuuuuuuumes/1/a/a/1",
"/Voluuuuuuuuuuuuuumes/1/a/a/2",
"/Voluuuuuuuuuuuuuumes/1/a/b/1",
"/Voluuuuuuuuuuuuuumes/1/a/b/2",
"/Voluuuuuuuuuuuuuumes/2/a/b/c/1",
"/Voluuuuuuuuuuuuuumes/2/a/a/1",
);

$sameDir = array (
"/Volumes/1/a/a/",
"/Volumes/1/a/a/aaaaa/2",
);

$sameFile = array (
"/Volumes/1/a/a/1",
"/Volumes/1/a/a/1",
);

$noCommonPrefix = array (
"/Volumes/1/a/a/",
"/Volumes/1/a/a/aaaaa/2",
"Net/1/a/a/aaaaa/2",
);

$longestLast = array (
"/Volumes/1/a/a/1",
"/Volumes/1/a/a/aaaaa/2",
);

$longestFirst = array (
"/Volumes/1/a/a/aaaaa/1",
"/Volumes/1/a/a/2",
);

$one = array ("/Volumes/1/a/a/aaaaa/1");

$empty = array ( );


// Test Results for finding  the common prefix

/*

I tested my functions in many possible scenarios.
The results, the common prefixes, were always correct in all scenarios!
Just try a function call with your individual array!

Considering iteration efficiency, I also performed tests:

I put echo functions into the functions where iterations occur, and measured the number of CLI line output via:
php <script with strCommonPrefixByStr or strCommonPrefixByChar> | egrep "^  Str:" | wc -l   GIVES TOTAL ITERATION SUM.
php <Script with strCommonPrefixByNeighbour> | egrep "^  Str:" | wc -l   PLUS   | egrep "^MATCH:" | wc -l   GIVES TOTAL ITERATION SUM.

My hypothesis was proven:
strCommonPrefixByChar wins in situations where the strings have less in common in their beginning (=prefix).
strCommonPrefixByNeighbour wins where there is more in common in the prefixes.

*/

// Test Results Table
// Used Functions | Iteration amount | Remarks

// $result = (strCommonPrefixByStr($filesLessInCommon)); // 35
// $result = (strCommonPrefixByChar($filesLessInCommon)); // 35 // Same amount of iterations, but much fewer characters compared because ByChar instead of ByString!
// $result = (strCommonPrefixByNeighbour($filesLessInCommon)); // 88 + 42 = 130 // Loses in this category!

// $result = (strCommonPrefixByStr($filesMoreInCommon)); // 137
// $result = (strCommonPrefixByChar($filesMoreInCommon)); // 137 // Same amount of iterations, but much fewer characters compared because ByChar instead of ByString!
// $result = (strCommonPrefixByNeighbour($filesLeastInCommon)); // 12 + 4 = 16 // Far the winner in this category!

echo("Common prefix of all members:\n");
var_dump($result);





// Tests for finding the shortest string in array

/// Arrays

// $empty = array ();
// $noStrings = array (0,1,2,3.0001,4,false,true,77);
// $stringsOnly = array ("one","two","three","four");
// $mixed = array (0,1,2,3.0001,"four",false,true,"seven", 8888);

/// Scenarios

// I list them from fewest to most iterations, which is not necessarily equivalent to slowest to fastest!
// For speed consider the remarks in the code considering the Speed ratio of foreach/for!

//// Fewest iterations (immediate abort on "Found other type", use "for" loop)

// foreach( array($empty, $noStrings, $stringsOnly, $mixed) as $arr) {
//  echo("NEW ANALYSIS:\n");
//  echo("Result: " . arrayStrLenMin($arr, true, true) . "\n\n");
// }

/* Results:

    NEW ANALYSIS:
    Result: Array is empty!

    NEW ANALYSIS:
    Result: Found other type!

    NEW ANALYSIS:
    Key Length / Notification / Error
    0   Found first string member at key with length: 3!
    1   3
    2   5
    3   4
    Result: 3

    NEW ANALYSIS:
    Result: Found other type!

*/

//// Fewer iterations (immediate abort on "Found other type", use "foreach" loop)

// foreach( array($empty, $noStrings, $stringsOnly, $mixed) as $arr) {
//  echo("NEW ANALYSIS:\n");
//  echo("Result: " . arrayStrLenMin($arr, true, false) . "\n\n");
// }

/* Results:

    NEW ANALYSIS:
    Result: Array is empty!

    NEW ANALYSIS:
    Result: Found other type!

    NEW ANALYSIS:
    Key Length / Notification / Error
    0   Found first string member at key with length: 3!
    0   3
    1   3
    2   5
    3   4
    Result: 3

    NEW ANALYSIS:
    Result: Found other type!

*/

//// More iterations (No immediate abort on "Found other type", use "for" loop)

// foreach( array($empty, $noStrings, $stringsOnly, $mixed) as $arr) {
//  echo("NEW ANALYSIS:\n");
//  echo("Result: " . arrayStrLenMin($arr, false, true) . "\n\n");
// }

/* Results:

    NEW ANALYSIS:
    Result: Array is empty!

    NEW ANALYSIS:
    Result: No strings found!

    NEW ANALYSIS:
    Key Length / Notification / Error
    0   Found first string member at key with length: 3!
    1   3
    2   5
    3   4
    Result: 3

    NEW ANALYSIS:
    Key Length / Notification / Error
    4   Found first string member at key with length: 4!
    5   No string!
    6   No string!
    7   5
    8   No string!
    Result: 4

*/


//// Most iterations (No immediate abort on "Found other type", use "foreach" loop)

// foreach( array($empty, $noStrings, $stringsOnly, $mixed) as $arr) {
//  echo("NEW ANALYSIS:\n");
//  echo("Result: " . arrayStrLenMin($arr, false, false) . "\n\n");
// }

/* Results:

    NEW ANALYSIS:
    Result: Array is empty!

    NEW ANALYSIS:
    Result: No strings found!

    NEW ANALYSIS:
    Key Length / Notification / Error
    0   Found first string member at key with length: 3!
    0   3
    1   3
    2   5
    3   4
    Result: 3

    NEW ANALYSIS:
    Key Length / Notification / Error
    4   Found first string member at key with length: 4!
    0   No string!
    1   No string!
    2   No string!
    3   No string!
    4   4
    5   No string!
    6   No string!
    7   5
    8   No string!
    Result: 4

*/


 类似资料:
  • 有人能给我解释一件事吗?我正在尝试为我的应用程序使用OnPause()。当用户选择另一个应用程序,而我的应用程序不再可见时,我的应用程序的 onpause()会完美启动。例如,我试图在OnPause()中使用一些方法来保存一些数据。我注意到,在OnPause()中,我可以使用应用程序的一些公共变量(分配数据、检索数据等),但在oncreate中填写的公共字符串数组有问题。 例如,如果我试图在OnP

  • 我遇到了一个问题语句,要在给定的两个子字符串之间找到所有公共子字符串这样一种方式,在每种情况下都必须打印最长的子字符串。问题声明如下: 编写一个程序来查找两个给定字符串之间的公共子字符串。但不包括包含在较长公共子字符串中的子字符串。 null 在这种情况下,您不必使用字符串实用程序方法,如:contains、indexOf、StringTokenizer、split和replace。 我的算法是这

  • 问题内容: 是否有一个regexp可以找到两个字符串的最长公共前缀?而且如果一个正则表达式无法解决这个问题,那么使用正则表达式(perl,ruby,python等)中最精美的代码或oneliner就是什么。 PS:我可以通过编程轻松地做到这一点,我只是想好奇,因为在我看来这可以通过正则表达式解决。 PPS:使用正则表达式的O(n)解决方案可获得额外奖励。来吧,它应该存在! 问题答案: 如果有些字符

  • 给定两个字符串,我想识别从最长到最短的所有公共子字符串。 最后,我需要检查一个字符串与数千个字符串的固定列表。我不确定在散列出这些字符串中的所有子字符串时是否有一个明智的步骤。 先前的答复: 在这个线程中,发现了一个动态编程解决方案,它需要O(nm)时间,其中n和m是字符串的长度。我对一种更有效的方法感兴趣,它将使用后缀树。 背景: 我正在根据旋律片段创作歌曲旋律。有时,一个组合会产生一个旋律,与

  • 问题是,我试图这么做,但我检查字符串长度的方法不起作用;我能做些什么来修复它?

  • 我有一套弦。其中90%是以开头的URL。我想按字母顺序排序。 对于这个问题,有没有比普通的快速排序/基数排序更好的算法?