How do I implement the Jaro-Winkler distance string comparison algorithm in C#?
using System;

public static class JaroWinklerDistance
{
    /* The Winkler modification is applied only when the percent match
     * without the modification is above mWeightThreshold.
     * Winkler's paper used a default value of 0.7
     */
    private static readonly double mWeightThreshold = 0.7;

    /* Size of the prefix to be considered by the Winkler modification.
     * Winkler's paper used a default value of 4
     */
    private static readonly int mNumChars = 4;

    /// <summary>
    /// Returns the Jaro-Winkler distance between the specified
    /// strings. The distance is symmetric and will fall in the
    /// range 0 (perfect match) to 1 (no match).
    /// </summary>
    /// <param name="aString1">First String</param>
    /// <param name="aString2">Second String</param>
    /// <returns>1.0 minus the Jaro-Winkler proximity of the two strings</returns>
    public static double distance(string aString1, string aString2)
    {
        return 1.0 - proximity(aString1, aString2);
    }

    /// <summary>
    /// Returns the Jaro-Winkler proximity (similarity) between the
    /// specified strings. The proximity is symmetric and will fall in
    /// the range 0 (no match) to 1 (perfect match).
    /// </summary>
    /// <param name="aString1">First String</param>
    /// <param name="aString2">Second String</param>
    /// <returns>A similarity score between 0.0 and 1.0</returns>
    public static double proximity(string aString1, string aString2)
    {
        int lLen1 = aString1.Length;
        int lLen2 = aString2.Length;
        if (lLen1 == 0)
            return lLen2 == 0 ? 1.0 : 0.0;

        // Characters count as matching only within this distance of each other
        int lSearchRange = Math.Max(0, Math.Max(lLen1, lLen2) / 2 - 1);

        // default initialized to false
        bool[] lMatched1 = new bool[lLen1];
        bool[] lMatched2 = new bool[lLen2];

        // Count the characters common to both strings within the search range
        int lNumCommon = 0;
        for (int i = 0; i < lLen1; ++i)
        {
            int lStart = Math.Max(0, i - lSearchRange);
            int lEnd = Math.Min(i + lSearchRange + 1, lLen2);
            for (int j = lStart; j < lEnd; ++j)
            {
                if (lMatched2[j]) continue;
                if (aString1[i] != aString2[j])
                    continue;
                lMatched1[i] = true;
                lMatched2[j] = true;
                ++lNumCommon;
                break;
            }
        }
        if (lNumCommon == 0) return 0.0;

        // Count matched characters that appear in a different order
        int lNumHalfTransposed = 0;
        int k = 0;
        for (int i = 0; i < lLen1; ++i)
        {
            if (!lMatched1[i]) continue;
            while (!lMatched2[k]) ++k;
            if (aString1[i] != aString2[k])
                ++lNumHalfTransposed;
            ++k;
        }
        // System.Diagnostics.Debug.WriteLine("numHalfTransposed=" + lNumHalfTransposed);
        int lNumTransposed = lNumHalfTransposed / 2;
        // System.Diagnostics.Debug.WriteLine("numCommon=" + lNumCommon + " numTransposed=" + lNumTransposed);

        // Plain Jaro similarity
        double lNumCommonD = lNumCommon;
        double lWeight = (lNumCommonD / lLen1
                          + lNumCommonD / lLen2
                          + (lNumCommon - lNumTransposed) / lNumCommonD) / 3.0;
        if (lWeight <= mWeightThreshold) return lWeight;

        // Winkler modification: boost the score for a shared prefix
        int lMax = Math.Min(mNumChars, Math.Min(aString1.Length, aString2.Length));
        int lPos = 0;
        while (lPos < lMax && aString1[lPos] == aString2[lPos])
            ++lPos;
        if (lPos == 0) return lWeight;
        return lWeight + 0.1 * lPos * (1.0 - lWeight);
    }
}
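To check the arithmetic at the end of proximity by hand, here is a small sketch that applies the same two formulas to the classic pair "MARTHA" / "MARHTA". Tracing the loops above for that pair gives 6 common characters, 1 transposition (the swapped T and H) and a common prefix of 3 ("MAR"). The class name JaroWinklerWorkedExample and its methods are hypothetical helpers made up for this illustration; they are not part of the class above.

```csharp
using System;
using System.Globalization;

// Hypothetical helper for illustration only; it repeats the final arithmetic
// of proximity() with the match counts supplied by hand.
public static class JaroWinklerWorkedExample
{
    // Jaro similarity from the number of common characters, the number of
    // transpositions, and the two string lengths.
    public static double Jaro(int matches, int transpositions, int len1, int len2)
    {
        if (matches == 0) return 0.0;
        double m = matches;
        return (m / len1 + m / len2 + (m - transpositions) / m) / 3.0;
    }

    // Winkler prefix boost with the same constants as the class above:
    // threshold 0.7, prefix capped at 4 characters, scaling factor 0.1.
    public static double Winkler(double jaro, int commonPrefix)
    {
        if (jaro <= 0.7) return jaro;
        int prefix = Math.Min(4, commonPrefix);
        return jaro + 0.1 * prefix * (1.0 - jaro);
    }

    public static void Main()
    {
        double jaro = Jaro(6, 1, 6, 6);   // (1 + 1 + 5/6) / 3
        double jw = Winkler(jaro, 3);     // jaro + 0.1 * 3 * (1 - jaro)
        Console.WriteLine(jaro.ToString("F4", CultureInfo.InvariantCulture)); // 0.9444
        Console.WriteLine(jw.ToString("F4", CultureInfo.InvariantCulture));   // 0.9611
    }
}
```

So for this pair, proximity should come out at roughly 0.961 and distance at roughly 0.039.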
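Question: I have long wondered how to implement this algorithm in Transact SQL: https://en.wikipedia.org/wiki/Jaro%E2%80%93Winkler_distance How can it be done? Answer: Today I finally stumbled upon this Stack Overflow answer by leebickmtu, which shows a C# implementation originally ported from Java. I took the liberty of porting it to a Transact SQL function, please enjoy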
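I want to fuzzy-match millions of records from multiple files. I have identified two algorithms for this: Jaro-Winkler and Levenshtein edit distance.
Example: I was wondering: if the goal is only to compare two strings and detect small variations, and both algorithms serve that purpose, is there any added value in choosing one over the other, apart from performance?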
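I have been trying to use the Sim-metrics library, from: But according to https://asecuritysite.com/forensics/simstring Jaro-Winkler should be 0 and the overlap coefficient should be 100. Is this even the right way to use this library? What would be the appropriate call if, say, I want to run both metrics to match movies from one list I got from IMDB against another list? I intend to compare the titles of the two sets, take the average of the two scores, and for the casts of the two sets of movies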