Wilcoxon signed-rank test and Wilcoxon rank-sum test, and notes on using them in SciPy

章高朗

2023-12-01

Many people have already explained this topic clearly, for example: https://blog.csdn.net/chikily_yongfeng/article/details/82255575, http://blog.sciencenet.cn/blog-306699-984510.html, https://blog.csdn.net/flyfrommath/article/details/75541607, https://blog.csdn.net/chang349276/article/details/76344979

One point worth adding: SciPy implements three related tests, listed at https://docs.scipy.org/doc/scipy/reference/stats.html

They are:

`ranksums`(x, y)	Compute the Wilcoxon rank-sum statistic for two samples.
`wilcoxon`(x[, y, zero_method, correction])	Calculate the Wilcoxon signed-rank test.
`mannwhitneyu`(x, y[, use_continuity, alternative])	Compute the Mann-Whitney rank test on samples x and y.
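As a rough illustration of how the three functions are called, here is a minimal sketch. All numbers are made up for demonstration; `wilcoxon` is given paired measurements, while `ranksums` and `mannwhitneyu` are given two independent samples.

```python
from scipy import stats

# Illustrative paired data (made-up numbers): the same 9 subjects measured
# twice, which is the setting for the signed-rank test.
before = [1.83, 0.50, 1.62, 2.48, 1.68, 1.88, 1.55, 3.06, 1.30]
after = [0.878, 0.647, 0.598, 2.05, 1.06, 1.29, 1.06, 3.14, 1.29]
signed_rank = stats.wilcoxon(before, after)

# Illustrative independent samples (made-up numbers, no repeated values):
# the setting for the rank-sum / Mann-Whitney U test.
group_a = [1.1, 2.3, 3.7, 4.2, 5.9, 6.4, 7.8, 8.5, 9.6, 10.2]
group_b = [2.9, 4.8, 6.1, 7.3, 8.9, 10.7, 11.4, 12.6, 13.3, 14.1]
rank_sum = stats.ranksums(group_a, group_b)
mwu = stats.mannwhitneyu(group_a, group_b, alternative="two-sided")

print("wilcoxon     statistic, p:", signed_rank.statistic, signed_rank.pvalue)
print("ranksums     statistic, p:", rank_sum.statistic, rank_sum.pvalue)
print("mannwhitneyu statistic, p:", mwu.statistic, mwu.pvalue)
```

Note that `ranksums` reports a z-score while `mannwhitneyu` reports a U value, so the two statistics are not directly comparable even when the p-values are close.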

According to the blog posts above, in statistics the Wilcoxon rank-sum test is also known as the Mann-Whitney U test. So why does SciPy split it into two functions? The documentation page for `ranksums`:

https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.ranksums.html

states: "For tie-handling and an optional continuity correction see scipy.stats.mannwhitneyu."

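That explains the split. `ranksums` applies neither a tie correction nor a continuity correction, whereas `mannwhitneyu` is the function to use when the data contain ties, subject to the caveat in its documentation: "Use only when the number of observation in each sample is > 20 and you have 2 independent samples of ranks." Following material found online, ties can be understood as follows: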

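"In many cases the data contain identical observations. After sorting, these identical observations obviously share the same position, that is, they have the same rank. This situation is called a tie in the data. Ties are usually handled by assigning each tied observation the average of the positions they occupy after sorting. When the data contain many ties, the null distribution of the test statistic in some nonparametric tests is affected, so the statistic needs to be corrected."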
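The average-rank rule in the quote can be seen with `scipy.stats.rankdata`, whose default method is "average". The helper `u_sigma` below is a hypothetical name of my own; it sketches the textbook tie-corrected standard deviation of U under the normal approximation, and is not claimed to be SciPy's internal code.

```python
import numpy as np
from scipy.stats import rankdata

# Tied values share the mean of the positions they occupy after sorting.
values = [10, 20, 20, 30, 30, 30]
ranks = rankdata(values)  # default method="average"
print(ranks)

def u_sigma(x, y):
    """Std. dev. of U under H0 (normal approximation) with the textbook tie correction.

    Hypothetical helper for illustration only.
    """
    n1, n2 = len(x), len(y)
    n = n1 + n2
    # counts[i] is the size t_i of the i-th group of identical pooled values
    _, counts = np.unique(np.concatenate([x, y]), return_counts=True)
    tie_term = np.sum(counts**3 - counts) / (n * (n - 1))
    return np.sqrt(n1 * n2 / 12.0 * ((n + 1) - tie_term))

sigma_no_ties = u_sigma([1, 2, 3], [4, 5, 6])
sigma_ties = u_sigma([1, 2, 2], [2, 3, 3])
print(sigma_no_ties, sigma_ties)
```

Ties shrink the standard deviation, so ignoring them makes the z-score smaller in magnitude than it should be.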

This question has also been discussed on the mailing list:

https://grokbase.com/t/scipy.org/scipy-user/12a92e4vy2/stats-ranksums-vs-stats-mannwhitneyu
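The difference between the two functions can also be checked numerically. The sketch below uses made-up data and a hypothetical helper `compare`. It assumes that `mannwhitneyu` falls back to the normal approximation for these inputs (samples larger than 8, or data with ties), which to my understanding is what recent SciPy versions do by default. With the continuity correction switched off, the two p-values should agree when there are no ties and diverge when there are.

```python
from scipy import stats

def compare(x, y):
    """Return (ranksums p-value, mannwhitneyu p-value), both two-sided.

    Hypothetical helper for illustration only.
    """
    p_rs = stats.ranksums(x, y).pvalue
    p_mwu = stats.mannwhitneyu(
        x, y, use_continuity=False, alternative="two-sided"
    ).pvalue
    return p_rs, p_mwu

# Made-up samples without any repeated values
x = [1.1, 2.3, 3.7, 4.2, 5.9, 6.4, 7.8, 8.5, 9.6, 10.2]
y = [2.9, 4.8, 6.1, 7.3, 8.9, 10.7, 11.4, 12.6, 13.3, 14.1]
p_rs, p_mwu = compare(x, y)
print("no ties:", p_rs, p_mwu)

# Made-up samples with many repeated values (ties)
x_tied = [1, 2, 2, 3, 3, 3, 4, 4, 5, 5]
y_tied = [3, 4, 4, 5, 5, 5, 6, 6, 7, 7]
p_rs_tied, p_mwu_tied = compare(x_tied, y_tied)
print("ties:   ", p_rs_tied, p_mwu_tied)
```

With ties, `mannwhitneyu` uses a smaller, tie-corrected variance, so its p-value comes out lower than the one from `ranksums` on the same data.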

For an introduction to ties, see this slide deck: https://wenku.baidu.com/view/af90a824e2bd960590c67783.html

Two more resources, Wilcoxon rank-sum tables with further explanation:

http://www.real-statistics.com/statistics-tables/wilcoxon-rank-sum-table-independent-samples/

http://www.socr.ucla.edu/Applets.dir/WilcoxonRankSumTable.html
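When moving between rank-sum tables like these and a U statistic, it helps to recall why the two tests are treated as the same: the rank sum W of a sample and its U differ only by the constant n1(n1+1)/2. The pure-Python sketch below checks this on made-up data; the function names are my own and purely illustrative.

```python
def midrank(value, pooled):
    """Average rank of value within pooled; tied values share the mean position."""
    less = sum(1 for w in pooled if w < value)
    equal = sum(1 for w in pooled if w == value)
    return less + (equal + 1) / 2

def rank_sum_and_u(x, y):
    """Return (W1, U1): the rank sum of x in the pooled sample and the derived U."""
    pooled = list(x) + list(y)
    n1 = len(x)
    w1 = sum(midrank(v, pooled) for v in x)  # Wilcoxon rank sum of x
    u1 = w1 - n1 * (n1 + 1) / 2              # Mann-Whitney U of x
    return w1, u1

def u_by_counting(x, y):
    """U computed directly: pairs with x > y count 1, tied pairs count 0.5."""
    return sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in x for b in y)

# Made-up samples; the value 2.2 appears in both, giving one tie.
x = [1.2, 3.4, 2.2, 5.0]
y = [0.9, 2.2, 4.1]
w1, u1 = rank_sum_and_u(x, y)
u_direct = u_by_counting(x, y)
print(w1, u1, u_direct)  # 17.5 7.5 7.5
```

Because the two statistics differ by a constant that depends only on the sample size, they always lead to the same decision; the differences seen in SciPy come from how ties and the continuity correction are handled, not from the statistic itself.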
