Feature Selection - Ex 5: Test with permutations the significance of a classification score





Figure 1. Ex 5: Test with permutations the significance of a classification score

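  1. Dataset: iris
  2. Features: sepal length and width, petal length and width
  3. Prediction target: three iris species (setosa, versicolor, virginica)
  4. Machine learning method: linear classification (SVC with a linear kernel)
  5. Focus: permute the target labels of the training data to check whether the classification score reflects a real dependency between features and labels rather than chance
  6. Key function: sklearn.model_selection.permutation_test_score (sklearn.cross_validation.permutation_test_score in scikit-learn releases before 0.18)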

[1] Ojala and Garriga. Permutation Tests for Studying Classifier Performance. The Journal of Machine Learning Research (2010) vol. 11
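The idea in item 5 can be sketched by hand: score the classifier on the true labels, then score it again on shuffled labels, where any link between features and labels has been destroyed. This is only an illustrative sketch, not the library's implementation. It uses the plain iris features without added noise, and the names `true_score`, `permuted_scores` and the small `n_permutations = 20` are assumptions chosen to keep the run short.

```python
import numpy as np
from sklearn import datasets
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.svm import SVC

iris = datasets.load_iris()
X, y = iris.data, iris.target

clf = SVC(kernel="linear")
cv = StratifiedKFold(n_splits=2)
rng = np.random.RandomState(0)

# Cross-validated accuracy on the true labels.
true_score = cross_val_score(clf, X, y, cv=cv, scoring="accuracy").mean()

# Cross-validated accuracy on shuffled labels (hypothetical small count).
n_permutations = 20
permuted_scores = np.array([
    cross_val_score(clf, X, rng.permutation(y), cv=cv,
                    scoring="accuracy").mean()
    for _ in range(n_permutations)
])

print("true labels:     %.3f" % true_score)
print("shuffled labels: %.3f (mean over %d permutations)"
      % (permuted_scores.mean(), n_permutations))
```

If the score on the true labels sits well above the scores obtained on shuffled labels, the classifier is probably exploiting real structure in the data.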



    # Imports needed by this snippet
    import numpy as np
    from sklearn import datasets

    # Load the iris dataset
    iris = datasets.load_iris()
    X = iris.data
    y = iris.target
    n_classes = np.unique(y).size

    # Generate 2200 noise features that are uncorrelated with the labels
    random = np.random.RandomState(seed=0)
    E = random.normal(size=(len(X), 2200))

    # Append the noise to the informative features to make the task harder
    X = np.c_[X, E]
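To see what the concatenation does, the short check below compares array shapes before and after the noise is appended. The names `X_informative`, `noise` and `X_noisy` are introduced here only for illustration and do not appear in the original example.

```python
import numpy as np
from sklearn import datasets

iris = datasets.load_iris()
X_informative = iris.data  # 150 samples, 4 measured features

rng = np.random.RandomState(seed=0)
noise = rng.normal(size=(len(X_informative), 2200))

# np.c_ stacks its arguments column-wise, so rows (samples) stay aligned.
X_noisy = np.c_[X_informative, noise]

print(X_informative.shape, X_noisy.shape)  # (150, 4) (150, 2204)

# Share of columns that actually carry information about the species.
informative_fraction = X_informative.shape[1] / X_noisy.shape[1]
print("informative fraction: %.4f" % informative_fraction)
```

Only 4 of the 2204 columns carry signal, which is why a significance test on the resulting score is informative.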



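    from sklearn.model_selection import StratifiedKFold
    from sklearn.svm import SVC
    svm = SVC(kernel='linear')
    cv = StratifiedKFold(n_splits=2)  # 2-fold CV that keeps class proportions in each fold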
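As a quick illustration of what stratification means here, the sketch below counts the classes in each test fold. It assumes the `sklearn.model_selection` API, and the placeholder array passed as `X` is there only because `split` needs something of the right length.

```python
import numpy as np
from sklearn import datasets
from sklearn.model_selection import StratifiedKFold

y = datasets.load_iris().target
cv = StratifiedKFold(n_splits=2)

# split() only needs the number of samples from X, so a dummy array suffices.
fold_class_counts = [
    np.bincount(y[test_idx]).tolist()
    for _, test_idx in cv.split(np.zeros(len(y)), y)
]

print(fold_class_counts)  # [[25, 25, 25], [25, 25, 25]]
```

Each fold contains the same number of samples per species, so no fold is dominated by a single class.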



    from sklearn.model_selection import permutation_test_score

    score, permutation_scores, pvalue = permutation_test_score(
        svm, X, y, scoring="accuracy", cv=cv, n_permutations=100, n_jobs=1)
    print("Classification score %s (pvalue : %s)" % (score, pvalue))
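The returned p-value can be recomputed from `score` and `permutation_scores`. The sketch below follows the formula given in the scikit-learn documentation, (C + 1) / (n_permutations + 1), where C is the number of permutations scoring at least as well as the true labels. The helper name `empirical_p_value` and the toy numbers are hypothetical.

```python
import numpy as np

def empirical_p_value(score, permutation_scores):
    """Fraction of permutations scoring >= score, with a +1 correction."""
    permutation_scores = np.asarray(permutation_scores, dtype=float)
    n_at_least_as_good = np.sum(permutation_scores >= score)
    return (n_at_least_as_good + 1.0) / (permutation_scores.size + 1.0)

# Toy case: one of four permutations beats the true score of 0.90.
toy_scores = [0.30, 0.35, 0.40, 0.95]
p_toy = empirical_p_value(0.90, toy_scores)
print(p_toy)  # (1 + 1) / (4 + 1) = 0.4

# With 100 permutations and none reaching the true score,
# the smallest attainable p-value is 1 / 101.
p_floor = empirical_p_value(1.0, np.zeros(100))
print(round(p_floor, 4))  # 0.0099
```

The +1 correction means the p-value can never be exactly zero, so `n_permutations` sets the resolution of the test.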




    import matplotlib.pyplot as plt

    ###############################################################################
    # View histogram of permutation scores
    plt.hist(permutation_scores, 20, label='Permutation scores')
    ylim = plt.ylim()
    # BUG: vlines(..., linestyle='--') fails on older versions of matplotlib
    #plt.vlines(score, ylim[0], ylim[1], linestyle='--',
    #           color='g', linewidth=3, label='Classification Score'
    #           ' (pvalue %s)' % pvalue)
    #plt.vlines(1.0 / n_classes, ylim[0], ylim[1], linestyle='--',
    #           color='k', linewidth=3, label='Luck')
    plt.plot(2 * [score], ylim, '--g', linewidth=3,
             label='Classification Score'
             ' (pvalue %s)' % pvalue)
    plt.plot(2 * [1. / n_classes], ylim, '--k', linewidth=3, label='Luck')
    plt.ylim(ylim)
    plt.legend()
    plt.xlabel('Score')
    plt.show()

Figure 2. Ex 5: Test with permutations the significance of a classification score


Python source code: plot_permutation_test_for_classification.py

    # Author: Alexandre Gramfort <alexandre.gramfort@inria.fr>
    # License: BSD 3 clause
    print(__doc__)

    import numpy as np
    import matplotlib.pyplot as plt

    from sklearn.svm import SVC
    from sklearn.model_selection import StratifiedKFold, permutation_test_score
    from sklearn import datasets

    ##############################################################################
    # Loading a dataset
    iris = datasets.load_iris()
    X = iris.data
    y = iris.target
    n_classes = np.unique(y).size

    # Some noisy data not correlated
    random = np.random.RandomState(seed=0)
    E = random.normal(size=(len(X), 2200))

    # Add noisy data to the informative features to make the task harder
    X = np.c_[X, E]

    svm = SVC(kernel='linear')
    cv = StratifiedKFold(n_splits=2)

    score, permutation_scores, pvalue = permutation_test_score(
        svm, X, y, scoring="accuracy", cv=cv, n_permutations=100, n_jobs=1)

    print("Classification score %s (pvalue : %s)" % (score, pvalue))

    ###############################################################################
    # View histogram of permutation scores
    plt.hist(permutation_scores, 20, label='Permutation scores')
    ylim = plt.ylim()
    # BUG: vlines(..., linestyle='--') fails on older versions of matplotlib
    #plt.vlines(score, ylim[0], ylim[1], linestyle='--',
    #           color='g', linewidth=3, label='Classification Score'
    #           ' (pvalue %s)' % pvalue)
    #plt.vlines(1.0 / n_classes, ylim[0], ylim[1], linestyle='--',
    #           color='k', linewidth=3, label='Luck')
    plt.plot(2 * [score], ylim, '--g', linewidth=3,
             label='Classification Score'
             ' (pvalue %s)' % pvalue)
    plt.plot(2 * [1. / n_classes], ylim, '--k', linewidth=3, label='Luck')
    plt.ylim(ylim)
    plt.legend()
    plt.xlabel('Score')
    plt.show()