Neural Networks - Ex 3: Compare Stochastic learning strategies for MLPClassifier


http://scikit-learn.org/stable/auto_examples/neural_networks/plot_mlp_training_curves.html

This example plots loss curves to show how training progresses under different learning strategies (optimizers), including SGD and Adam.

1. Stochastic Gradient Descent (SGD):

Stochastic Gradient Descent (SGD) is a refinement of Gradient Descent (GD). Plain GD feeds in the entire training dataset and updates the weights only once per pass, based on the accumulated loss, so convergence is slow. SGD instead draws a single random training sample and updates the weights according to that sample's loss.
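The contrast can be sketched in a few lines of Python (a minimal illustration only, not part of the scikit-learn example; w, gradient_fn, X_train, and y_train are hypothetical placeholders):

import numpy as np

def gd_step(w, X_train, y_train, gradient_fn, lr=0.1):
    # Batch GD: one weight update per pass over the whole training set
    grad = gradient_fn(w, X_train, y_train)
    return w - lr * grad

def sgd_step(w, X_train, y_train, gradient_fn, lr=0.1):
    # SGD: draw one random sample and update the weights immediately
    i = np.random.randint(len(X_train))
    grad = gradient_fn(w, X_train[i:i + 1], y_train[i:i + 1])
    return w - lr * grad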

2. Momentum:

Momentum is an extension designed to keep GD-type methods from settling in a local minimum; it borrows the notion of momentum from physics to lower the probability of getting trapped there.

Consider the blue point in Figure 1: when a GD-type method reaches a local minimum, the gradient there is zero, so the optimizer would treat that point as the minimum. To reduce this effect, each update adds a fraction of the previous weight update to the current one; as the red arrow shows, this gives the optimizer a chance to roll over the local minimum.

Figure 1: conceptual illustration of momentum
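In update-rule form, momentum keeps a velocity term that carries part of the previous update into the current one (a minimal sketch, not the library's implementation; grad is the current gradient, and lr=0.2, momentum=0.9 mirror the values used in the example below):

def momentum_step(w, velocity, grad, lr=0.2, momentum=0.9):
    # Reuse a fraction of the previous update, then take the gradient step
    velocity = momentum * velocity - lr * grad
    return w + velocity, velocity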

3. Nesterov Momentum:

Nesterov Momentum is a variant of momentum, likewise aimed at reducing the chance of getting stuck in a local minimum. The difference between the two is illustrated in Figure 2:

Figure 2: classical momentum (left) vs. Nesterov momentum (right). Classical momentum: (1) compute the gradient, (2) add the momentum term, (3) update the weights. Nesterov momentum: (1) add the momentum term first, (2) compute the gradient, (3) update the weights.

Image source: http://cs231n.github.io/neural-networks-3/
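The two update orders in Figure 2 can be written out as follows (a rough sketch assuming a gradient_fn(w) callable; it mirrors the figure rather than the exact internals of MLPClassifier):

def classical_momentum_step(w, v, gradient_fn, lr=0.2, momentum=0.9):
    # 1. compute the gradient at the current weights
    grad = gradient_fn(w)
    # 2. add momentum, 3. update the weights
    v = momentum * v - lr * grad
    return w + v, v

def nesterov_momentum_step(w, v, gradient_fn, lr=0.2, momentum=0.9):
    # 1. apply the momentum "look-ahead" first, 2. compute the gradient there
    grad = gradient_fn(w + momentum * v)
    # 3. update the weights
    v = momentum * v - lr * grad
    return w + v, v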

4. Adaptive Moment Estimation (Adam):

Adam adapts the learning rate on its own: it adjusts a separate learning rate for each parameter based on the gradients computed during training.
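A simplified version of the per-parameter update is sketched below (illustrative only; beta1, beta2, and eps are the commonly used defaults, and MLPClassifier applies Adam internally when solver='adam'):

import numpy as np

def adam_step(w, grad, m, v, t, lr=0.01, beta1=0.9, beta2=0.999, eps=1e-8):
    # Running estimates of the gradient's first moment (m) and second moment (v)
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad ** 2
    # Bias correction for the zero-initialized estimates
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    # Each parameter gets its own effective step size
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v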

All of the optimizers above require a learning_rate_init value. The example compares results on four datasets: the iris dataset, the digits dataset, and the circles and moons datasets generated with sklearn.datasets.

(1) Import libraries

print(__doc__)
import matplotlib.pyplot as plt
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import MinMaxScaler
from sklearn import datasets

(2) Set the model parameters

# different learning rate schedules and momentum parameters
params = [{'solver': 'sgd', 'learning_rate': 'constant', 'momentum': 0,
           'learning_rate_init': 0.2},
          {'solver': 'sgd', 'learning_rate': 'constant', 'momentum': .9,
           'nesterovs_momentum': False, 'learning_rate_init': 0.2},
          {'solver': 'sgd', 'learning_rate': 'constant', 'momentum': .9,
           'nesterovs_momentum': True, 'learning_rate_init': 0.2},
          {'solver': 'sgd', 'learning_rate': 'invscaling', 'momentum': 0,
           'learning_rate_init': 0.2},
          {'solver': 'sgd', 'learning_rate': 'invscaling', 'momentum': .9,
           'nesterovs_momentum': True, 'learning_rate_init': 0.2},
          {'solver': 'sgd', 'learning_rate': 'invscaling', 'momentum': .9,
           'nesterovs_momentum': False, 'learning_rate_init': 0.2},
          {'solver': 'adam', 'learning_rate_init': 0.01}]

labels = ["constant learning-rate", "constant with momentum",
          "constant with Nesterov's momentum",
          "inv-scaling learning-rate", "inv-scaling with momentum",
          "inv-scaling with Nesterov's momentum", "adam"]

plot_args = [{'c': 'red', 'linestyle': '-'},
             {'c': 'green', 'linestyle': '-'},
             {'c': 'blue', 'linestyle': '-'},
             {'c': 'red', 'linestyle': '--'},
             {'c': 'green', 'linestyle': '--'},
             {'c': 'blue', 'linestyle': '--'},
             {'c': 'black', 'linestyle': '-'}]
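Among these settings, 'constant' keeps the step size fixed at learning_rate_init, while 'invscaling' gradually shrinks it; per the scikit-learn documentation the effective rate at step t is learning_rate_init / pow(t, power_t), with power_t defaulting to 0.5. A quick sketch of that decay:

def invscaling_lr(t, learning_rate_init=0.2, power_t=0.5):
    # Effective learning rate at optimization step t (t >= 1)
    return learning_rate_init / (t ** power_t)

# e.g. invscaling_lr(1) == 0.2, invscaling_lr(100) == 0.02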

(3) Plot the loss curves

def plot_on_dataset(X, y, ax, name):
    # for each dataset, plot learning for each learning strategy
    print("\nlearning on dataset %s" % name)
    ax.set_title(name)
    X = MinMaxScaler().fit_transform(X)
    mlps = []
    if name == "digits":
        # digits is larger but converges fairly quickly
        max_iter = 15
    else:
        max_iter = 400
    for label, param in zip(labels, params):
        print("training: %s" % label)
        mlp = MLPClassifier(verbose=0, random_state=0,
                            max_iter=max_iter, **param)
        mlp.fit(X, y)
        mlps.append(mlp)
        print("Training set score: %f" % mlp.score(X, y))
        print("Training set loss: %f" % mlp.loss_)
    for mlp, label, args in zip(mlps, labels, plot_args):
        ax.plot(mlp.loss_curve_, label=label, **args)

fig, axes = plt.subplots(2, 2, figsize=(15, 10))
# load / generate some toy datasets
iris = datasets.load_iris()
digits = datasets.load_digits()
data_sets = [(iris.data, iris.target),
             (digits.data, digits.target),
             datasets.make_circles(noise=0.2, factor=0.5, random_state=1),
             datasets.make_moons(noise=0.3, random_state=0)]

for ax, data, name in zip(axes.ravel(), data_sets, ['iris', 'digits',
                                                    'circles', 'moons']):
    plot_on_dataset(*data, ax=ax, name=name)

fig.legend(ax.get_lines(), labels=labels, ncol=3, loc="upper center")
plt.show()

Figure 3: comparison of the loss curves on the four datasets for each learning strategy

(4) Complete code

print(__doc__)
import matplotlib.pyplot as plt
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import MinMaxScaler
from sklearn import datasets

# different learning rate schedules and momentum parameters
params = [{'solver': 'sgd', 'learning_rate': 'constant', 'momentum': 0,
           'learning_rate_init': 0.2},
          {'solver': 'sgd', 'learning_rate': 'constant', 'momentum': .9,
           'nesterovs_momentum': False, 'learning_rate_init': 0.2},
          {'solver': 'sgd', 'learning_rate': 'constant', 'momentum': .9,
           'nesterovs_momentum': True, 'learning_rate_init': 0.2},
          {'solver': 'sgd', 'learning_rate': 'invscaling', 'momentum': 0,
           'learning_rate_init': 0.2},
          {'solver': 'sgd', 'learning_rate': 'invscaling', 'momentum': .9,
           'nesterovs_momentum': True, 'learning_rate_init': 0.2},
          {'solver': 'sgd', 'learning_rate': 'invscaling', 'momentum': .9,
           'nesterovs_momentum': False, 'learning_rate_init': 0.2},
          {'solver': 'adam', 'learning_rate_init': 0.01}]

labels = ["constant learning-rate", "constant with momentum",
          "constant with Nesterov's momentum",
          "inv-scaling learning-rate", "inv-scaling with momentum",
          "inv-scaling with Nesterov's momentum", "adam"]

plot_args = [{'c': 'red', 'linestyle': '-'},
             {'c': 'green', 'linestyle': '-'},
             {'c': 'blue', 'linestyle': '-'},
             {'c': 'red', 'linestyle': '--'},
             {'c': 'green', 'linestyle': '--'},
             {'c': 'blue', 'linestyle': '--'},
             {'c': 'black', 'linestyle': '-'}]

def plot_on_dataset(X, y, ax, name):
    # for each dataset, plot learning for each learning strategy
    print("\nlearning on dataset %s" % name)
    ax.set_title(name)
    X = MinMaxScaler().fit_transform(X)
    mlps = []
    if name == "digits":
        # digits is larger but converges fairly quickly
        max_iter = 15
    else:
        max_iter = 400
    for label, param in zip(labels, params):
        print("training: %s" % label)
        mlp = MLPClassifier(verbose=0, random_state=0,
                            max_iter=max_iter, **param)
        mlp.fit(X, y)
        mlps.append(mlp)
        print("Training set score: %f" % mlp.score(X, y))
        print("Training set loss: %f" % mlp.loss_)
    for mlp, label, args in zip(mlps, labels, plot_args):
        ax.plot(mlp.loss_curve_, label=label, **args)

fig, axes = plt.subplots(2, 2, figsize=(15, 10))
# load / generate some toy datasets
iris = datasets.load_iris()
digits = datasets.load_digits()
data_sets = [(iris.data, iris.target),
             (digits.data, digits.target),
             datasets.make_circles(noise=0.2, factor=0.5, random_state=1),
             datasets.make_moons(noise=0.3, random_state=0)]

for ax, data, name in zip(axes.ravel(), data_sets, ['iris', 'digits',
                                                    'circles', 'moons']):
    plot_on_dataset(*data, ax=ax, name=name)

fig.legend(ax.get_lines(), labels=labels, ncol=3, loc="upper center")
plt.show()