python----贝叶斯优化调参之Hyperopt

韶浩皛

2023-12-01

    Hyperopt库为python中的模型选择和参数优化提供了算法和并行方案。机器学习常见的模型有KNN,SVM，PCA，决策树，GBDT等一系列的算法，但是在实际应用中，我们需要选取合适的模型，并对模型调参，得到一组合适的参数。尤其是在模型的调参阶段，需要花费大量的时间和精力，却又效率低下。但是我们可以换一个角度来看待这个问题，模型的选取，以及模型中需要调节的参数，可以看做是一组变量，模型的质量标准（比如正确率，AUC）等等可以看做是目标函数，这个问题就是超参数的优化的问题。我们可以使用搜索算法来解决。

def q (args) :
    x, y = args
    return x ∗∗ 2 + y ∗∗ 2
   
   1
2
3

Hyperopt提供了一个优化接口，这个接口接受一个评估函数和参数空间，能计算出参数空间内的一个点的损失函数值。用户还要指定空间内参数的分布情况。
Hyheropt四个重要的因素：指定需要最小化的函数，搜索的空间，采样的数据集(trails database)（可选），搜索的算法（可选）。
首先，定义一个目标函数,接受一个变量,计算后返回一个函数的损失值，比如要最小化函数q(x,y) = x**2 + y**2:

from hyperopt import hp
space = [hp.uniform(’x’, 0, 1), hp.normal(’y’, 0, 1)]
   
   1
2

然后，定义一个参数空间，比如x在0-1区间内取值，y是实数，所以

第三，指定搜索的算法，算法也就是hyperopt的fmin函数的algo参数的取值。当前支持的算法由随机搜索(对应是hyperopt.rand.suggest)，模拟退火(对应是hyperopt.anneal.suggest)，TPE算法。举个栗子：

from hyperopt import hp, fmin, rand, tpe, space_eval
best = fmin(q, space, algo=rand.suggest)
print space_eval(space, best)
   
   1
2
3

搜索算法本身也有内置的参数决定如何去优化目标函数，我们可以指定搜索算法的参数，比如针对TPE，指定jobs：

from functools import partial
from hyperopt import hp, fmin, tpe
algo = partial(tpe.suggest, n_startup_jobs=10)
best = fmin(q, space, algo=algo)
print space_eval(space, best)
   
   1
2
3
4
5

关于参数空间的设置，比如优化函数q，输入fmin(q,space=hp.uniform(‘a’,0,1)).hp.uniform函数的第一个参数是标签，每个超参数在参数空间内必须具有独一无二的标签。hp.uniform指定了参数的分布。其他的参数分布比如
hp.choice返回一个选项，选项可以是list或者tuple.options可以是嵌套的表达式，用于组成条件参数。
hp.pchoice(label,p_options)以一定的概率返回一个p_options的一个选项。这个选项使得函数在搜索过程中对每个选项的可能性不均匀。
hp.uniform(label,low,high)参数在low和high之间均匀分布。
hp.quniform(label,low,high,q),参数的取值是round(uniform(low,high)/q)*q，适用于那些离散的取值。
hp.loguniform(label,low,high)绘制exp(uniform(low,high)),变量的取值范围是[exp(low),exp(high)]
hp.randint(label,upper) 返回一个在[0,upper)前闭后开的区间内的随机整数。
搜索空间可以含有list和dictionary.

from hyperopt import hp
list_space = [
hp.uniform(’a’, 0, 1),
hp.loguniform(’b’, 0, 1)]
tuple_space = (
hp.uniform(’a’, 0, 1),
hp.loguniform(’b’, 0, 1))
dict_space = {
’a’: hp.uniform(’a’, 0, 1),
’b’: hp.loguniform(’b’, 0, 1)}
   
   1
2
3
4
5
6
7
8
9
10

使用sample函数从参数空间内采样：

from hyperopt.pyll.stochasti import sample
print sample(list_space)
# => [0.13, .235]
print sample(nested_space)
# => [[{’case’: 1, ’a’, 0.12‘}, {’case’: 2, ’b’: 2.3}],
# ’extra_literal_string’,
# 3]
   
   1
2
3
4
5
6
7

在参数空间内使用函数：

from hyperopt.pyll import scope
def foo(x):
return str(x) ∗ 3
expr_space = {
’a’: 1 + hp.uniform(’a’, 0, 1),
’b’: scope.minimum(hp.loguniform(’b’, 0, 1), 10),
’c’: scope.call(foo, args=(hp.randint(’c’, 5),)),
}
   
   1
2
3
4
5
6
7
8

—————–这是一条有点短的昏割线———————————–

在blog上发现了一段使用感知器判别鸢尾花数据的代码，使用的学习率是0.1,迭代40次得到了一个测试集上正确率为82%的结果。使用hyperopt优化参数，将正确率提升到了91%。

from sklearn import datasets
import numpy as np
from sklearn.cross_validation import train_test_split
from sklearn.metrics import accuracy_score
iris = datasets.load_iris()
X = iris.data
y = iris.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)


from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
sc.fit(X_train)
X_train_std = sc.transform(X_train)
X_test_std = sc.transform(X_test)

from sklearn.linear_model import Perceptron
ppn = Perceptron(n_iter=40, eta0=0.1, random_state=0)
ppn.fit(X_train_std, y_train)

y_pred = ppn.predict(X_test_std)
print accuracy_score(y_test, y_pred)

def percept(args):
    global X_train_std,y_train,y_test
    ppn = Perceptron(n_iter=int(args["n_iter"]),eta0=args["eta"]*0.01,random_state=0)
    ppn.fit(X_train_std, y_train)
    y_pred = ppn.predict(X_test_std)
    return -accuracy_score(y_test, y_pred)

from hyperopt import fmin,tpe,hp,partial
space = {"n_iter":hp.choice("n_iter",range(30,50)),
         "eta":hp.uniform("eta",0.05,0.5)}
algo = partial(tpe.suggest,n_startup_jobs=10)
best = fmin(percept,space,algo = algo,max_evals=100)
print best
print percept(best)
#0.822222222222
#{'n_iter': 14, 'eta': 0.12877033763511717}
#-0.911111111111
   
   1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40

xgboost具有很多的参数，把xgboost的代码写成一个函数，然后传入fmin中进行参数优化，将交叉验证的auc作为优化目标。auc越大越好，由于fmin是求最小值，因此求-auc的最小值。所用的数据集是202列的数据集，第一列样本id，最后一列是label,中间200列是属性。

#coding:utf-8
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler
import xgboost as xgb
from random import shuffle
from xgboost.sklearn import XGBClassifier
from sklearn.cross_validation import cross_val_score
import pickle
import time
from hyperopt import fmin, tpe, hp,space_eval,rand,Trials,partial,STATUS_OK

def loadFile(fileName = "E://zalei//browsetop200Pca.csv"):
    data = pd.read_csv(fileName,header=None)
    data = data.values
    return data

data = loadFile()
label = data[:,-1]
attrs = data[:,:-1]
labels = label.reshape((1,-1))
label = labels.tolist()[0]

minmaxscaler = MinMaxScaler()
attrs = minmaxscaler.fit_transform(attrs)

index = range(0,len(label))
shuffle(index)
trainIndex = index[:int(len(label)*0.7)]
print len(trainIndex)
testIndex = index[int(len(label)*0.7):]
print len(testIndex)
attr_train = attrs[trainIndex,:]
print attr_train.shape
attr_test = attrs[testIndex,:]
print attr_test.shape
label_train = labels[:,trainIndex].tolist()[0]
print len(label_train)
label_test = labels[:,testIndex].tolist()[0]
print len(label_test)
print np.mat(label_train).reshape((-1,1)).shape


def GBM(argsDict):
    max_depth = argsDict["max_depth"] + 5
    n_estimators = argsDict['n_estimators'] * 5 + 50
    learning_rate = argsDict["learning_rate"] * 0.02 + 0.05
    subsample = argsDict["subsample"] * 0.1 + 0.7
    min_child_weight = argsDict["min_child_weight"]+1
    print "max_depth:" + str(max_depth)
    print "n_estimator:" + str(n_estimators)
    print "learning_rate:" + str(learning_rate)
    print "subsample:" + str(subsample)
    print "min_child_weight:" + str(min_child_weight)
    global attr_train,label_train

    gbm = xgb.XGBClassifier(nthread=4,    #进程数
                            max_depth=max_depth,  #最大深度
                            n_estimators=n_estimators,   #树的数量
                            learning_rate=learning_rate, #学习率
                            subsample=subsample,      #采样数
                            min_child_weight=min_child_weight,   #孩子数
                            max_delta_step = 10,  #10步不降则停止
                            objective="binary:logistic")

    metric = cross_val_score(gbm,attr_train,label_train,cv=5,scoring="roc_auc").mean()
    print metric
    return -metric

space = {"max_depth":hp.randint("max_depth",15),
         "n_estimators":hp.randint("n_estimators",10),  #[0,1,2,3,4,5] -> [50,]
         "learning_rate":hp.randint("learning_rate",6),  #[0,1,2,3,4,5] -> 0.05,0.06
         "subsample":hp.randint("subsample",4),#[0,1,2,3] -> [0.7,0.8,0.9,1.0]
         "min_child_weight":hp.randint("min_child_weight",5), #
        }
algo = partial(tpe.suggest,n_startup_jobs=1)
best = fmin(GBM,space,algo=algo,max_evals=4)

print best
print GBM(best)

   
   1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81

hyperopt文献原文介绍:
链接：http://pan.baidu.com/s/1i5aAXKx 密码：2gtr

                                            <link href="http://csdnimg.cn/release/phoenix/production/markdown_views-993eb3f29c.css" rel="stylesheet">
                                </div>

python----贝叶斯优化调参之Hyperopt

相关阅读

相关文章

相关问答

相关文档