Predicting house prices in Ames, Iowa with ensemble learning
The training set (train) contains 1460 samples and 81 features; the target feature is the sale price (SalePrice).
import pandas as pd
train = pd.read_csv(r"C:\Users\SZS-Student\Desktop\机器学习与Python实践\train-2.csv")
train.head()
| | Id | MSSubClass | MSZoning | LotFrontage | LotArea | Street | Alley | LotShape | LandContour | Utilities | ... | PoolArea | PoolQC | Fence | MiscFeature | MiscVal | MoSold | YrSold | SaleType | SaleCondition | SalePrice |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 60 | RL | 65.0 | 8450 | Pave | NaN | Reg | Lvl | AllPub | ... | 0 | NaN | NaN | NaN | 0 | 2 | 2008 | WD | Normal | 208500 |
| 1 | 2 | 20 | RL | 80.0 | 9600 | Pave | NaN | Reg | Lvl | AllPub | ... | 0 | NaN | NaN | NaN | 0 | 5 | 2007 | WD | Normal | 181500 |
| 2 | 3 | 60 | RL | 68.0 | 11250 | Pave | NaN | IR1 | Lvl | AllPub | ... | 0 | NaN | NaN | NaN | 0 | 9 | 2008 | WD | Normal | 223500 |
| 3 | 4 | 70 | RL | 60.0 | 9550 | Pave | NaN | IR1 | Lvl | AllPub | ... | 0 | NaN | NaN | NaN | 0 | 2 | 2006 | WD | Abnorml | 140000 |
| 4 | 5 | 60 | RL | 84.0 | 14260 | Pave | NaN | IR1 | Lvl | AllPub | ... | 0 | NaN | NaN | NaN | 0 | 12 | 2008 | WD | Normal | 250000 |
5 rows × 81 columns
train.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1460 entries, 0 to 1459
Data columns (total 81 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Id 1460 non-null int64
1 MSSubClass 1460 non-null int64
2 MSZoning 1460 non-null object
3 LotFrontage 1201 non-null float64
4 LotArea 1460 non-null int64
5 Street 1460 non-null object
6 Alley 91 non-null object
7 LotShape 1460 non-null object
8 LandContour 1460 non-null object
9 Utilities 1460 non-null object
10 LotConfig 1460 non-null object
11 LandSlope 1460 non-null object
12 Neighborhood 1460 non-null object
13 Condition1 1460 non-null object
14 Condition2 1460 non-null object
15 BldgType 1460 non-null object
16 HouseStyle 1460 non-null object
17 OverallQual 1460 non-null int64
18 OverallCond 1460 non-null int64
19 YearBuilt 1460 non-null int64
20 YearRemodAdd 1460 non-null int64
21 RoofStyle 1460 non-null object
22 RoofMatl 1460 non-null object
23 Exterior1st 1460 non-null object
24 Exterior2nd 1460 non-null object
25 MasVnrType 1452 non-null object
26 MasVnrArea 1452 non-null float64
27 ExterQual 1460 non-null object
28 ExterCond 1460 non-null object
29 Foundation 1460 non-null object
30 BsmtQual 1423 non-null object
31 BsmtCond 1423 non-null object
32 BsmtExposure 1422 non-null object
33 BsmtFinType1 1423 non-null object
34 BsmtFinSF1 1460 non-null int64
35 BsmtFinType2 1422 non-null object
36 BsmtFinSF2 1460 non-null int64
37 BsmtUnfSF 1460 non-null int64
38 TotalBsmtSF 1460 non-null int64
39 Heating 1460 non-null object
40 HeatingQC 1460 non-null object
41 CentralAir 1460 non-null object
42 Electrical 1459 non-null object
43 1stFlrSF 1460 non-null int64
44 2ndFlrSF 1460 non-null int64
45 LowQualFinSF 1460 non-null int64
46 GrLivArea 1460 non-null int64
47 BsmtFullBath 1460 non-null int64
48 BsmtHalfBath 1460 non-null int64
49 FullBath 1460 non-null int64
50 HalfBath 1460 non-null int64
51 BedroomAbvGr 1460 non-null int64
52 KitchenAbvGr 1460 non-null int64
53 KitchenQual 1460 non-null object
54 TotRmsAbvGrd 1460 non-null int64
55 Functional 1460 non-null object
56 Fireplaces 1460 non-null int64
57 FireplaceQu 770 non-null object
58 GarageType 1379 non-null object
59 GarageYrBlt 1379 non-null float64
60 GarageFinish 1379 non-null object
61 GarageCars 1460 non-null int64
62 GarageArea 1460 non-null int64
63 GarageQual 1379 non-null object
64 GarageCond 1379 non-null object
65 PavedDrive 1460 non-null object
66 WoodDeckSF 1460 non-null int64
67 OpenPorchSF 1460 non-null int64
68 EnclosedPorch 1460 non-null int64
69 3SsnPorch 1460 non-null int64
70 ScreenPorch 1460 non-null int64
71 PoolArea 1460 non-null int64
72 PoolQC 7 non-null object
73 Fence 281 non-null object
74 MiscFeature 54 non-null object
75 MiscVal 1460 non-null int64
76 MoSold 1460 non-null int64
77 YrSold 1460 non-null int64
78 SaleType 1460 non-null object
79 SaleCondition 1460 non-null object
80 SalePrice 1460 non-null int64
dtypes: float64(3), int64(35), object(43)
memory usage: 924.0+ KB
The output above shows that the four features "Alley", "PoolQC", "Fence", and "MiscFeature" have a large number of missing values. Since the dataset already has many features, we drop these four together with the uninformative "Id" column.
train = train.drop(["Id","Alley","PoolQC","Fence","MiscFeature"],axis = 1)
train.head()
| | MSSubClass | MSZoning | LotFrontage | LotArea | Street | LotShape | LandContour | Utilities | LotConfig | LandSlope | ... | EnclosedPorch | 3SsnPorch | ScreenPorch | PoolArea | MiscVal | MoSold | YrSold | SaleType | SaleCondition | SalePrice |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 60 | RL | 65.0 | 8450 | Pave | Reg | Lvl | AllPub | Inside | Gtl | ... | 0 | 0 | 0 | 0 | 0 | 2 | 2008 | WD | Normal | 208500 |
| 1 | 20 | RL | 80.0 | 9600 | Pave | Reg | Lvl | AllPub | FR2 | Gtl | ... | 0 | 0 | 0 | 0 | 0 | 5 | 2007 | WD | Normal | 181500 |
| 2 | 60 | RL | 68.0 | 11250 | Pave | IR1 | Lvl | AllPub | Inside | Gtl | ... | 0 | 0 | 0 | 0 | 0 | 9 | 2008 | WD | Normal | 223500 |
| 3 | 70 | RL | 60.0 | 9550 | Pave | IR1 | Lvl | AllPub | Corner | Gtl | ... | 272 | 0 | 0 | 0 | 0 | 2 | 2006 | WD | Abnorml | 140000 |
| 4 | 60 | RL | 84.0 | 14260 | Pave | IR1 | Lvl | AllPub | FR2 | Gtl | ... | 0 | 0 | 0 | 0 | 0 | 12 | 2008 | WD | Normal | 250000 |
5 rows × 76 columns
# Features with missing values and their missing-value counts
Missing_value_list = {}
for i in train.columns:
    null_count = train[i].isnull().sum()
    if null_count > 0:
        Missing_value_list[i] = null_count
Missing_value_list
{'LotFrontage': 259,
'MasVnrType': 8,
'MasVnrArea': 8,
'BsmtQual': 37,
'BsmtCond': 37,
'BsmtExposure': 38,
'BsmtFinType1': 37,
'BsmtFinType2': 38,
'Electrical': 1,
'FireplaceQu': 690,
'GarageType': 81,
'GarageYrBlt': 81,
'GarageFinish': 81,
'GarageQual': 81,
'GarageCond': 81}
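As an aside, the counting loop above can be replaced by a single vectorized pandas expression; a minimal sketch over the same train DataFrame:
# Fraction of missing values per column, keeping only columns that have any
missing_ratio = train.isnull().mean()
print(missing_ratio[missing_ratio > 0].sort_values(ascending=False))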
Discrete features are imputed with the mode; continuous features are imputed with the mean.
Discrete: MasVnrType, BsmtQual, BsmtCond, BsmtExposure, BsmtFinType1, BsmtFinType2, Electrical, FireplaceQu, GarageType, GarageYrBlt, GarageFinish, GarageQual, GarageCond
Continuous: LotFrontage, MasVnrArea
from sklearn.impute import SimpleImputer
import numpy as np
MISSfeature_Discrete_list = ["MasVnrType","BsmtQual","BsmtCond","BsmtExposure","BsmtFinType1","BsmtFinType2","Electrical","FireplaceQu","GarageType","GarageYrBlt","GarageFinish","GarageQual","GarageCond"]
MISSfeature_Continuous_list = ["LotFrontage","MasVnrArea"]
# Impute missing values for the discrete features
# Fill with the mode
imputer = SimpleImputer(missing_values=np.nan, strategy="most_frequent")
for i in MISSfeature_Discrete_list:
    imputer.fit(train[[i]])
    train[i] = imputer.transform(train[[i]])
    print(train[i].isnull().sum())  # check that no missing values remain
# Impute missing values for the continuous features
# Fill with the mean
imputer = SimpleImputer(missing_values=np.nan, strategy="mean")
for i in MISSfeature_Continuous_list:
    imputer.fit(train[[i]])
    train[i] = imputer.transform(train[[i]])
    print(train[i].isnull().sum())  # check that no missing values remain
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
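Mode/mean imputation is a generic choice. Note that, according to the Kaggle data description for this dataset, NaN in the basement, fireplace, and garage columns typically encodes the absence of that structure rather than an unrecorded value; if that reading holds, an explicit "None" category is arguably more faithful. A sketch of that alternative (not applied in what follows):
# Alternative (not applied here): treat NaN as its own "None" category
absence_features = ["BsmtQual", "BsmtCond", "BsmtExposure", "BsmtFinType1", "BsmtFinType2",
                    "FireplaceQu", "GarageType", "GarageFinish", "GarageQual", "GarageCond"]
for col in absence_features:
    train[col] = train[col].fillna("None")  # "None" then one-hot encodes like any other level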
# Full feature list
feature_list = list(train.columns)
# Continuous feature list
feature_Continuous = ["LotFrontage","LotArea","YearBuilt","YearRemodAdd","MasVnrArea","BsmtFinSF1","BsmtFinSF2","BsmtUnfSF","TotalBsmtSF",\
                      "1stFlrSF","2ndFlrSF","LowQualFinSF","GrLivArea","GarageYrBlt","GarageArea","WoodDeckSF","OpenPorchSF","EnclosedPorch",\
                      "3SsnPorch","ScreenPorch","PoolArea","MiscVal","MoSold","YrSold"]
# Remove the continuous features; what remains is the discrete feature list
for i in feature_Continuous:
    feature_list.remove(i)
feature_Discrete = feature_list
print(feature_Continuous)
print(feature_Discrete)
['LotFrontage', 'LotArea', 'YearBuilt', 'YearRemodAdd', 'MasVnrArea', 'BsmtFinSF1', 'BsmtFinSF2', 'BsmtUnfSF', 'TotalBsmtSF', '1stFlrSF', '2ndFlrSF', 'LowQualFinSF', 'GrLivArea', 'GarageYrBlt', 'GarageArea', 'WoodDeckSF', 'OpenPorchSF', 'EnclosedPorch', '3SsnPorch', 'ScreenPorch', 'PoolArea', 'MiscVal', 'MoSold', 'YrSold']
['MSSubClass', 'MSZoning', 'Street', 'LotShape', 'LandContour', 'Utilities', 'LotConfig', 'LandSlope', 'Neighborhood', 'Condition1', 'Condition2', 'BldgType', 'HouseStyle', 'OverallQual', 'OverallCond', 'RoofStyle', 'RoofMatl', 'Exterior1st', 'Exterior2nd', 'MasVnrType', 'ExterQual', 'ExterCond', 'Foundation', 'BsmtQual', 'BsmtCond', 'BsmtExposure', 'BsmtFinType1', 'BsmtFinType2', 'Heating', 'HeatingQC', 'CentralAir', 'Electrical', 'BsmtFullBath', 'BsmtHalfBath', 'FullBath', 'HalfBath', 'BedroomAbvGr', 'KitchenAbvGr', 'KitchenQual', 'TotRmsAbvGrd', 'Functional', 'Fireplaces', 'FireplaceQu', 'GarageType', 'GarageFinish', 'GarageCars', 'GarageQual', 'GarageCond', 'PavedDrive', 'SaleType', 'SaleCondition', 'SalePrice']
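The continuous list above is maintained by hand. pandas can produce a first cut automatically, although several integer-coded columns (e.g. MSSubClass, OverallQual) are deliberately treated as discrete here, so the dtype-based split would still need manual correction; a sketch:
# First-cut split by dtype; integer-coded categoricals must still be moved by hand
numeric_cols = train.select_dtypes(include="number").columns.tolist()
object_cols = train.select_dtypes(include="object").columns.tolist()
print(len(numeric_cols), len(object_cols))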
One-hot encode the discrete features; apply Z-score standardization to the continuous features.
# Z-score standardization
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler(copy=True)
train_scaled = train.copy()  # copy so the original DataFrame is not modified in place
for i in feature_Continuous:
    train_scaled[i] = scaler.fit_transform(train[[i]])
train_scaled.head()
| | MSSubClass | MSZoning | LotFrontage | LotArea | Street | LotShape | LandContour | Utilities | LotConfig | LandSlope | ... | EnclosedPorch | 3SsnPorch | ScreenPorch | PoolArea | MiscVal | MoSold | YrSold | SaleType | SaleCondition | SalePrice |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 60 | RL | -0.229372 | -0.207142 | Pave | Reg | Lvl | AllPub | Inside | Gtl | ... | -0.359325 | -0.116339 | -0.270208 | -0.068692 | -0.087688 | -1.599111 | 0.138777 | WD | Normal | 208500 |
| 1 | 20 | RL | 0.451936 | -0.091886 | Pave | Reg | Lvl | AllPub | FR2 | Gtl | ... | -0.359325 | -0.116339 | -0.270208 | -0.068692 | -0.087688 | -0.489110 | -0.614439 | WD | Normal | 181500 |
| 2 | 60 | RL | -0.093110 | 0.073480 | Pave | IR1 | Lvl | AllPub | Inside | Gtl | ... | -0.359325 | -0.116339 | -0.270208 | -0.068692 | -0.087688 | 0.990891 | 0.138777 | WD | Normal | 223500 |
| 3 | 70 | RL | -0.456474 | -0.096897 | Pave | IR1 | Lvl | AllPub | Corner | Gtl | ... | 4.092524 | -0.116339 | -0.270208 | -0.068692 | -0.087688 | -1.599111 | -1.367655 | WD | Abnorml | 140000 |
| 4 | 60 | RL | 0.633618 | 0.375148 | Pave | IR1 | Lvl | AllPub | FR2 | Gtl | ... | -0.359325 | -0.116339 | -0.270208 | -0.068692 | -0.087688 | 2.100892 | 0.138777 | WD | Normal | 250000 |
5 rows × 76 columns
# Remove the target SalePrice from the discrete feature list
feature_Discrete.remove('SalePrice')
# One-hot encoding
train_scaled = pd.get_dummies(train_scaled, columns=feature_Discrete)
train_scaled.head()
| | LotFrontage | LotArea | YearBuilt | YearRemodAdd | MasVnrArea | BsmtFinSF1 | BsmtFinSF2 | BsmtUnfSF | TotalBsmtSF | 1stFlrSF | ... | SaleType_ConLw | SaleType_New | SaleType_Oth | SaleType_WD | SaleCondition_Abnorml | SaleCondition_AdjLand | SaleCondition_Alloca | SaleCondition_Family | SaleCondition_Normal | SaleCondition_Partial |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | -0.229372 | -0.207142 | 1.050994 | 0.878668 | 0.511418 | 0.575425 | -0.288653 | -0.944591 | -0.459303 | -0.793434 | ... | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 |
| 1 | 0.451936 | -0.091886 | 0.156734 | -0.429577 | -0.574410 | 1.171992 | -0.288653 | -0.641228 | 0.466465 | 0.257140 | ... | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 |
| 2 | -0.093110 | 0.073480 | 0.984752 | 0.830215 | 0.323060 | 0.092907 | -0.288653 | -0.301643 | -0.313369 | -0.627826 | ... | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 |
| 3 | -0.456474 | -0.096897 | -1.863632 | -0.720298 | -0.574410 | -0.499274 | -0.288653 | -0.061670 | -0.687324 | -0.521734 | ... | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 |
| 4 | 0.633618 | 0.375148 | 0.951632 | 0.733308 | 1.364570 | 0.463568 | -0.288653 | -0.174865 | 0.199680 | -0.045611 | ... | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 |
5 rows × 345 columns
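The scaling and encoding above mutate the DataFrame column by column. The same preprocessing can also be bundled into a single reusable sklearn object; a minimal sketch using ColumnTransformer, assuming the feature_Continuous and feature_Discrete lists defined earlier (SalePrice already removed from the latter):
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder
# Scale continuous features and one-hot encode discrete ones in one step
preprocess = ColumnTransformer([
    ("num", StandardScaler(), feature_Continuous),
    ("cat", OneHotEncoder(handle_unknown="ignore"), feature_Discrete),
])
X_mat = preprocess.fit_transform(train.drop("SalePrice", axis=1))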
# Separate the features from the target
X = train_scaled.drop('SalePrice', axis=1)
y = train_scaled['SalePrice']
# Split into training (80%) and test (20%) sets
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=255)
from sklearn.ensemble import BaggingRegressor
# Ridge regression
from sklearn.linear_model import Ridge
RI = Ridge()
# k-nearest neighbors regression
from sklearn.neighbors import KNeighborsRegressor
KNR = KNeighborsRegressor()
# Decision tree regression
from sklearn.tree import DecisionTreeRegressor
DTR = DecisionTreeRegressor(random_state=10)
estimator_list = [RI, KNR, DTR]
grid_n = [10, 20, 50, 100, 150, 200, 500]
grid_fea = [True, False]
# Bagging regression: hyperparameter tuning with a decision tree base learner
for i in grid_n:
    for j in grid_fea:
        bagr = BaggingRegressor(base_estimator=DTR, n_estimators=i, bootstrap_features=j, random_state=10)
        bagr.fit(X_train, y_train)
        print("%d,%s,r2:%0.4f" % (i, j, bagr.score(X_test, y_test)))
10,True,r2:0.7840
10,False,r2:0.7852
20,True,r2:0.7949
20,False,r2:0.8139
50,True,r2:0.8038
50,False,r2:0.8190
100,True,r2:0.8129
100,False,r2:0.8170
150,True,r2:0.8114
150,False,r2:0.8155
200,True,r2:0.8157
200,False,r2:0.8167
500,True,r2:0.8179
500,False,r2:0.8142
The bagging regressor with a decision tree base learner therefore performs best at n_estimators=50, bootstrap_features=False, with r2 = 0.8190.
Next we examine k-nearest neighbors regression and ridge regression as base learners:
# Base learner: KNR
for i in grid_n:
    for j in grid_fea:
        bagr = BaggingRegressor(base_estimator=KNR, n_estimators=i, bootstrap_features=j, random_state=10)
        bagr.fit(X_train, y_train)
        print("%d,%s,r2:%0.4f" % (i, j, bagr.score(X_test, y_test)))
10,True,r2:0.7347
10,False,r2:0.7245
20,True,r2:0.7282
20,False,r2:0.7271
50,True,r2:0.7346
50,False,r2:0.7264
100,True,r2:0.7386
100,False,r2:0.7233
150,True,r2:0.7362
150,False,r2:0.7241
200,True,r2:0.7356
200,False,r2:0.7231
500,True,r2:0.7375
500,False,r2:0.7250
The results show that with k-nearest neighbors as the base learner, the fit is consistently worse than with decision trees.
# Base learner: Ridge
for i in grid_n:
    for j in grid_fea:
        bagr = BaggingRegressor(base_estimator=RI, n_estimators=i, bootstrap_features=j, random_state=10)
        bagr.fit(X_train, y_train)
        print("%d,%s,r2:%0.4f" % (i, j, bagr.score(X_test, y_test)))
10,True,r2:0.8758
10,False,r2:0.8701
20,True,r2:0.8790
20,False,r2:0.8803
50,True,r2:0.8737
50,False,r2:0.8752
100,True,r2:0.8724
100,False,r2:0.8711
150,True,r2:0.8718
150,False,r2:0.8717
200,True,r2:0.8727
200,False,r2:0.8735
500,True,r2:0.8732
500,False,r2:0.8721
Compared with the previous two base learners, ridge regression is clearly the best choice; the optimum is n_estimators=20, bootstrap_features=False, with r2 = 0.8803.
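The nested loops used here can also be written as a GridSearchCV, which scores by cross-validation rather than a single held-out split; a sketch for the bagging case, reusing the objects defined above:
from sklearn.model_selection import GridSearchCV
# 5-fold CV over the same grid; regressors are scored with r2 by default
param_grid = {"base_estimator": [RI, KNR, DTR], "n_estimators": grid_n, "bootstrap_features": grid_fea}
search = GridSearchCV(BaggingRegressor(random_state=10), param_grid, cv=5)
search.fit(X_train, y_train)
print(search.best_params_, search.best_score_)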
from sklearn.ensemble import RandomForestRegressor
# Random forest: hyperparameter tuning
criterion_list = ["mse", "mae"]
max_features_list = ["auto", "sqrt", "log2"]
for i in grid_n:
    for j in criterion_list:
        for k in max_features_list:
            RF = RandomForestRegressor(n_estimators=i, criterion=j, max_features=k, random_state=10)
            RF.fit(X_train, y_train)
            print("%d,%s,%s,r2: %0.4f" % (i, j, k, RF.score(X_test, y_test)))
10,mse,auto,r2: 0.8125
10,mse,sqrt,r2: 0.7479
10,mse,log2,r2: 0.7499
10,mae,auto,r2: 0.7934
10,mae,sqrt,r2: 0.7406
10,mae,log2,r2: 0.7283
20,mse,auto,r2: 0.8242
20,mse,sqrt,r2: 0.7864
20,mse,log2,r2: 0.7498
20,mae,auto,r2: 0.8187
20,mae,sqrt,r2: 0.7566
20,mae,log2,r2: 0.7437
50,mse,auto,r2: 0.8173
50,mse,sqrt,r2: 0.7923
50,mse,log2,r2: 0.7518
50,mae,auto,r2: 0.8122
50,mae,sqrt,r2: 0.7752
50,mae,log2,r2: 0.7468
100,mse,auto,r2: 0.8187
100,mse,sqrt,r2: 0.7912
100,mse,log2,r2: 0.7477
100,mae,auto,r2: 0.8119
100,mae,sqrt,r2: 0.7731
100,mae,log2,r2: 0.7461
150,mse,auto,r2: 0.8199
150,mse,sqrt,r2: 0.7834
150,mse,log2,r2: 0.7513
150,mae,auto,r2: 0.8111
150,mae,sqrt,r2: 0.7725
150,mae,log2,r2: 0.7463
200,mse,auto,r2: 0.8178
200,mse,sqrt,r2: 0.7817
200,mse,log2,r2: 0.7495
200,mae,auto,r2: 0.8060
200,mae,sqrt,r2: 0.7742
200,mae,log2,r2: 0.7491
500,mse,auto,r2: 0.8170
500,mse,sqrt,r2: 0.7826
500,mse,log2,r2: 0.7454
500,mae,auto,r2: 0.8042
500,mae,sqrt,r2: 0.7748
500,mae,log2,r2: 0.7467
The random forest performs best at n_estimators=20, criterion=mse, max_features=auto, with r2 = 0.8242.
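As a side check, a random forest can estimate its generalization error without touching the test set via out-of-bag samples (with only 20 trees, sklearn may warn that some samples never fall out of bag); a sketch with the best settings found above:
# Out-of-bag r2 as an internal validation score
RF_oob = RandomForestRegressor(n_estimators=20, criterion="mse", max_features="auto",
                               oob_score=True, random_state=10)
RF_oob.fit(X_train, y_train)
print("OOB r2: %0.4f" % RF_oob.oob_score_)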
from sklearn.ensemble import AdaBoostRegressor
# With the default 50 base learners and the default learning rate of 1, compare ridge, k-NN, and decision tree base learners
for i in estimator_list:
    ABR = AdaBoostRegressor(base_estimator=i, random_state=10)
    ABR.fit(X_train, y_train)
    print("%s,r2: %0.4f" % (i, ABR.score(X_test, y_test)))
Ridge(),r2: 0.7018
KNeighborsRegressor(),r2: 0.6619
DecisionTreeRegressor(random_state=10),r2: 0.8438
These results show that AdaBoost regression performs best with a decision tree base learner. Keeping the decision tree as the base learner, we now tune the n_estimators, loss, and learning_rate hyperparameters:
loss_list = ['linear', 'square', 'exponential']
learning_rate_list = np.arange(0.1, 1.1, 0.1)
for i in grid_n:
    for j in loss_list:
        for k in learning_rate_list:
            ABR = AdaBoostRegressor(base_estimator=DTR, random_state=10, n_estimators=i, loss=j, learning_rate=k)
            ABR.fit(X_train, y_train)
            print("n_estimators:%d, loss:%s, learning_rate:%f, r2: %0.4f" % (i, j, k, ABR.score(X_test, y_test)))
n_estimators:10, loss:linear, learning_rate:0.100000, r2: 0.7986
n_estimators:10, loss:linear, learning_rate:0.200000, r2: 0.8138
n_estimators:10, loss:linear, learning_rate:0.300000, r2: 0.7897
n_estimators:10, loss:linear, learning_rate:0.400000, r2: 0.7993
n_estimators:10, loss:linear, learning_rate:0.500000, r2: 0.8016
n_estimators:10, loss:linear, learning_rate:0.600000, r2: 0.8145
n_estimators:10, loss:linear, learning_rate:0.700000, r2: 0.8190
n_estimators:10, loss:linear, learning_rate:0.800000, r2: 0.8338
n_estimators:10, loss:linear, learning_rate:0.900000, r2: 0.8147
n_estimators:10, loss:linear, learning_rate:1.000000, r2: 0.8363
n_estimators:10, loss:square, learning_rate:0.100000, r2: 0.8443
n_estimators:10, loss:square, learning_rate:0.200000, r2: 0.7998
n_estimators:10, loss:square, learning_rate:0.300000, r2: 0.7877
n_estimators:10, loss:square, learning_rate:0.400000, r2: 0.8167
n_estimators:10, loss:square, learning_rate:0.500000, r2: 0.8348
n_estimators:10, loss:square, learning_rate:0.600000, r2: 0.8096
n_estimators:10, loss:square, learning_rate:0.700000, r2: 0.8165
n_estimators:10, loss:square, learning_rate:0.800000, r2: 0.8002
n_estimators:10, loss:square, learning_rate:0.900000, r2: 0.8050
n_estimators:10, loss:square, learning_rate:1.000000, r2: 0.8323
n_estimators:10, loss:exponential, learning_rate:0.100000, r2: 0.8459
n_estimators:10, loss:exponential, learning_rate:0.200000, r2: 0.8153
n_estimators:10, loss:exponential, learning_rate:0.300000, r2: 0.8226
n_estimators:10, loss:exponential, learning_rate:0.400000, r2: 0.8005
n_estimators:10, loss:exponential, learning_rate:0.500000, r2: 0.7978
n_estimators:10, loss:exponential, learning_rate:0.600000, r2: 0.8466
n_estimators:10, loss:exponential, learning_rate:0.700000, r2: 0.7934
n_estimators:10, loss:exponential, learning_rate:0.800000, r2: 0.8502
n_estimators:10, loss:exponential, learning_rate:0.900000, r2: 0.8190
n_estimators:10, loss:exponential, learning_rate:1.000000, r2: 0.8327
n_estimators:20, loss:linear, learning_rate:0.100000, r2: 0.8007
n_estimators:20, loss:linear, learning_rate:0.200000, r2: 0.8267
n_estimators:20, loss:linear, learning_rate:0.300000, r2: 0.8089
n_estimators:20, loss:linear, learning_rate:0.400000, r2: 0.8231
n_estimators:20, loss:linear, learning_rate:0.500000, r2: 0.8166
n_estimators:20, loss:linear, learning_rate:0.600000, r2: 0.8202
n_estimators:20, loss:linear, learning_rate:0.700000, r2: 0.8381
n_estimators:20, loss:linear, learning_rate:0.800000, r2: 0.8354
n_estimators:20, loss:linear, learning_rate:0.900000, r2: 0.8246
n_estimators:20, loss:linear, learning_rate:1.000000, r2: 0.8385
n_estimators:20, loss:square, learning_rate:0.100000, r2: 0.8158
n_estimators:20, loss:square, learning_rate:0.200000, r2: 0.8226
n_estimators:20, loss:square, learning_rate:0.300000, r2: 0.8476
n_estimators:20, loss:square, learning_rate:0.400000, r2: 0.8136
n_estimators:20, loss:square, learning_rate:0.500000, r2: 0.8375
n_estimators:20, loss:square, learning_rate:0.600000, r2: 0.8140
n_estimators:20, loss:square, learning_rate:0.700000, r2: 0.7956
n_estimators:20, loss:square, learning_rate:0.800000, r2: 0.8172
n_estimators:20, loss:square, learning_rate:0.900000, r2: 0.8247
n_estimators:20, loss:square, learning_rate:1.000000, r2: 0.8166
n_estimators:20, loss:exponential, learning_rate:0.100000, r2: 0.8393
n_estimators:20, loss:exponential, learning_rate:0.200000, r2: 0.8155
n_estimators:20, loss:exponential, learning_rate:0.300000, r2: 0.8439
n_estimators:20, loss:exponential, learning_rate:0.400000, r2: 0.8108
n_estimators:20, loss:exponential, learning_rate:0.500000, r2: 0.8094
n_estimators:20, loss:exponential, learning_rate:0.600000, r2: 0.8170
n_estimators:20, loss:exponential, learning_rate:0.700000, r2: 0.8265
n_estimators:20, loss:exponential, learning_rate:0.800000, r2: 0.8428
n_estimators:20, loss:exponential, learning_rate:0.900000, r2: 0.8395
n_estimators:20, loss:exponential, learning_rate:1.000000, r2: 0.8318
n_estimators:50, loss:linear, learning_rate:0.100000, r2: 0.8240
n_estimators:50, loss:linear, learning_rate:0.200000, r2: 0.8376
n_estimators:50, loss:linear, learning_rate:0.300000, r2: 0.8409
n_estimators:50, loss:linear, learning_rate:0.400000, r2: 0.8404
n_estimators:50, loss:linear, learning_rate:0.500000, r2: 0.8386
n_estimators:50, loss:linear, learning_rate:0.600000, r2: 0.8156
n_estimators:50, loss:linear, learning_rate:0.700000, r2: 0.8287
n_estimators:50, loss:linear, learning_rate:0.800000, r2: 0.8234
n_estimators:50, loss:linear, learning_rate:0.900000, r2: 0.8207
n_estimators:50, loss:linear, learning_rate:1.000000, r2: 0.8438
n_estimators:50, loss:square, learning_rate:0.100000, r2: 0.8208
n_estimators:50, loss:square, learning_rate:0.200000, r2: 0.8429
n_estimators:50, loss:square, learning_rate:0.300000, r2: 0.8424
n_estimators:50, loss:square, learning_rate:0.400000, r2: 0.8274
n_estimators:50, loss:square, learning_rate:0.500000, r2: 0.8382
n_estimators:50, loss:square, learning_rate:0.600000, r2: 0.8182
n_estimators:50, loss:square, learning_rate:0.700000, r2: 0.8189
n_estimators:50, loss:square, learning_rate:0.800000, r2: 0.8206
n_estimators:50, loss:square, learning_rate:0.900000, r2: 0.8338
n_estimators:50, loss:square, learning_rate:1.000000, r2: 0.8210
n_estimators:50, loss:exponential, learning_rate:0.100000, r2: 0.8308
n_estimators:50, loss:exponential, learning_rate:0.200000, r2: 0.8263
n_estimators:50, loss:exponential, learning_rate:0.300000, r2: 0.8393
n_estimators:50, loss:exponential, learning_rate:0.400000, r2: 0.8242
n_estimators:50, loss:exponential, learning_rate:0.500000, r2: 0.8238
n_estimators:50, loss:exponential, learning_rate:0.600000, r2: 0.8297
n_estimators:50, loss:exponential, learning_rate:0.700000, r2: 0.8323
n_estimators:50, loss:exponential, learning_rate:0.800000, r2: 0.8397
n_estimators:50, loss:exponential, learning_rate:0.900000, r2: 0.8164
n_estimators:50, loss:exponential, learning_rate:1.000000, r2: 0.8310
n_estimators:100, loss:linear, learning_rate:0.100000, r2: 0.8288
n_estimators:100, loss:linear, learning_rate:0.200000, r2: 0.8393
n_estimators:100, loss:linear, learning_rate:0.300000, r2: 0.8318
n_estimators:100, loss:linear, learning_rate:0.400000, r2: 0.8360
n_estimators:100, loss:linear, learning_rate:0.500000, r2: 0.8417
n_estimators:100, loss:linear, learning_rate:0.600000, r2: 0.8286
n_estimators:100, loss:linear, learning_rate:0.700000, r2: 0.8283
n_estimators:100, loss:linear, learning_rate:0.800000, r2: 0.8280
n_estimators:100, loss:linear, learning_rate:0.900000, r2: 0.8292
n_estimators:100, loss:linear, learning_rate:1.000000, r2: 0.8415
n_estimators:100, loss:square, learning_rate:0.100000, r2: 0.8329
n_estimators:100, loss:square, learning_rate:0.200000, r2: 0.8331
n_estimators:100, loss:square, learning_rate:0.300000, r2: 0.8443
n_estimators:100, loss:square, learning_rate:0.400000, r2: 0.8334
n_estimators:100, loss:square, learning_rate:0.500000, r2: 0.8434
n_estimators:100, loss:square, learning_rate:0.600000, r2: 0.8286
n_estimators:100, loss:square, learning_rate:0.700000, r2: 0.8191
n_estimators:100, loss:square, learning_rate:0.800000, r2: 0.8155
n_estimators:100, loss:square, learning_rate:0.900000, r2: 0.8352
n_estimators:100, loss:square, learning_rate:1.000000, r2: 0.8295
n_estimators:100, loss:exponential, learning_rate:0.100000, r2: 0.8248
n_estimators:100, loss:exponential, learning_rate:0.200000, r2: 0.8277
n_estimators:100, loss:exponential, learning_rate:0.300000, r2: 0.8386
n_estimators:100, loss:exponential, learning_rate:0.400000, r2: 0.8224
n_estimators:100, loss:exponential, learning_rate:0.500000, r2: 0.8280
n_estimators:100, loss:exponential, learning_rate:0.600000, r2: 0.8256
n_estimators:100, loss:exponential, learning_rate:0.700000, r2: 0.8437
n_estimators:100, loss:exponential, learning_rate:0.800000, r2: 0.8415
n_estimators:100, loss:exponential, learning_rate:0.900000, r2: 0.8234
n_estimators:100, loss:exponential, learning_rate:1.000000, r2: 0.8329
n_estimators:150, loss:linear, learning_rate:0.100000, r2: 0.8282
n_estimators:150, loss:linear, learning_rate:0.200000, r2: 0.8387
n_estimators:150, loss:linear, learning_rate:0.300000, r2: 0.8374
n_estimators:150, loss:linear, learning_rate:0.400000, r2: 0.8409
n_estimators:150, loss:linear, learning_rate:0.500000, r2: 0.8401
n_estimators:150, loss:linear, learning_rate:0.600000, r2: 0.8317
n_estimators:150, loss:linear, learning_rate:0.700000, r2: 0.8302
n_estimators:150, loss:linear, learning_rate:0.800000, r2: 0.8346
n_estimators:150, loss:linear, learning_rate:0.900000, r2: 0.8341
n_estimators:150, loss:linear, learning_rate:1.000000, r2: 0.8403
n_estimators:150, loss:square, learning_rate:0.100000, r2: 0.8413
n_estimators:150, loss:square, learning_rate:0.200000, r2: 0.8341
n_estimators:150, loss:square, learning_rate:0.300000, r2: 0.8449
n_estimators:150, loss:square, learning_rate:0.400000, r2: 0.8291
n_estimators:150, loss:square, learning_rate:0.500000, r2: 0.8429
n_estimators:150, loss:square, learning_rate:0.600000, r2: 0.8338
n_estimators:150, loss:square, learning_rate:0.700000, r2: 0.8173
n_estimators:150, loss:square, learning_rate:0.800000, r2: 0.8238
n_estimators:150, loss:square, learning_rate:0.900000, r2: 0.8365
n_estimators:150, loss:square, learning_rate:1.000000, r2: 0.8316
n_estimators:150, loss:exponential, learning_rate:0.100000, r2: 0.8278
n_estimators:150, loss:exponential, learning_rate:0.200000, r2: 0.8303
n_estimators:150, loss:exponential, learning_rate:0.300000, r2: 0.8399
n_estimators:150, loss:exponential, learning_rate:0.400000, r2: 0.8220
n_estimators:150, loss:exponential, learning_rate:0.500000, r2: 0.8353
n_estimators:150, loss:exponential, learning_rate:0.600000, r2: 0.8372
n_estimators:150, loss:exponential, learning_rate:0.700000, r2: 0.8419
n_estimators:150, loss:exponential, learning_rate:0.800000, r2: 0.8349
n_estimators:150, loss:exponential, learning_rate:0.900000, r2: 0.8329
n_estimators:150, loss:exponential, learning_rate:1.000000, r2: 0.8295
n_estimators:200, loss:linear, learning_rate:0.100000, r2: 0.8316
n_estimators:200, loss:linear, learning_rate:0.200000, r2: 0.8390
n_estimators:200, loss:linear, learning_rate:0.300000, r2: 0.8391
n_estimators:200, loss:linear, learning_rate:0.400000, r2: 0.8375
n_estimators:200, loss:linear, learning_rate:0.500000, r2: 0.8396
n_estimators:200, loss:linear, learning_rate:0.600000, r2: 0.8315
n_estimators:200, loss:linear, learning_rate:0.700000, r2: 0.8354
n_estimators:200, loss:linear, learning_rate:0.800000, r2: 0.8311
n_estimators:200, loss:linear, learning_rate:0.900000, r2: 0.8299
n_estimators:200, loss:linear, learning_rate:1.000000, r2: 0.8408
n_estimators:200, loss:square, learning_rate:0.100000, r2: 0.8384
n_estimators:200, loss:square, learning_rate:0.200000, r2: 0.8366
n_estimators:200, loss:square, learning_rate:0.300000, r2: 0.8421
n_estimators:200, loss:square, learning_rate:0.400000, r2: 0.8293
n_estimators:200, loss:square, learning_rate:0.500000, r2: 0.8380
n_estimators:200, loss:square, learning_rate:0.600000, r2: 0.8337
n_estimators:200, loss:square, learning_rate:0.700000, r2: 0.8229
n_estimators:200, loss:square, learning_rate:0.800000, r2: 0.8259
n_estimators:200, loss:square, learning_rate:0.900000, r2: 0.8353
n_estimators:200, loss:square, learning_rate:1.000000, r2: 0.8333
n_estimators:200, loss:exponential, learning_rate:0.100000, r2: 0.8366
n_estimators:200, loss:exponential, learning_rate:0.200000, r2: 0.8288
n_estimators:200, loss:exponential, learning_rate:0.300000, r2: 0.8396
n_estimators:200, loss:exponential, learning_rate:0.400000, r2: 0.8261
n_estimators:200, loss:exponential, learning_rate:0.500000, r2: 0.8330
n_estimators:200, loss:exponential, learning_rate:0.600000, r2: 0.8274
n_estimators:200, loss:exponential, learning_rate:0.700000, r2: 0.8409
n_estimators:200, loss:exponential, learning_rate:0.800000, r2: 0.8331
n_estimators:200, loss:exponential, learning_rate:0.900000, r2: 0.8292
n_estimators:200, loss:exponential, learning_rate:1.000000, r2: 0.8309
n_estimators:500, loss:linear, learning_rate:0.100000, r2: 0.8353
n_estimators:500, loss:linear, learning_rate:0.200000, r2: 0.8389
n_estimators:500, loss:linear, learning_rate:0.300000, r2: 0.8325
n_estimators:500, loss:linear, learning_rate:0.400000, r2: 0.8413
n_estimators:500, loss:linear, learning_rate:0.500000, r2: 0.8411
n_estimators:500, loss:linear, learning_rate:0.600000, r2: 0.8374
n_estimators:500, loss:linear, learning_rate:0.700000, r2: 0.8365
n_estimators:500, loss:linear, learning_rate:0.800000, r2: 0.8342
n_estimators:500, loss:linear, learning_rate:0.900000, r2: 0.8306
n_estimators:500, loss:linear, learning_rate:1.000000, r2: 0.8367
n_estimators:500, loss:square, learning_rate:0.100000, r2: 0.8364
n_estimators:500, loss:square, learning_rate:0.200000, r2: 0.8361
n_estimators:500, loss:square, learning_rate:0.300000, r2: 0.8441
n_estimators:500, loss:square, learning_rate:0.400000, r2: 0.8321
n_estimators:500, loss:square, learning_rate:0.500000, r2: 0.8325
n_estimators:500, loss:square, learning_rate:0.600000, r2: 0.8399
n_estimators:500, loss:square, learning_rate:0.700000, r2: 0.8320
n_estimators:500, loss:square, learning_rate:0.800000, r2: 0.8312
n_estimators:500, loss:square, learning_rate:0.900000, r2: 0.8285
n_estimators:500, loss:square, learning_rate:1.000000, r2: 0.8270
n_estimators:500, loss:exponential, learning_rate:0.100000, r2: 0.8407
n_estimators:500, loss:exponential, learning_rate:0.200000, r2: 0.8306
n_estimators:500, loss:exponential, learning_rate:0.300000, r2: 0.8349
n_estimators:500, loss:exponential, learning_rate:0.400000, r2: 0.8300
n_estimators:500, loss:exponential, learning_rate:0.500000, r2: 0.8359
n_estimators:500, loss:exponential, learning_rate:0.600000, r2: 0.8338
n_estimators:500, loss:exponential, learning_rate:0.700000, r2: 0.8376
n_estimators:500, loss:exponential, learning_rate:0.800000, r2: 0.8310
n_estimators:500, loss:exponential, learning_rate:0.900000, r2: 0.8294
n_estimators:500, loss:exponential, learning_rate:1.000000, r2: 0.8308
AdaBoost with a decision tree base learner achieves its best fit at n_estimators=20, loss=square, learning_rate=0.3, with r2 = 0.8476.
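Beyond r2, the fitted booster exposes impurity-based feature importances, which show which encoded columns drive the predictions; a sketch with the best configuration found above:
# Top 10 features by impurity-based importance in the best AdaBoost model
ABR_best = AdaBoostRegressor(base_estimator=DTR, random_state=10,
                             n_estimators=20, loss="square", learning_rate=0.3)
ABR_best.fit(X_train, y_train)
importances = pd.Series(ABR_best.feature_importances_, index=X_train.columns)
print(importances.sort_values(ascending=False).head(10))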
from sklearn.ensemble import GradientBoostingRegressor
# GBDT: hyperparameter tuning
loss_list = ['ls', 'lad', 'huber', 'quantile']
learning_rate_list = np.arange(0.1, 1.1, 0.1)
for i in grid_n:
    for j in loss_list:
        for k in learning_rate_list:
            GBR = GradientBoostingRegressor(loss=j, learning_rate=k, n_estimators=i, random_state=10)
            GBR.fit(X_train, y_train)
            print("n_estimators:%d, loss:%s, learning_rate:%0.2f, r2: %0.4f" % (i, j, k, GBR.score(X_test, y_test)))
n_estimators:10, loss:ls, learning_rate:0.10, r2: 0.5922
n_estimators:10, loss:ls, learning_rate:0.20, r2: 0.7456
n_estimators:10, loss:ls, learning_rate:0.30, r2: 0.7815
n_estimators:10, loss:ls, learning_rate:0.40, r2: 0.7385
n_estimators:10, loss:ls, learning_rate:0.50, r2: 0.7016
n_estimators:10, loss:ls, learning_rate:0.60, r2: 0.7388
n_estimators:10, loss:ls, learning_rate:0.70, r2: 0.6672
n_estimators:10, loss:ls, learning_rate:0.80, r2: 0.7304
n_estimators:10, loss:ls, learning_rate:0.90, r2: 0.7114
n_estimators:10, loss:ls, learning_rate:1.00, r2: 0.5668
n_estimators:10, loss:lad, learning_rate:0.10, r2: 0.3481
n_estimators:10, loss:lad, learning_rate:0.20, r2: 0.5562
n_estimators:10, loss:lad, learning_rate:0.30, r2: 0.6521
n_estimators:10, loss:lad, learning_rate:0.40, r2: 0.6943
n_estimators:10, loss:lad, learning_rate:0.50, r2: 0.7284
n_estimators:10, loss:lad, learning_rate:0.60, r2: 0.7315
n_estimators:10, loss:lad, learning_rate:0.70, r2: 0.7139
n_estimators:10, loss:lad, learning_rate:0.80, r2: 0.7172
n_estimators:10, loss:lad, learning_rate:0.90, r2: 0.6893
n_estimators:10, loss:lad, learning_rate:1.00, r2: 0.6687
n_estimators:10, loss:huber, learning_rate:0.10, r2: 0.4641
n_estimators:10, loss:huber, learning_rate:0.20, r2: 0.6158
n_estimators:10, loss:huber, learning_rate:0.30, r2: 0.7108
n_estimators:10, loss:huber, learning_rate:0.40, r2: 0.7433
n_estimators:10, loss:huber, learning_rate:0.50, r2: 0.7594
n_estimators:10, loss:huber, learning_rate:0.60, r2: 0.7780
n_estimators:10, loss:huber, learning_rate:0.70, r2: 0.7680
n_estimators:10, loss:huber, learning_rate:0.80, r2: 0.7657
n_estimators:10, loss:huber, learning_rate:0.90, r2: 0.7181
n_estimators:10, loss:huber, learning_rate:1.00, r2: 0.7399
n_estimators:10, loss:quantile, learning_rate:0.10, r2: -0.0321
n_estimators:10, loss:quantile, learning_rate:0.20, r2: 0.4116
n_estimators:10, loss:quantile, learning_rate:0.30, r2: 0.5778
n_estimators:10, loss:quantile, learning_rate:0.40, r2: 0.6326
n_estimators:10, loss:quantile, learning_rate:0.50, r2: 0.6443
n_estimators:10, loss:quantile, learning_rate:0.60, r2: 0.6106
n_estimators:10, loss:quantile, learning_rate:0.70, r2: 0.5563
n_estimators:10, loss:quantile, learning_rate:0.80, r2: 0.6072
n_estimators:10, loss:quantile, learning_rate:0.90, r2: 0.5383
n_estimators:10, loss:quantile, learning_rate:1.00, r2: 0.3802
n_estimators:20, loss:ls, learning_rate:0.10, r2: 0.7016
n_estimators:20, loss:ls, learning_rate:0.20, r2: 0.7950
n_estimators:20, loss:ls, learning_rate:0.30, r2: 0.8140
n_estimators:20, loss:ls, learning_rate:0.40, r2: 0.7735
n_estimators:20, loss:ls, learning_rate:0.50, r2: 0.7282
n_estimators:20, loss:ls, learning_rate:0.60, r2: 0.7545
n_estimators:20, loss:ls, learning_rate:0.70, r2: 0.6554
n_estimators:20, loss:ls, learning_rate:0.80, r2: 0.7442
n_estimators:20, loss:ls, learning_rate:0.90, r2: 0.7313
n_estimators:20, loss:ls, learning_rate:1.00, r2: 0.5410
n_estimators:20, loss:lad, learning_rate:0.10, r2: 0.5256
n_estimators:20, loss:lad, learning_rate:0.20, r2: 0.6928
n_estimators:20, loss:lad, learning_rate:0.30, r2: 0.7250
n_estimators:20, loss:lad, learning_rate:0.40, r2: 0.7417
n_estimators:20, loss:lad, learning_rate:0.50, r2: 0.7658
n_estimators:20, loss:lad, learning_rate:0.60, r2: 0.7708
n_estimators:20, loss:lad, learning_rate:0.70, r2: 0.7294
n_estimators:20, loss:lad, learning_rate:0.80, r2: 0.7480
n_estimators:20, loss:lad, learning_rate:0.90, r2: 0.7155
n_estimators:20, loss:lad, learning_rate:1.00, r2: 0.6645
n_estimators:20, loss:huber, learning_rate:0.10, r2: 0.6210
n_estimators:20, loss:huber, learning_rate:0.20, r2: 0.7313
n_estimators:20, loss:huber, learning_rate:0.30, r2: 0.7794
n_estimators:20, loss:huber, learning_rate:0.40, r2: 0.8207
n_estimators:20, loss:huber, learning_rate:0.50, r2: 0.8415
n_estimators:20, loss:huber, learning_rate:0.60, r2: 0.7974
n_estimators:20, loss:huber, learning_rate:0.70, r2: 0.8105
n_estimators:20, loss:huber, learning_rate:0.80, r2: 0.6530
n_estimators:20, loss:huber, learning_rate:0.90, r2: 0.7436
n_estimators:20, loss:huber, learning_rate:1.00, r2: 0.8018
n_estimators:20, loss:quantile, learning_rate:0.10, r2: 0.3999
n_estimators:20, loss:quantile, learning_rate:0.20, r2: 0.6646
n_estimators:20, loss:quantile, learning_rate:0.30, r2: 0.6932
n_estimators:20, loss:quantile, learning_rate:0.40, r2: 0.6944
n_estimators:20, loss:quantile, learning_rate:0.50, r2: 0.6748
n_estimators:20, loss:quantile, learning_rate:0.60, r2: 0.6579
n_estimators:20, loss:quantile, learning_rate:0.70, r2: 0.5760
n_estimators:20, loss:quantile, learning_rate:0.80, r2: 0.6285
n_estimators:20, loss:quantile, learning_rate:0.90, r2: 0.5640
n_estimators:20, loss:quantile, learning_rate:1.00, r2: 0.4109
n_estimators:50, loss:ls, learning_rate:0.10, r2: 0.7640
n_estimators:50, loss:ls, learning_rate:0.20, r2: 0.8337
n_estimators:50, loss:ls, learning_rate:0.30, r2: 0.8331
n_estimators:50, loss:ls, learning_rate:0.40, r2: 0.7908
n_estimators:50, loss:ls, learning_rate:0.50, r2: 0.7318
n_estimators:50, loss:ls, learning_rate:0.60, r2: 0.7638
n_estimators:50, loss:ls, learning_rate:0.70, r2: 0.6778
n_estimators:50, loss:ls, learning_rate:0.80, r2: 0.7654
n_estimators:50, loss:ls, learning_rate:0.90, r2: 0.7321
n_estimators:50, loss:ls, learning_rate:1.00, r2: 0.5572
n_estimators:50, loss:lad, learning_rate:0.10, r2: 0.6928
n_estimators:50, loss:lad, learning_rate:0.20, r2: 0.7723
n_estimators:50, loss:lad, learning_rate:0.30, r2: 0.7726
n_estimators:50, loss:lad, learning_rate:0.40, r2: 0.7695
n_estimators:50, loss:lad, learning_rate:0.50, r2: 0.8229
n_estimators:50, loss:lad, learning_rate:0.60, r2: 0.7921
n_estimators:50, loss:lad, learning_rate:0.70, r2: 0.6918
n_estimators:50, loss:lad, learning_rate:0.80, r2: 0.7345
n_estimators:50, loss:lad, learning_rate:0.90, r2: 0.7104
n_estimators:50, loss:lad, learning_rate:1.00, r2: 0.6775
n_estimators:50, loss:huber, learning_rate:0.10, r2: 0.7619
n_estimators:50, loss:huber, learning_rate:0.20, r2: 0.8040
n_estimators:50, loss:huber, learning_rate:0.30, r2: 0.8124
n_estimators:50, loss:huber, learning_rate:0.40, r2: 0.8486
n_estimators:50, loss:huber, learning_rate:0.50, r2: 0.8557
n_estimators:50, loss:huber, learning_rate:0.60, r2: 0.8052
n_estimators:50, loss:huber, learning_rate:0.70, r2: 0.8166
n_estimators:50, loss:huber, learning_rate:0.80, r2: 0.6511
n_estimators:50, loss:huber, learning_rate:0.90, r2: 0.7323
n_estimators:50, loss:huber, learning_rate:1.00, r2: 0.7960
n_estimators:50, loss:quantile, learning_rate:0.10, r2: 0.7164
n_estimators:50, loss:quantile, learning_rate:0.20, r2: 0.7633
n_estimators:50, loss:quantile, learning_rate:0.30, r2: 0.7323
n_estimators:50, loss:quantile, learning_rate:0.40, r2: 0.7203
n_estimators:50, loss:quantile, learning_rate:0.50, r2: 0.7002
n_estimators:50, loss:quantile, learning_rate:0.60, r2: 0.6667
n_estimators:50, loss:quantile, learning_rate:0.70, r2: 0.5759
n_estimators:50, loss:quantile, learning_rate:0.80, r2: 0.6384
n_estimators:50, loss:quantile, learning_rate:0.90, r2: 0.5985
n_estimators:50, loss:quantile, learning_rate:1.00, r2: 0.3767
n_estimators:100, loss:ls, learning_rate:0.10, r2: 0.7833
n_estimators:100, loss:ls, learning_rate:0.20, r2: 0.8461
n_estimators:100, loss:ls, learning_rate:0.30, r2: 0.8299
n_estimators:100, loss:ls, learning_rate:0.40, r2: 0.8011
n_estimators:100, loss:ls, learning_rate:0.50, r2: 0.7359
n_estimators:100, loss:ls, learning_rate:0.60, r2: 0.7714
n_estimators:100, loss:ls, learning_rate:0.70, r2: 0.6857
n_estimators:100, loss:ls, learning_rate:0.80, r2: 0.7741
n_estimators:100, loss:ls, learning_rate:0.90, r2: 0.7213
n_estimators:100, loss:ls, learning_rate:1.00, r2: 0.5668
n_estimators:100, loss:lad, learning_rate:0.10, r2: 0.7420
n_estimators:100, loss:lad, learning_rate:0.20, r2: 0.8070
n_estimators:100, loss:lad, learning_rate:0.30, r2: 0.7795
n_estimators:100, loss:lad, learning_rate:0.40, r2: 0.7834
n_estimators:100, loss:lad, learning_rate:0.50, r2: 0.8223
n_estimators:100, loss:lad, learning_rate:0.60, r2: 0.7918
n_estimators:100, loss:lad, learning_rate:0.70, r2: 0.7206
n_estimators:100, loss:lad, learning_rate:0.80, r2: 0.7338
n_estimators:100, loss:lad, learning_rate:0.90, r2: 0.6832
n_estimators:100, loss:lad, learning_rate:1.00, r2: 0.6891
n_estimators:100, loss:huber, learning_rate:0.10, r2: 0.8063
n_estimators:100, loss:huber, learning_rate:0.20, r2: 0.8199
n_estimators:100, loss:huber, learning_rate:0.30, r2: 0.8149
n_estimators:100, loss:huber, learning_rate:0.40, r2: 0.8575
n_estimators:100, loss:huber, learning_rate:0.50, r2: 0.8687
n_estimators:100, loss:huber, learning_rate:0.60, r2: 0.7969
n_estimators:100, loss:huber, learning_rate:0.70, r2: 0.8283
n_estimators:100, loss:huber, learning_rate:0.80, r2: 0.6596
n_estimators:100, loss:huber, learning_rate:0.90, r2: 0.7267
n_estimators:100, loss:huber, learning_rate:1.00, r2: 0.7902
n_estimators:100, loss:quantile, learning_rate:0.10, r2: 0.7818
n_estimators:100, loss:quantile, learning_rate:0.20, r2: 0.7708
n_estimators:100, loss:quantile, learning_rate:0.30, r2: 0.7324
n_estimators:100, loss:quantile, learning_rate:0.40, r2: 0.7203
n_estimators:100, loss:quantile, learning_rate:0.50, r2: 0.7022
n_estimators:100, loss:quantile, learning_rate:0.60, r2: 0.6812
n_estimators:100, loss:quantile, learning_rate:0.70, r2: 0.6050
n_estimators:100, loss:quantile, learning_rate:0.80, r2: 0.6532
n_estimators:100, loss:quantile, learning_rate:0.90, r2: 0.5779
n_estimators:100, loss:quantile, learning_rate:1.00, r2: 0.3861
n_estimators:150, loss:ls, learning_rate:0.10, r2: 0.7919
n_estimators:150, loss:ls, learning_rate:0.20, r2: 0.8462
n_estimators:150, loss:ls, learning_rate:0.30, r2: 0.8344
n_estimators:150, loss:ls, learning_rate:0.40, r2: 0.8044
n_estimators:150, loss:ls, learning_rate:0.50, r2: 0.7386
n_estimators:150, loss:ls, learning_rate:0.60, r2: 0.7721
n_estimators:150, loss:ls, learning_rate:0.70, r2: 0.6837
n_estimators:150, loss:ls, learning_rate:0.80, r2: 0.7751
n_estimators:150, loss:ls, learning_rate:0.90, r2: 0.7218
n_estimators:150, loss:ls, learning_rate:1.00, r2: 0.5553
n_estimators:150, loss:lad, learning_rate:0.10, r2: 0.7709
n_estimators:150, loss:lad, learning_rate:0.20, r2: 0.8230
n_estimators:150, loss:lad, learning_rate:0.30, r2: 0.7806
n_estimators:150, loss:lad, learning_rate:0.40, r2: 0.8013
n_estimators:150, loss:lad, learning_rate:0.50, r2: 0.8267
n_estimators:150, loss:lad, learning_rate:0.60, r2: 0.7934
n_estimators:150, loss:lad, learning_rate:0.70, r2: 0.7262
n_estimators:150, loss:lad, learning_rate:0.80, r2: 0.7352
n_estimators:150, loss:lad, learning_rate:0.90, r2: 0.6833
n_estimators:150, loss:lad, learning_rate:1.00, r2: 0.6865
n_estimators:150, loss:huber, learning_rate:0.10, r2: 0.8180
n_estimators:150, loss:huber, learning_rate:0.20, r2: 0.8178
n_estimators:150, loss:huber, learning_rate:0.30, r2: 0.8259
n_estimators:150, loss:huber, learning_rate:0.40, r2: 0.8594
n_estimators:150, loss:huber, learning_rate:0.50, r2: 0.8678
n_estimators:150, loss:huber, learning_rate:0.60, r2: 0.7979
n_estimators:150, loss:huber, learning_rate:0.70, r2: 0.8309
n_estimators:150, loss:huber, learning_rate:0.80, r2: 0.6528
n_estimators:150, loss:huber, learning_rate:0.90, r2: 0.7284
n_estimators:150, loss:huber, learning_rate:1.00, r2: 0.8012
n_estimators:150, loss:quantile, learning_rate:0.10, r2: 0.7841
n_estimators:150, loss:quantile, learning_rate:0.20, r2: 0.7708
n_estimators:150, loss:quantile, learning_rate:0.30, r2: 0.7324
n_estimators:150, loss:quantile, learning_rate:0.40, r2: 0.7203
n_estimators:150, loss:quantile, learning_rate:0.50, r2: 0.7045
n_estimators:150, loss:quantile, learning_rate:0.60, r2: 0.7055
n_estimators:150, loss:quantile, learning_rate:0.70, r2: 0.6085
n_estimators:150, loss:quantile, learning_rate:0.80, r2: 0.6644
n_estimators:150, loss:quantile, learning_rate:0.90, r2: 0.5798
n_estimators:150, loss:quantile, learning_rate:1.00, r2: 0.4027
n_estimators:200, loss:ls, learning_rate:0.10, r2: 0.7938
n_estimators:200, loss:ls, learning_rate:0.20, r2: 0.8455
n_estimators:200, loss:ls, learning_rate:0.30, r2: 0.8353
n_estimators:200, loss:ls, learning_rate:0.40, r2: 0.8032
n_estimators:200, loss:ls, learning_rate:0.50, r2: 0.7374
n_estimators:200, loss:ls, learning_rate:0.60, r2: 0.7706
n_estimators:200, loss:ls, learning_rate:0.70, r2: 0.6836
n_estimators:200, loss:ls, learning_rate:0.80, r2: 0.7729
n_estimators:200, loss:ls, learning_rate:0.90, r2: 0.7227
n_estimators:200, loss:ls, learning_rate:1.00, r2: 0.5530
n_estimators:200, loss:lad, learning_rate:0.10, r2: 0.7814
n_estimators:200, loss:lad, learning_rate:0.20, r2: 0.8274
n_estimators:200, loss:lad, learning_rate:0.30, r2: 0.7815
n_estimators:200, loss:lad, learning_rate:0.40, r2: 0.8074
n_estimators:200, loss:lad, learning_rate:0.50, r2: 0.8266
n_estimators:200, loss:lad, learning_rate:0.60, r2: 0.7935
n_estimators:200, loss:lad, learning_rate:0.70, r2: 0.7272
n_estimators:200, loss:lad, learning_rate:0.80, r2: 0.7375
n_estimators:200, loss:lad, learning_rate:0.90, r2: 0.6825
n_estimators:200, loss:lad, learning_rate:1.00, r2: 0.6801
n_estimators:200, loss:huber, learning_rate:0.10, r2: 0.8238
n_estimators:200, loss:huber, learning_rate:0.20, r2: 0.8288
n_estimators:200, loss:huber, learning_rate:0.30, r2: 0.8264
n_estimators:200, loss:huber, learning_rate:0.40, r2: 0.8600
n_estimators:200, loss:huber, learning_rate:0.50, r2: 0.8675
n_estimators:200, loss:huber, learning_rate:0.60, r2: 0.8028
n_estimators:200, loss:huber, learning_rate:0.70, r2: 0.8301
n_estimators:200, loss:huber, learning_rate:0.80, r2: 0.6503
n_estimators:200, loss:huber, learning_rate:0.90, r2: 0.7346
n_estimators:200, loss:huber, learning_rate:1.00, r2: 0.8009
n_estimators:200, loss:quantile, learning_rate:0.10, r2: 0.7841
n_estimators:200, loss:quantile, learning_rate:0.20, r2: 0.7708
n_estimators:200, loss:quantile, learning_rate:0.30, r2: 0.7324
n_estimators:200, loss:quantile, learning_rate:0.40, r2: 0.7203
n_estimators:200, loss:quantile, learning_rate:0.50, r2: 0.7064
n_estimators:200, loss:quantile, learning_rate:0.60, r2: 0.7125
n_estimators:200, loss:quantile, learning_rate:0.70, r2: 0.6111
n_estimators:200, loss:quantile, learning_rate:0.80, r2: 0.6794
n_estimators:200, loss:quantile, learning_rate:0.90, r2: 0.5594
n_estimators:200, loss:quantile, learning_rate:1.00, r2: 0.3322
n_estimators:500, loss:ls, learning_rate:0.10, r2: 0.7954
n_estimators:500, loss:ls, learning_rate:0.20, r2: 0.8494
n_estimators:500, loss:ls, learning_rate:0.30, r2: 0.8365
n_estimators:500, loss:ls, learning_rate:0.40, r2: 0.8047
n_estimators:500, loss:ls, learning_rate:0.50, r2: 0.7365
n_estimators:500, loss:ls, learning_rate:0.60, r2: 0.7696
n_estimators:500, loss:ls, learning_rate:0.70, r2: 0.6831
n_estimators:500, loss:ls, learning_rate:0.80, r2: 0.7749
n_estimators:500, loss:ls, learning_rate:0.90, r2: 0.7229
n_estimators:500, loss:ls, learning_rate:1.00, r2: 0.5536
n_estimators:500, loss:lad, learning_rate:0.10, r2: 0.8136
n_estimators:500, loss:lad, learning_rate:0.20, r2: 0.8470
n_estimators:500, loss:lad, learning_rate:0.30, r2: 0.7917
n_estimators:500, loss:lad, learning_rate:0.40, r2: 0.8093
n_estimators:500, loss:lad, learning_rate:0.50, r2: 0.8248
n_estimators:500, loss:lad, learning_rate:0.60, r2: 0.7984
n_estimators:500, loss:lad, learning_rate:0.70, r2: 0.7235
n_estimators:500, loss:lad, learning_rate:0.80, r2: 0.7537
n_estimators:500, loss:lad, learning_rate:0.90, r2: 0.6900
n_estimators:500, loss:lad, learning_rate:1.00, r2: 0.6772
n_estimators:500, loss:huber, learning_rate:0.10, r2: 0.8365
n_estimators:500, loss:huber, learning_rate:0.20, r2: 0.8289
n_estimators:500, loss:huber, learning_rate:0.30, r2: 0.8255
n_estimators:500, loss:huber, learning_rate:0.40, r2: 0.8618
n_estimators:500, loss:huber, learning_rate:0.50, r2: 0.8669
n_estimators:500, loss:huber, learning_rate:0.60, r2: 0.8046
n_estimators:500, loss:huber, learning_rate:0.70, r2: 0.8324
n_estimators:500, loss:huber, learning_rate:0.80, r2: 0.6512
n_estimators:500, loss:huber, learning_rate:0.90, r2: 0.7330
n_estimators:500, loss:huber, learning_rate:1.00, r2: 0.8048
n_estimators:500, loss:quantile, learning_rate:0.10, r2: 0.7841
n_estimators:500, loss:quantile, learning_rate:0.20, r2: 0.7708
n_estimators:500, loss:quantile, learning_rate:0.30, r2: 0.7324
n_estimators:500, loss:quantile, learning_rate:0.40, r2: 0.7203
n_estimators:500, loss:quantile, learning_rate:0.50, r2: 0.7166
n_estimators:500, loss:quantile, learning_rate:0.60, r2: 0.7160
n_estimators:500, loss:quantile, learning_rate:0.70, r2: 0.6256
n_estimators:500, loss:quantile, learning_rate:0.80, r2: 0.6652
n_estimators:500, loss:quantile, learning_rate:0.90, r2: 0.5556
n_estimators:500, loss:quantile, learning_rate:1.00, r2: 0.3129
GBDT performs best at n_estimators=100, loss=huber, learning_rate=0.5, with r2 = 0.8687.
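Incidentally, gradient boosting can evaluate every intermediate ensemble size from a single fit via staged_predict, instead of retraining once per n_estimators value; a sketch with the best loss and learning rate found above:
from sklearn.metrics import r2_score
# Test r2 after each boosting stage, from one 500-stage fit
GBR = GradientBoostingRegressor(loss="huber", learning_rate=0.5, n_estimators=500, random_state=10)
GBR.fit(X_train, y_train)
stage_r2 = [r2_score(y_test, pred) for pred in GBR.staged_predict(X_test)]
print("best stage: %d, r2: %0.4f" % (int(np.argmax(stage_r2)) + 1, max(stage_r2)))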
The best configurations of the four models achieve fairly similar r2 scores, so we next compare them on running time:
%%time
# Bagging regression
bagr = BaggingRegressor(base_estimator=RI, n_estimators=20, bootstrap_features=False, random_state=10)
bagr.fit(X_train, y_train)
print("r2:%0.4f" % (bagr.score(X_test, y_test)))
r2:0.8803
Wall time: 199 ms
%%time
# Random forest
RF = RandomForestRegressor(n_estimators=20, criterion="mse", max_features="auto", random_state=10)
RF.fit(X_train, y_train)
print("r2: %0.4f" % (RF.score(X_test, y_test)))
r2: 0.8242
Wall time: 380 ms
%%time
# AdaBoost
ABR = AdaBoostRegressor(base_estimator=DTR, random_state=10, n_estimators=20, loss="square", learning_rate=0.3)
ABR.fit(X_train, y_train)
print("r2: %0.4f" % (ABR.score(X_test, y_test)))
r2: 0.8476
Wall time: 582 ms
%%time
# GBDT
GBR = GradientBoostingRegressor(loss="huber", learning_rate=0.5, n_estimators=100, random_state=10)
GBR.fit(X_train, y_train)
print("r2: %0.4f" % (GBR.score(X_test, y_test)))
r2: 0.8687
Wall time: 758 ms
Taking both the r2 score and the running time into account, the bagging regressor with ridge regression base learners, at n_estimators=20 and bootstrap_features=False, is the best overall model: it attains the highest r2 (0.8803) and the shortest training time.
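All of the comparisons above rest on a single 80/20 split; a quick cross-validation run can confirm that the chosen model's advantage is not an artifact of random_state=255. A sketch:
from sklearn.model_selection import cross_val_score
# 5-fold CV r2 of the selected bagging model on the full data
final_model = BaggingRegressor(base_estimator=RI, n_estimators=20,
                               bootstrap_features=False, random_state=10)
scores = cross_val_score(final_model, X, y, cv=5)
print("mean r2: %0.4f (std %0.4f)" % (scores.mean(), scores.std()))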