Price_prediction

赖运珧
2023-12-01

Problem description

Use ensemble learning to predict house prices in Ames, Iowa (USA).
The training set (train) contains 1460 samples and 81 features; the target feature is the sale price (SalePrice).

Solution

Loading the data

import pandas as pd

train = pd.read_csv(r"C:\Users\SZS-Student\Desktop\机器学习与Python实践\train-2.csv")
train.head()
   Id  MSSubClass MSZoning  LotFrontage  LotArea Street Alley LotShape LandContour Utilities  ...  PoolArea PoolQC Fence MiscFeature MiscVal MoSold YrSold SaleType SaleCondition  SalePrice
0   1          60       RL         65.0     8450   Pave   NaN      Reg         Lvl    AllPub  ...         0    NaN   NaN         NaN       0      2   2008       WD        Normal     208500
1   2          20       RL         80.0     9600   Pave   NaN      Reg         Lvl    AllPub  ...         0    NaN   NaN         NaN       0      5   2007       WD        Normal     181500
2   3          60       RL         68.0    11250   Pave   NaN      IR1         Lvl    AllPub  ...         0    NaN   NaN         NaN       0      9   2008       WD        Normal     223500
3   4          70       RL         60.0     9550   Pave   NaN      IR1         Lvl    AllPub  ...         0    NaN   NaN         NaN       0      2   2006       WD       Abnorml     140000
4   5          60       RL         84.0    14260   Pave   NaN      IR1         Lvl    AllPub  ...         0    NaN   NaN         NaN       0     12   2008       WD        Normal     250000

5 rows × 81 columns

train.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1460 entries, 0 to 1459
Data columns (total 81 columns):
 #   Column         Non-Null Count  Dtype  
---  ------         --------------  -----  
 0   Id             1460 non-null   int64  
 1   MSSubClass     1460 non-null   int64  
 2   MSZoning       1460 non-null   object 
 3   LotFrontage    1201 non-null   float64
 4   LotArea        1460 non-null   int64  
 5   Street         1460 non-null   object 
 6   Alley          91 non-null     object 
 7   LotShape       1460 non-null   object 
 8   LandContour    1460 non-null   object 
 9   Utilities      1460 non-null   object 
 10  LotConfig      1460 non-null   object 
 11  LandSlope      1460 non-null   object 
 12  Neighborhood   1460 non-null   object 
 13  Condition1     1460 non-null   object 
 14  Condition2     1460 non-null   object 
 15  BldgType       1460 non-null   object 
 16  HouseStyle     1460 non-null   object 
 17  OverallQual    1460 non-null   int64  
 18  OverallCond    1460 non-null   int64  
 19  YearBuilt      1460 non-null   int64  
 20  YearRemodAdd   1460 non-null   int64  
 21  RoofStyle      1460 non-null   object 
 22  RoofMatl       1460 non-null   object 
 23  Exterior1st    1460 non-null   object 
 24  Exterior2nd    1460 non-null   object 
 25  MasVnrType     1452 non-null   object 
 26  MasVnrArea     1452 non-null   float64
 27  ExterQual      1460 non-null   object 
 28  ExterCond      1460 non-null   object 
 29  Foundation     1460 non-null   object 
 30  BsmtQual       1423 non-null   object 
 31  BsmtCond       1423 non-null   object 
 32  BsmtExposure   1422 non-null   object 
 33  BsmtFinType1   1423 non-null   object 
 34  BsmtFinSF1     1460 non-null   int64  
 35  BsmtFinType2   1422 non-null   object 
 36  BsmtFinSF2     1460 non-null   int64  
 37  BsmtUnfSF      1460 non-null   int64  
 38  TotalBsmtSF    1460 non-null   int64  
 39  Heating        1460 non-null   object 
 40  HeatingQC      1460 non-null   object 
 41  CentralAir     1460 non-null   object 
 42  Electrical     1459 non-null   object 
 43  1stFlrSF       1460 non-null   int64  
 44  2ndFlrSF       1460 non-null   int64  
 45  LowQualFinSF   1460 non-null   int64  
 46  GrLivArea      1460 non-null   int64  
 47  BsmtFullBath   1460 non-null   int64  
 48  BsmtHalfBath   1460 non-null   int64  
 49  FullBath       1460 non-null   int64  
 50  HalfBath       1460 non-null   int64  
 51  BedroomAbvGr   1460 non-null   int64  
 52  KitchenAbvGr   1460 non-null   int64  
 53  KitchenQual    1460 non-null   object 
 54  TotRmsAbvGrd   1460 non-null   int64  
 55  Functional     1460 non-null   object 
 56  Fireplaces     1460 non-null   int64  
 57  FireplaceQu    770 non-null    object 
 58  GarageType     1379 non-null   object 
 59  GarageYrBlt    1379 non-null   float64
 60  GarageFinish   1379 non-null   object 
 61  GarageCars     1460 non-null   int64  
 62  GarageArea     1460 non-null   int64  
 63  GarageQual     1379 non-null   object 
 64  GarageCond     1379 non-null   object 
 65  PavedDrive     1460 non-null   object 
 66  WoodDeckSF     1460 non-null   int64  
 67  OpenPorchSF    1460 non-null   int64  
 68  EnclosedPorch  1460 non-null   int64  
 69  3SsnPorch      1460 non-null   int64  
 70  ScreenPorch    1460 non-null   int64  
 71  PoolArea       1460 non-null   int64  
 72  PoolQC         7 non-null      object 
 73  Fence          281 non-null    object 
 74  MiscFeature    54 non-null     object 
 75  MiscVal        1460 non-null   int64  
 76  MoSold         1460 non-null   int64  
 77  YrSold         1460 non-null   int64  
 78  SaleType       1460 non-null   object 
 79  SaleCondition  1460 non-null   object 
 80  SalePrice      1460 non-null   int64  
dtypes: float64(3), int64(35), object(43)
memory usage: 924.0+ KB

Dropping the Id column and features with many missing values

From the output above, the features "Alley", "PoolQC", "Fence", and "MiscFeature" are mostly missing, and the dataset already has plenty of features, so these four are dropped along with the uninformative "Id" column.

train = train.drop(["Id","Alley","PoolQC","Fence","MiscFeature"],axis = 1)
train.head()
   MSSubClass MSZoning  LotFrontage  LotArea Street LotShape LandContour Utilities LotConfig LandSlope  ...  EnclosedPorch 3SsnPorch ScreenPorch PoolArea MiscVal MoSold YrSold SaleType SaleCondition  SalePrice
0          60       RL         65.0     8450   Pave      Reg         Lvl    AllPub    Inside       Gtl  ...              0         0           0        0       0      2   2008       WD        Normal     208500
1          20       RL         80.0     9600   Pave      Reg         Lvl    AllPub       FR2       Gtl  ...              0         0           0        0       0      5   2007       WD        Normal     181500
2          60       RL         68.0    11250   Pave      IR1         Lvl    AllPub    Inside       Gtl  ...              0         0           0        0       0      9   2008       WD        Normal     223500
3          70       RL         60.0     9550   Pave      IR1         Lvl    AllPub    Corner       Gtl  ...            272         0           0        0       0      2   2006       WD       Abnorml     140000
4          60       RL         84.0    14260   Pave      IR1         Lvl    AllPub       FR2       Gtl  ...              0         0           0        0       0     12   2008       WD        Normal     250000

5 rows × 76 columns

Identifying the features with missing values

# features containing missing values and their counts
missing_counts = {}
for i in train.columns:
    null_count = train[i].isnull().sum()
    if null_count > 0:
        missing_counts[i] = null_count
missing_counts
{'LotFrontage': 259,
 'MasVnrType': 8,
 'MasVnrArea': 8,
 'BsmtQual': 37,
 'BsmtCond': 37,
 'BsmtExposure': 38,
 'BsmtFinType1': 37,
 'BsmtFinType2': 38,
 'Electrical': 1,
 'FireplaceQu': 690,
 'GarageType': 81,
 'GarageYrBlt': 81,
 'GarageFinish': 81,
 'GarageQual': 81,
 'GarageCond': 81}

Missing-value imputation

Discrete features are imputed with the mode; continuous features are imputed with the mean.

Discrete features

MasVnrType, BsmtQual, BsmtCond, BsmtExposure, BsmtFinType1, BsmtFinType2, Electrical, FireplaceQu, GarageType, GarageYrBlt, GarageFinish, GarageQual, GarageCond

Continuous features

LotFrontage, MasVnrArea

from sklearn.impute import SimpleImputer
import numpy as np
MISSfeature_Discrete_list = ["MasVnrType","BsmtQual","BsmtCond","BsmtExposure","BsmtFinType1","BsmtFinType2","Electrical","FireplaceQu","GarageType","GarageYrBlt","GarageFinish","GarageQual","GarageCond"]
MISSfeature_Continuous_list = ["LotFrontage","MasVnrArea"]

# impute missing values in discrete features with the mode
imputer = SimpleImputer(missing_values=np.nan, strategy="most_frequent")
for i in MISSfeature_Discrete_list:
    imputer.fit(train[[i]])
    train[i] = imputer.transform(train[[i]])
    print(train[i].isnull().sum())  # check that no missing values remain

# impute missing values in continuous features with the mean
imputer = SimpleImputer(missing_values=np.nan, strategy="mean")
for i in MISSfeature_Continuous_list:
    imputer.fit(train[[i]])
    train[i] = imputer.transform(train[[i]])
    print(train[i].isnull().sum())  # check that no missing values remain
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
# full feature list
feature_list = list(train.columns)
# continuous feature list
feature_Continuous = ["LotFrontage","LotArea","YearBuilt","YearRemodAdd","MasVnrArea","BsmtFinSF1","BsmtFinSF2","BsmtUnfSF","TotalBsmtSF",\
                    "1stFlrSF","2ndFlrSF","LowQualFinSF","GrLivArea","GarageYrBlt","GarageArea","WoodDeckSF","OpenPorchSF","EnclosedPorch",\
                    "3SsnPorch","ScreenPorch","PoolArea","MiscVal","MoSold","YrSold"]
# remove the continuous features, leaving the discrete ones
for i in feature_Continuous:
    feature_list.remove(i)

# discrete feature list
feature_Discrete = feature_list

print(feature_Continuous)
print(feature_Discrete)
['LotFrontage', 'LotArea', 'YearBuilt', 'YearRemodAdd', 'MasVnrArea', 'BsmtFinSF1', 'BsmtFinSF2', 'BsmtUnfSF', 'TotalBsmtSF', '1stFlrSF', '2ndFlrSF', 'LowQualFinSF', 'GrLivArea', 'GarageYrBlt', 'GarageArea', 'WoodDeckSF', 'OpenPorchSF', 'EnclosedPorch', '3SsnPorch', 'ScreenPorch', 'PoolArea', 'MiscVal', 'MoSold', 'YrSold']
['MSSubClass', 'MSZoning', 'Street', 'LotShape', 'LandContour', 'Utilities', 'LotConfig', 'LandSlope', 'Neighborhood', 'Condition1', 'Condition2', 'BldgType', 'HouseStyle', 'OverallQual', 'OverallCond', 'RoofStyle', 'RoofMatl', 'Exterior1st', 'Exterior2nd', 'MasVnrType', 'ExterQual', 'ExterCond', 'Foundation', 'BsmtQual', 'BsmtCond', 'BsmtExposure', 'BsmtFinType1', 'BsmtFinType2', 'Heating', 'HeatingQC', 'CentralAir', 'Electrical', 'BsmtFullBath', 'BsmtHalfBath', 'FullBath', 'HalfBath', 'BedroomAbvGr', 'KitchenAbvGr', 'KitchenQual', 'TotRmsAbvGrd', 'Functional', 'Fireplaces', 'FireplaceQu', 'GarageType', 'GarageFinish', 'GarageCars', 'GarageQual', 'GarageCond', 'PavedDrive', 'SaleType', 'SaleCondition', 'SalePrice']

Feature encoding and standardization

Discrete features are one-hot encoded; continuous features are standardized with Z-scores.

# Z-score standardization
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler(copy=True)
train_scaled = train.copy()  # copy so the original frame is not modified in place
for i in feature_Continuous:
    train_scaled[i] = scaler.fit_transform(train[[i]])
train_scaled.head()
train_scaled.head()
   MSSubClass MSZoning  LotFrontage   LotArea Street LotShape LandContour Utilities LotConfig LandSlope  ...  EnclosedPorch 3SsnPorch ScreenPorch  PoolArea   MiscVal    MoSold    YrSold SaleType SaleCondition  SalePrice
0          60       RL    -0.229372 -0.207142   Pave      Reg         Lvl    AllPub    Inside       Gtl  ...      -0.359325 -0.116339   -0.270208 -0.068692 -0.087688 -1.599111  0.138777       WD        Normal     208500
1          20       RL     0.451936 -0.091886   Pave      Reg         Lvl    AllPub       FR2       Gtl  ...      -0.359325 -0.116339   -0.270208 -0.068692 -0.087688 -0.489110 -0.614439       WD        Normal     181500
2          60       RL    -0.093110  0.073480   Pave      IR1         Lvl    AllPub    Inside       Gtl  ...      -0.359325 -0.116339   -0.270208 -0.068692 -0.087688  0.990891  0.138777       WD        Normal     223500
3          70       RL    -0.456474 -0.096897   Pave      IR1         Lvl    AllPub    Corner       Gtl  ...       4.092524 -0.116339   -0.270208 -0.068692 -0.087688 -1.599111 -1.367655       WD       Abnorml     140000
4          60       RL     0.633618  0.375148   Pave      IR1         Lvl    AllPub       FR2       Gtl  ...      -0.359325 -0.116339   -0.270208 -0.068692 -0.087688  2.100892  0.138777       WD        Normal     250000

5 rows × 76 columns

# drop SalePrice: it is the target, not a feature to encode
feature_Discrete.remove('SalePrice')

# one-hot encoding
train_scaled = pd.get_dummies(train_scaled, columns=feature_Discrete)
train_scaled.head()
train_scaled.head()
   LotFrontage   LotArea YearBuilt YearRemodAdd MasVnrArea BsmtFinSF1 BsmtFinSF2 BsmtUnfSF TotalBsmtSF  1stFlrSF  ...  SaleType_ConLw SaleType_New SaleType_Oth SaleType_WD SaleCondition_Abnorml SaleCondition_AdjLand SaleCondition_Alloca SaleCondition_Family SaleCondition_Normal SaleCondition_Partial
0    -0.229372 -0.207142  1.050994     0.878668   0.511418   0.575425  -0.288653 -0.944591   -0.459303 -0.793434  ...               0            0            0           1                     0                     0                    0                    0                    1                     0
1     0.451936 -0.091886  0.156734    -0.429577  -0.574410   1.171992  -0.288653 -0.641228    0.466465  0.257140  ...               0            0            0           1                     0                     0                    0                    0                    1                     0
2    -0.093110  0.073480  0.984752     0.830215   0.323060   0.092907  -0.288653 -0.301643   -0.313369 -0.627826  ...               0            0            0           1                     0                     0                    0                    0                    1                     0
3    -0.456474 -0.096897 -1.863632    -0.720298  -0.574410  -0.499274  -0.288653 -0.061670   -0.687324 -0.521734  ...               0            0            0           1                     1                     0                    0                    0                    0                     0
4     0.633618  0.375148  0.951632     0.733308   1.364570   0.463568  -0.288653 -0.174865    0.199680 -0.045611  ...               0            0            0           1                     0                     0                    0                    0                    1                     0

5 rows × 345 columns

Splitting into training and test sets

# separate the target from the features
X = train_scaled.drop('SalePrice', axis=1)
y = train_scaled['SalePrice']

# split into training (80%) and test (20%) sets
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=255)
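
One caveat: the imputers and the scaler above were fit on the full dataset before splitting, so statistics from what becomes the test fold leak slightly into training. A leakage-free alternative fits all preprocessing inside a Pipeline on the training fold only. The sketch below is illustrative rather than a rerun of the experiments; since train_scaled was taken as a copy, train still holds the imputed but unscaled frame, and feature_Continuous / feature_Discrete are reused:

from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder
from sklearn.linear_model import Ridge

# preprocess inside the model, so scaler/encoder statistics come from the training fold only
preprocess = ColumnTransformer([
    ("num", StandardScaler(), feature_Continuous),
    ("cat", OneHotEncoder(handle_unknown="ignore"), feature_Discrete),
])
pipe = Pipeline([("pre", preprocess), ("model", Ridge())])

X_raw = train.drop("SalePrice", axis=1)
y_raw = train["SalePrice"]
Xr_train, Xr_test, yr_train, yr_test = train_test_split(X_raw, y_raw, test_size=0.2, random_state=255)
pipe.fit(Xr_train, yr_train)
print("r2: %0.4f" % pipe.score(Xr_test, yr_test))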

Bagging regression

from sklearn.ensemble import BaggingRegressor

# ridge regression
from sklearn.linear_model import Ridge
RI = Ridge()
# k-nearest neighbors
from sklearn.neighbors import KNeighborsRegressor
KNR = KNeighborsRegressor()
# decision tree
from sklearn.tree import DecisionTreeRegressor
DTR = DecisionTreeRegressor(random_state=10)

estimator_list = [RI, KNR, DTR]
grid_n = [10, 20, 50, 100, 150, 200, 500]
grid_fea = [True, False]

# hyperparameter tuning for bagging regression (base learner: decision tree)
# note: base_estimator was renamed to estimator in scikit-learn >= 1.2
for i in grid_n:
    for j in grid_fea:
        bagr = BaggingRegressor(base_estimator=DTR, n_estimators=i, bootstrap_features=j, random_state=10)
        bagr.fit(X_train, y_train)
        print("%d,%s,r2:%0.4f" % (i, j, bagr.score(X_test, y_test)))
10,True,r2:0.7840
10,False,r2:0.7852
20,True,r2:0.7949
20,False,r2:0.8139
50,True,r2:0.8038
50,False,r2:0.8190
100,True,r2:0.8129
100,False,r2:0.8170
150,True,r2:0.8114
150,False,r2:0.8155
200,True,r2:0.8157
200,False,r2:0.8167
500,True,r2:0.8179
500,False,r2:0.8142

With bagging regression and a decision-tree base learner, the best result is n_estimators = 50 with bootstrap_features=False, giving r2 = 0.8190.
The same search is repeated below with k-nearest-neighbors and ridge base learners:

# base learner: KNR
for i in grid_n:
    for j in grid_fea:
        bagr = BaggingRegressor(base_estimator=KNR, n_estimators=i, bootstrap_features=j, random_state=10)
        bagr.fit(X_train, y_train)
        print("%d,%s,r2:%0.4f" % (i, j, bagr.score(X_test, y_test)))
10,True,r2:0.7347
10,False,r2:0.7245
20,True,r2:0.7282
20,False,r2:0.7271
50,True,r2:0.7346
50,False,r2:0.7264
100,True,r2:0.7386
100,False,r2:0.7233
150,True,r2:0.7362
150,False,r2:0.7241
200,True,r2:0.7356
200,False,r2:0.7231
500,True,r2:0.7375
500,False,r2:0.7250

The results show that a k-nearest-neighbors base learner fits consistently worse than a decision-tree base learner.

# base learner: Ridge
for i in grid_n:
    for j in grid_fea:
        bagr = BaggingRegressor(base_estimator=RI, n_estimators=i, bootstrap_features=j, random_state=10)
        bagr.fit(X_train, y_train)
        print("%d,%s,r2:%0.4f" % (i, j, bagr.score(X_test, y_test)))
10,True,r2:0.8758
10,False,r2:0.8701
20,True,r2:0.8790
20,False,r2:0.8803
50,True,r2:0.8737
50,False,r2:0.8752
100,True,r2:0.8724
100,False,r2:0.8711
150,True,r2:0.8718
150,False,r2:0.8717
200,True,r2:0.8727
200,False,r2:0.8735
500,True,r2:0.8732
500,False,r2:0.8721

Compared with the two base learners above, ridge regression is clearly the strongest; the best setting is n_estimators = 20 with bootstrap_features=False, giving r2 = 0.8803.
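
As an aside, the manual double loops above can be written with GridSearchCV, which scores by cross-validation on the training set instead of against the single held-out split. A minimal sketch of the same bagging search (again, base_estimator is named estimator in scikit-learn >= 1.2):

from sklearn.model_selection import GridSearchCV

# same search space as the loops above, scored by 5-fold CV on the training set
param_grid = {
    "n_estimators": [10, 20, 50, 100, 150, 200, 500],
    "bootstrap_features": [True, False],
}
search = GridSearchCV(BaggingRegressor(base_estimator=RI, random_state=10),
                      param_grid, scoring="r2", cv=5, n_jobs=-1)
search.fit(X_train, y_train)
print(search.best_params_, "CV r2: %0.4f" % search.best_score_)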

Random forest

from sklearn.ensemble import RandomForestRegressor

# hyperparameter tuning for random forest
# note: scikit-learn >= 1.2 renames "mse"/"mae" to "squared_error"/"absolute_error",
# and max_features="auto" was removed in 1.3
criterion_list = ["mse", "mae"]
max_features_list = ["auto", "sqrt", "log2"]

for i in grid_n:
    for j in criterion_list:
        for k in max_features_list:
            RF = RandomForestRegressor(n_estimators=i, criterion=j,max_features=k,random_state=10)
            RF.fit(X_train, y_train)
            print("%d,%s,%s,r2: %0.4f" %(i,j,k,RF.score(X_test,y_test)))
10,mse,auto,r2: 0.8125
10,mse,sqrt,r2: 0.7479
10,mse,log2,r2: 0.7499
10,mae,auto,r2: 0.7934
10,mae,sqrt,r2: 0.7406
10,mae,log2,r2: 0.7283
20,mse,auto,r2: 0.8242
20,mse,sqrt,r2: 0.7864
20,mse,log2,r2: 0.7498
20,mae,auto,r2: 0.8187
20,mae,sqrt,r2: 0.7566
20,mae,log2,r2: 0.7437
50,mse,auto,r2: 0.8173
50,mse,sqrt,r2: 0.7923
50,mse,log2,r2: 0.7518
50,mae,auto,r2: 0.8122
50,mae,sqrt,r2: 0.7752
50,mae,log2,r2: 0.7468
100,mse,auto,r2: 0.8187
100,mse,sqrt,r2: 0.7912
100,mse,log2,r2: 0.7477
100,mae,auto,r2: 0.8119
100,mae,sqrt,r2: 0.7731
100,mae,log2,r2: 0.7461
150,mse,auto,r2: 0.8199
150,mse,sqrt,r2: 0.7834
150,mse,log2,r2: 0.7513
150,mae,auto,r2: 0.8111
150,mae,sqrt,r2: 0.7725
150,mae,log2,r2: 0.7463
200,mse,auto,r2: 0.8178
200,mse,sqrt,r2: 0.7817
200,mse,log2,r2: 0.7495
200,mae,auto,r2: 0.8060
200,mae,sqrt,r2: 0.7742
200,mae,log2,r2: 0.7491
500,mse,auto,r2: 0.8170
500,mse,sqrt,r2: 0.7826
500,mse,log2,r2: 0.7454
500,mae,auto,r2: 0.8042
500,mae,sqrt,r2: 0.7748
500,mae,log2,r2: 0.7467

From these results, the random forest performs best with n_estimators=20, criterion="mse", and max_features="auto", reaching r2 = 0.8242.
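
A random forest can also validate itself through its out-of-bag samples, leaving the test set untouched during tuning. A brief sketch with the configuration found above; note that with only 20 trees, scikit-learn may warn that some samples never land out-of-bag:

# each tree is scored on the bootstrap samples it never saw during fitting
RF_oob = RandomForestRegressor(n_estimators=20, criterion="mse", max_features="auto",
                               oob_score=True, random_state=10)
RF_oob.fit(X_train, y_train)
print("OOB r2: %0.4f" % RF_oob.oob_score_)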

AdaBoost

from sklearn.ensemble import AdaBoostRegressor
# with the default 50 base learners and the default learning rate of 1,
# compare ridge, k-NN, and decision-tree base learners
for i in estimator_list:
    ABR = AdaBoostRegressor(base_estimator=i, random_state=10)
    ABR.fit(X_train, y_train)
    print("%s,r2: %0.4f" % (i, ABR.score(X_test, y_test)))
Ridge(),r2: 0.7018
KNeighborsRegressor(),r2: 0.6619
DecisionTreeRegressor(random_state=10),r2: 0.8438

The results show that AdaBoost performs best with a decision tree as the base learner; keeping the decision tree, the hyperparameters n_estimators, loss, and learning_rate are tuned next:

loss_list = ['linear', 'square', 'exponential']
learning_rate_list = np.arange(0.1, 1.1, 0.1)

for i in grid_n:
    for j in loss_list:
        for k in learning_rate_list:
            ABR = AdaBoostRegressor(base_estimator=DTR, random_state=10, n_estimators=i, loss=j, learning_rate=k)
            ABR.fit(X_train, y_train)
            print("n_estimators:%d, loss:%s, learning_rate:%f, r2: %0.4f" % (i, j, k, ABR.score(X_test, y_test)))
n_estimators:10, loss:linear, learning_rate:0.100000, r2: 0.7986
n_estimators:10, loss:linear, learning_rate:0.200000, r2: 0.8138
n_estimators:10, loss:linear, learning_rate:0.300000, r2: 0.7897
n_estimators:10, loss:linear, learning_rate:0.400000, r2: 0.7993
n_estimators:10, loss:linear, learning_rate:0.500000, r2: 0.8016
n_estimators:10, loss:linear, learning_rate:0.600000, r2: 0.8145
n_estimators:10, loss:linear, learning_rate:0.700000, r2: 0.8190
n_estimators:10, loss:linear, learning_rate:0.800000, r2: 0.8338
n_estimators:10, loss:linear, learning_rate:0.900000, r2: 0.8147
n_estimators:10, loss:linear, learning_rate:1.000000, r2: 0.8363
n_estimators:10, loss:square, learning_rate:0.100000, r2: 0.8443
n_estimators:10, loss:square, learning_rate:0.200000, r2: 0.7998
n_estimators:10, loss:square, learning_rate:0.300000, r2: 0.7877
n_estimators:10, loss:square, learning_rate:0.400000, r2: 0.8167
n_estimators:10, loss:square, learning_rate:0.500000, r2: 0.8348
n_estimators:10, loss:square, learning_rate:0.600000, r2: 0.8096
n_estimators:10, loss:square, learning_rate:0.700000, r2: 0.8165
n_estimators:10, loss:square, learning_rate:0.800000, r2: 0.8002
n_estimators:10, loss:square, learning_rate:0.900000, r2: 0.8050
n_estimators:10, loss:square, learning_rate:1.000000, r2: 0.8323
n_estimators:10, loss:exponential, learning_rate:0.100000, r2: 0.8459
n_estimators:10, loss:exponential, learning_rate:0.200000, r2: 0.8153
n_estimators:10, loss:exponential, learning_rate:0.300000, r2: 0.8226
n_estimators:10, loss:exponential, learning_rate:0.400000, r2: 0.8005
n_estimators:10, loss:exponential, learning_rate:0.500000, r2: 0.7978
n_estimators:10, loss:exponential, learning_rate:0.600000, r2: 0.8466
n_estimators:10, loss:exponential, learning_rate:0.700000, r2: 0.7934
n_estimators:10, loss:exponential, learning_rate:0.800000, r2: 0.8502
n_estimators:10, loss:exponential, learning_rate:0.900000, r2: 0.8190
n_estimators:10, loss:exponential, learning_rate:1.000000, r2: 0.8327
n_estimators:20, loss:linear, learning_rate:0.100000, r2: 0.8007
n_estimators:20, loss:linear, learning_rate:0.200000, r2: 0.8267
n_estimators:20, loss:linear, learning_rate:0.300000, r2: 0.8089
n_estimators:20, loss:linear, learning_rate:0.400000, r2: 0.8231
n_estimators:20, loss:linear, learning_rate:0.500000, r2: 0.8166
n_estimators:20, loss:linear, learning_rate:0.600000, r2: 0.8202
n_estimators:20, loss:linear, learning_rate:0.700000, r2: 0.8381
n_estimators:20, loss:linear, learning_rate:0.800000, r2: 0.8354
n_estimators:20, loss:linear, learning_rate:0.900000, r2: 0.8246
n_estimators:20, loss:linear, learning_rate:1.000000, r2: 0.8385
n_estimators:20, loss:square, learning_rate:0.100000, r2: 0.8158
n_estimators:20, loss:square, learning_rate:0.200000, r2: 0.8226
n_estimators:20, loss:square, learning_rate:0.300000, r2: 0.8476
n_estimators:20, loss:square, learning_rate:0.400000, r2: 0.8136
n_estimators:20, loss:square, learning_rate:0.500000, r2: 0.8375
n_estimators:20, loss:square, learning_rate:0.600000, r2: 0.8140
n_estimators:20, loss:square, learning_rate:0.700000, r2: 0.7956
n_estimators:20, loss:square, learning_rate:0.800000, r2: 0.8172
n_estimators:20, loss:square, learning_rate:0.900000, r2: 0.8247
n_estimators:20, loss:square, learning_rate:1.000000, r2: 0.8166
n_estimators:20, loss:exponential, learning_rate:0.100000, r2: 0.8393
n_estimators:20, loss:exponential, learning_rate:0.200000, r2: 0.8155
n_estimators:20, loss:exponential, learning_rate:0.300000, r2: 0.8439
n_estimators:20, loss:exponential, learning_rate:0.400000, r2: 0.8108
n_estimators:20, loss:exponential, learning_rate:0.500000, r2: 0.8094
n_estimators:20, loss:exponential, learning_rate:0.600000, r2: 0.8170
n_estimators:20, loss:exponential, learning_rate:0.700000, r2: 0.8265
n_estimators:20, loss:exponential, learning_rate:0.800000, r2: 0.8428
n_estimators:20, loss:exponential, learning_rate:0.900000, r2: 0.8395
n_estimators:20, loss:exponential, learning_rate:1.000000, r2: 0.8318
n_estimators:50, loss:linear, learning_rate:0.100000, r2: 0.8240
n_estimators:50, loss:linear, learning_rate:0.200000, r2: 0.8376
n_estimators:50, loss:linear, learning_rate:0.300000, r2: 0.8409
n_estimators:50, loss:linear, learning_rate:0.400000, r2: 0.8404
n_estimators:50, loss:linear, learning_rate:0.500000, r2: 0.8386
n_estimators:50, loss:linear, learning_rate:0.600000, r2: 0.8156
n_estimators:50, loss:linear, learning_rate:0.700000, r2: 0.8287
n_estimators:50, loss:linear, learning_rate:0.800000, r2: 0.8234
n_estimators:50, loss:linear, learning_rate:0.900000, r2: 0.8207
n_estimators:50, loss:linear, learning_rate:1.000000, r2: 0.8438
n_estimators:50, loss:square, learning_rate:0.100000, r2: 0.8208
n_estimators:50, loss:square, learning_rate:0.200000, r2: 0.8429
n_estimators:50, loss:square, learning_rate:0.300000, r2: 0.8424
n_estimators:50, loss:square, learning_rate:0.400000, r2: 0.8274
n_estimators:50, loss:square, learning_rate:0.500000, r2: 0.8382
n_estimators:50, loss:square, learning_rate:0.600000, r2: 0.8182
n_estimators:50, loss:square, learning_rate:0.700000, r2: 0.8189
n_estimators:50, loss:square, learning_rate:0.800000, r2: 0.8206
n_estimators:50, loss:square, learning_rate:0.900000, r2: 0.8338
n_estimators:50, loss:square, learning_rate:1.000000, r2: 0.8210
n_estimators:50, loss:exponential, learning_rate:0.100000, r2: 0.8308
n_estimators:50, loss:exponential, learning_rate:0.200000, r2: 0.8263
n_estimators:50, loss:exponential, learning_rate:0.300000, r2: 0.8393
n_estimators:50, loss:exponential, learning_rate:0.400000, r2: 0.8242
n_estimators:50, loss:exponential, learning_rate:0.500000, r2: 0.8238
n_estimators:50, loss:exponential, learning_rate:0.600000, r2: 0.8297
n_estimators:50, loss:exponential, learning_rate:0.700000, r2: 0.8323
n_estimators:50, loss:exponential, learning_rate:0.800000, r2: 0.8397
n_estimators:50, loss:exponential, learning_rate:0.900000, r2: 0.8164
n_estimators:50, loss:exponential, learning_rate:1.000000, r2: 0.8310
n_estimators:100, loss:linear, learning_rate:0.100000, r2: 0.8288
n_estimators:100, loss:linear, learning_rate:0.200000, r2: 0.8393
n_estimators:100, loss:linear, learning_rate:0.300000, r2: 0.8318
n_estimators:100, loss:linear, learning_rate:0.400000, r2: 0.8360
n_estimators:100, loss:linear, learning_rate:0.500000, r2: 0.8417
n_estimators:100, loss:linear, learning_rate:0.600000, r2: 0.8286
n_estimators:100, loss:linear, learning_rate:0.700000, r2: 0.8283
n_estimators:100, loss:linear, learning_rate:0.800000, r2: 0.8280
n_estimators:100, loss:linear, learning_rate:0.900000, r2: 0.8292
n_estimators:100, loss:linear, learning_rate:1.000000, r2: 0.8415
n_estimators:100, loss:square, learning_rate:0.100000, r2: 0.8329
n_estimators:100, loss:square, learning_rate:0.200000, r2: 0.8331
n_estimators:100, loss:square, learning_rate:0.300000, r2: 0.8443
n_estimators:100, loss:square, learning_rate:0.400000, r2: 0.8334
n_estimators:100, loss:square, learning_rate:0.500000, r2: 0.8434
n_estimators:100, loss:square, learning_rate:0.600000, r2: 0.8286
n_estimators:100, loss:square, learning_rate:0.700000, r2: 0.8191
n_estimators:100, loss:square, learning_rate:0.800000, r2: 0.8155
n_estimators:100, loss:square, learning_rate:0.900000, r2: 0.8352
n_estimators:100, loss:square, learning_rate:1.000000, r2: 0.8295
n_estimators:100, loss:exponential, learning_rate:0.100000, r2: 0.8248
n_estimators:100, loss:exponential, learning_rate:0.200000, r2: 0.8277
n_estimators:100, loss:exponential, learning_rate:0.300000, r2: 0.8386
n_estimators:100, loss:exponential, learning_rate:0.400000, r2: 0.8224
n_estimators:100, loss:exponential, learning_rate:0.500000, r2: 0.8280
n_estimators:100, loss:exponential, learning_rate:0.600000, r2: 0.8256
n_estimators:100, loss:exponential, learning_rate:0.700000, r2: 0.8437
n_estimators:100, loss:exponential, learning_rate:0.800000, r2: 0.8415
n_estimators:100, loss:exponential, learning_rate:0.900000, r2: 0.8234
n_estimators:100, loss:exponential, learning_rate:1.000000, r2: 0.8329
n_estimators:150, loss:linear, learning_rate:0.100000, r2: 0.8282
n_estimators:150, loss:linear, learning_rate:0.200000, r2: 0.8387
n_estimators:150, loss:linear, learning_rate:0.300000, r2: 0.8374
n_estimators:150, loss:linear, learning_rate:0.400000, r2: 0.8409
n_estimators:150, loss:linear, learning_rate:0.500000, r2: 0.8401
n_estimators:150, loss:linear, learning_rate:0.600000, r2: 0.8317
n_estimators:150, loss:linear, learning_rate:0.700000, r2: 0.8302
n_estimators:150, loss:linear, learning_rate:0.800000, r2: 0.8346
n_estimators:150, loss:linear, learning_rate:0.900000, r2: 0.8341
n_estimators:150, loss:linear, learning_rate:1.000000, r2: 0.8403
n_estimators:150, loss:square, learning_rate:0.100000, r2: 0.8413
n_estimators:150, loss:square, learning_rate:0.200000, r2: 0.8341
n_estimators:150, loss:square, learning_rate:0.300000, r2: 0.8449
n_estimators:150, loss:square, learning_rate:0.400000, r2: 0.8291
n_estimators:150, loss:square, learning_rate:0.500000, r2: 0.8429
n_estimators:150, loss:square, learning_rate:0.600000, r2: 0.8338
n_estimators:150, loss:square, learning_rate:0.700000, r2: 0.8173
n_estimators:150, loss:square, learning_rate:0.800000, r2: 0.8238
n_estimators:150, loss:square, learning_rate:0.900000, r2: 0.8365
n_estimators:150, loss:square, learning_rate:1.000000, r2: 0.8316
n_estimators:150, loss:exponential, learning_rate:0.100000, r2: 0.8278
n_estimators:150, loss:exponential, learning_rate:0.200000, r2: 0.8303
n_estimators:150, loss:exponential, learning_rate:0.300000, r2: 0.8399
n_estimators:150, loss:exponential, learning_rate:0.400000, r2: 0.8220
n_estimators:150, loss:exponential, learning_rate:0.500000, r2: 0.8353
n_estimators:150, loss:exponential, learning_rate:0.600000, r2: 0.8372
n_estimators:150, loss:exponential, learning_rate:0.700000, r2: 0.8419
n_estimators:150, loss:exponential, learning_rate:0.800000, r2: 0.8349
n_estimators:150, loss:exponential, learning_rate:0.900000, r2: 0.8329
n_estimators:150, loss:exponential, learning_rate:1.000000, r2: 0.8295
n_estimators:200, loss:linear, learning_rate:0.100000, r2: 0.8316
n_estimators:200, loss:linear, learning_rate:0.200000, r2: 0.8390
n_estimators:200, loss:linear, learning_rate:0.300000, r2: 0.8391
n_estimators:200, loss:linear, learning_rate:0.400000, r2: 0.8375
n_estimators:200, loss:linear, learning_rate:0.500000, r2: 0.8396
n_estimators:200, loss:linear, learning_rate:0.600000, r2: 0.8315
n_estimators:200, loss:linear, learning_rate:0.700000, r2: 0.8354
n_estimators:200, loss:linear, learning_rate:0.800000, r2: 0.8311
n_estimators:200, loss:linear, learning_rate:0.900000, r2: 0.8299
n_estimators:200, loss:linear, learning_rate:1.000000, r2: 0.8408
n_estimators:200, loss:square, learning_rate:0.100000, r2: 0.8384
n_estimators:200, loss:square, learning_rate:0.200000, r2: 0.8366
n_estimators:200, loss:square, learning_rate:0.300000, r2: 0.8421
n_estimators:200, loss:square, learning_rate:0.400000, r2: 0.8293
n_estimators:200, loss:square, learning_rate:0.500000, r2: 0.8380
n_estimators:200, loss:square, learning_rate:0.600000, r2: 0.8337
n_estimators:200, loss:square, learning_rate:0.700000, r2: 0.8229
n_estimators:200, loss:square, learning_rate:0.800000, r2: 0.8259
n_estimators:200, loss:square, learning_rate:0.900000, r2: 0.8353
n_estimators:200, loss:square, learning_rate:1.000000, r2: 0.8333
n_estimators:200, loss:exponential, learning_rate:0.100000, r2: 0.8366
n_estimators:200, loss:exponential, learning_rate:0.200000, r2: 0.8288
n_estimators:200, loss:exponential, learning_rate:0.300000, r2: 0.8396
n_estimators:200, loss:exponential, learning_rate:0.400000, r2: 0.8261
n_estimators:200, loss:exponential, learning_rate:0.500000, r2: 0.8330
n_estimators:200, loss:exponential, learning_rate:0.600000, r2: 0.8274
n_estimators:200, loss:exponential, learning_rate:0.700000, r2: 0.8409
n_estimators:200, loss:exponential, learning_rate:0.800000, r2: 0.8331
n_estimators:200, loss:exponential, learning_rate:0.900000, r2: 0.8292
n_estimators:200, loss:exponential, learning_rate:1.000000, r2: 0.8309
n_estimators:500, loss:linear, learning_rate:0.100000, r2: 0.8353
n_estimators:500, loss:linear, learning_rate:0.200000, r2: 0.8389
n_estimators:500, loss:linear, learning_rate:0.300000, r2: 0.8325
n_estimators:500, loss:linear, learning_rate:0.400000, r2: 0.8413
n_estimators:500, loss:linear, learning_rate:0.500000, r2: 0.8411
n_estimators:500, loss:linear, learning_rate:0.600000, r2: 0.8374
n_estimators:500, loss:linear, learning_rate:0.700000, r2: 0.8365
n_estimators:500, loss:linear, learning_rate:0.800000, r2: 0.8342
n_estimators:500, loss:linear, learning_rate:0.900000, r2: 0.8306
n_estimators:500, loss:linear, learning_rate:1.000000, r2: 0.8367
n_estimators:500, loss:square, learning_rate:0.100000, r2: 0.8364
n_estimators:500, loss:square, learning_rate:0.200000, r2: 0.8361
n_estimators:500, loss:square, learning_rate:0.300000, r2: 0.8441
n_estimators:500, loss:square, learning_rate:0.400000, r2: 0.8321
n_estimators:500, loss:square, learning_rate:0.500000, r2: 0.8325
n_estimators:500, loss:square, learning_rate:0.600000, r2: 0.8399
n_estimators:500, loss:square, learning_rate:0.700000, r2: 0.8320
n_estimators:500, loss:square, learning_rate:0.800000, r2: 0.8312
n_estimators:500, loss:square, learning_rate:0.900000, r2: 0.8285
n_estimators:500, loss:square, learning_rate:1.000000, r2: 0.8270
n_estimators:500, loss:exponential, learning_rate:0.100000, r2: 0.8407
n_estimators:500, loss:exponential, learning_rate:0.200000, r2: 0.8306
n_estimators:500, loss:exponential, learning_rate:0.300000, r2: 0.8349
n_estimators:500, loss:exponential, learning_rate:0.400000, r2: 0.8300
n_estimators:500, loss:exponential, learning_rate:0.500000, r2: 0.8359
n_estimators:500, loss:exponential, learning_rate:0.600000, r2: 0.8338
n_estimators:500, loss:exponential, learning_rate:0.700000, r2: 0.8376
n_estimators:500, loss:exponential, learning_rate:0.800000, r2: 0.8310
n_estimators:500, loss:exponential, learning_rate:0.900000, r2: 0.8294
n_estimators:500, loss:exponential, learning_rate:1.000000, r2: 0.8308

Scanning these results, AdaBoost with a decision-tree base learner peaks at n_estimators=10, loss=exponential, learning_rate=0.8 with r2 = 0.8502; the close runner-up n_estimators=20, loss=square, learning_rate=0.3 (r2 = 0.8476) is the configuration used in the timing comparison below.

GBDT

from sklearn.ensemble import GradientBoostingRegressor
# note: scikit-learn >= 1.2 renames 'ls' to 'squared_error' and 'lad' to 'absolute_error'
loss_list = ['ls', 'lad', 'huber', 'quantile']
learning_rate_list = np.arange(0.1, 1.1, 0.1)

for i in grid_n:
    for j in loss_list:
        for k in learning_rate_list:
            GBR = GradientBoostingRegressor(loss=j, learning_rate=k, n_estimators=i,random_state=10)
            GBR.fit(X_train, y_train)
            print("n_estimators:%d, loss:%s, learning_rate:%0.2f, r2: %0.4f" %(i,j,k, GBR.score(X_test,y_test)))
n_estimators:10, loss:ls, learning_rate:0.10, r2: 0.5922
n_estimators:10, loss:ls, learning_rate:0.20, r2: 0.7456
n_estimators:10, loss:ls, learning_rate:0.30, r2: 0.7815
n_estimators:10, loss:ls, learning_rate:0.40, r2: 0.7385
n_estimators:10, loss:ls, learning_rate:0.50, r2: 0.7016
n_estimators:10, loss:ls, learning_rate:0.60, r2: 0.7388
n_estimators:10, loss:ls, learning_rate:0.70, r2: 0.6672
n_estimators:10, loss:ls, learning_rate:0.80, r2: 0.7304
n_estimators:10, loss:ls, learning_rate:0.90, r2: 0.7114
n_estimators:10, loss:ls, learning_rate:1.00, r2: 0.5668
n_estimators:10, loss:lad, learning_rate:0.10, r2: 0.3481
n_estimators:10, loss:lad, learning_rate:0.20, r2: 0.5562
n_estimators:10, loss:lad, learning_rate:0.30, r2: 0.6521
n_estimators:10, loss:lad, learning_rate:0.40, r2: 0.6943
n_estimators:10, loss:lad, learning_rate:0.50, r2: 0.7284
n_estimators:10, loss:lad, learning_rate:0.60, r2: 0.7315
n_estimators:10, loss:lad, learning_rate:0.70, r2: 0.7139
n_estimators:10, loss:lad, learning_rate:0.80, r2: 0.7172
n_estimators:10, loss:lad, learning_rate:0.90, r2: 0.6893
n_estimators:10, loss:lad, learning_rate:1.00, r2: 0.6687
n_estimators:10, loss:huber, learning_rate:0.10, r2: 0.4641
n_estimators:10, loss:huber, learning_rate:0.20, r2: 0.6158
n_estimators:10, loss:huber, learning_rate:0.30, r2: 0.7108
n_estimators:10, loss:huber, learning_rate:0.40, r2: 0.7433
n_estimators:10, loss:huber, learning_rate:0.50, r2: 0.7594
n_estimators:10, loss:huber, learning_rate:0.60, r2: 0.7780
n_estimators:10, loss:huber, learning_rate:0.70, r2: 0.7680
n_estimators:10, loss:huber, learning_rate:0.80, r2: 0.7657
n_estimators:10, loss:huber, learning_rate:0.90, r2: 0.7181
n_estimators:10, loss:huber, learning_rate:1.00, r2: 0.7399
n_estimators:10, loss:quantile, learning_rate:0.10, r2: -0.0321
n_estimators:10, loss:quantile, learning_rate:0.20, r2: 0.4116
n_estimators:10, loss:quantile, learning_rate:0.30, r2: 0.5778
n_estimators:10, loss:quantile, learning_rate:0.40, r2: 0.6326
n_estimators:10, loss:quantile, learning_rate:0.50, r2: 0.6443
n_estimators:10, loss:quantile, learning_rate:0.60, r2: 0.6106
n_estimators:10, loss:quantile, learning_rate:0.70, r2: 0.5563
n_estimators:10, loss:quantile, learning_rate:0.80, r2: 0.6072
n_estimators:10, loss:quantile, learning_rate:0.90, r2: 0.5383
n_estimators:10, loss:quantile, learning_rate:1.00, r2: 0.3802
n_estimators:20, loss:ls, learning_rate:0.10, r2: 0.7016
n_estimators:20, loss:ls, learning_rate:0.20, r2: 0.7950
n_estimators:20, loss:ls, learning_rate:0.30, r2: 0.8140
n_estimators:20, loss:ls, learning_rate:0.40, r2: 0.7735
n_estimators:20, loss:ls, learning_rate:0.50, r2: 0.7282
n_estimators:20, loss:ls, learning_rate:0.60, r2: 0.7545
n_estimators:20, loss:ls, learning_rate:0.70, r2: 0.6554
n_estimators:20, loss:ls, learning_rate:0.80, r2: 0.7442
n_estimators:20, loss:ls, learning_rate:0.90, r2: 0.7313
n_estimators:20, loss:ls, learning_rate:1.00, r2: 0.5410
n_estimators:20, loss:lad, learning_rate:0.10, r2: 0.5256
n_estimators:20, loss:lad, learning_rate:0.20, r2: 0.6928
n_estimators:20, loss:lad, learning_rate:0.30, r2: 0.7250
n_estimators:20, loss:lad, learning_rate:0.40, r2: 0.7417
n_estimators:20, loss:lad, learning_rate:0.50, r2: 0.7658
n_estimators:20, loss:lad, learning_rate:0.60, r2: 0.7708
n_estimators:20, loss:lad, learning_rate:0.70, r2: 0.7294
n_estimators:20, loss:lad, learning_rate:0.80, r2: 0.7480
n_estimators:20, loss:lad, learning_rate:0.90, r2: 0.7155
n_estimators:20, loss:lad, learning_rate:1.00, r2: 0.6645
n_estimators:20, loss:huber, learning_rate:0.10, r2: 0.6210
n_estimators:20, loss:huber, learning_rate:0.20, r2: 0.7313
n_estimators:20, loss:huber, learning_rate:0.30, r2: 0.7794
n_estimators:20, loss:huber, learning_rate:0.40, r2: 0.8207
n_estimators:20, loss:huber, learning_rate:0.50, r2: 0.8415
n_estimators:20, loss:huber, learning_rate:0.60, r2: 0.7974
n_estimators:20, loss:huber, learning_rate:0.70, r2: 0.8105
n_estimators:20, loss:huber, learning_rate:0.80, r2: 0.6530
n_estimators:20, loss:huber, learning_rate:0.90, r2: 0.7436
n_estimators:20, loss:huber, learning_rate:1.00, r2: 0.8018
n_estimators:20, loss:quantile, learning_rate:0.10, r2: 0.3999
n_estimators:20, loss:quantile, learning_rate:0.20, r2: 0.6646
n_estimators:20, loss:quantile, learning_rate:0.30, r2: 0.6932
n_estimators:20, loss:quantile, learning_rate:0.40, r2: 0.6944
n_estimators:20, loss:quantile, learning_rate:0.50, r2: 0.6748
n_estimators:20, loss:quantile, learning_rate:0.60, r2: 0.6579
n_estimators:20, loss:quantile, learning_rate:0.70, r2: 0.5760
n_estimators:20, loss:quantile, learning_rate:0.80, r2: 0.6285
n_estimators:20, loss:quantile, learning_rate:0.90, r2: 0.5640
n_estimators:20, loss:quantile, learning_rate:1.00, r2: 0.4109
n_estimators:50, loss:ls, learning_rate:0.10, r2: 0.7640
n_estimators:50, loss:ls, learning_rate:0.20, r2: 0.8337
n_estimators:50, loss:ls, learning_rate:0.30, r2: 0.8331
n_estimators:50, loss:ls, learning_rate:0.40, r2: 0.7908
n_estimators:50, loss:ls, learning_rate:0.50, r2: 0.7318
n_estimators:50, loss:ls, learning_rate:0.60, r2: 0.7638
n_estimators:50, loss:ls, learning_rate:0.70, r2: 0.6778
n_estimators:50, loss:ls, learning_rate:0.80, r2: 0.7654
n_estimators:50, loss:ls, learning_rate:0.90, r2: 0.7321
n_estimators:50, loss:ls, learning_rate:1.00, r2: 0.5572
n_estimators:50, loss:lad, learning_rate:0.10, r2: 0.6928
n_estimators:50, loss:lad, learning_rate:0.20, r2: 0.7723
n_estimators:50, loss:lad, learning_rate:0.30, r2: 0.7726
n_estimators:50, loss:lad, learning_rate:0.40, r2: 0.7695
n_estimators:50, loss:lad, learning_rate:0.50, r2: 0.8229
n_estimators:50, loss:lad, learning_rate:0.60, r2: 0.7921
n_estimators:50, loss:lad, learning_rate:0.70, r2: 0.6918
n_estimators:50, loss:lad, learning_rate:0.80, r2: 0.7345
n_estimators:50, loss:lad, learning_rate:0.90, r2: 0.7104
n_estimators:50, loss:lad, learning_rate:1.00, r2: 0.6775
n_estimators:50, loss:huber, learning_rate:0.10, r2: 0.7619
n_estimators:50, loss:huber, learning_rate:0.20, r2: 0.8040
n_estimators:50, loss:huber, learning_rate:0.30, r2: 0.8124
n_estimators:50, loss:huber, learning_rate:0.40, r2: 0.8486
n_estimators:50, loss:huber, learning_rate:0.50, r2: 0.8557
n_estimators:50, loss:huber, learning_rate:0.60, r2: 0.8052
n_estimators:50, loss:huber, learning_rate:0.70, r2: 0.8166
n_estimators:50, loss:huber, learning_rate:0.80, r2: 0.6511
n_estimators:50, loss:huber, learning_rate:0.90, r2: 0.7323
n_estimators:50, loss:huber, learning_rate:1.00, r2: 0.7960
n_estimators:50, loss:quantile, learning_rate:0.10, r2: 0.7164
n_estimators:50, loss:quantile, learning_rate:0.20, r2: 0.7633
n_estimators:50, loss:quantile, learning_rate:0.30, r2: 0.7323
n_estimators:50, loss:quantile, learning_rate:0.40, r2: 0.7203
n_estimators:50, loss:quantile, learning_rate:0.50, r2: 0.7002
n_estimators:50, loss:quantile, learning_rate:0.60, r2: 0.6667
n_estimators:50, loss:quantile, learning_rate:0.70, r2: 0.5759
n_estimators:50, loss:quantile, learning_rate:0.80, r2: 0.6384
n_estimators:50, loss:quantile, learning_rate:0.90, r2: 0.5985
n_estimators:50, loss:quantile, learning_rate:1.00, r2: 0.3767
n_estimators:100, loss:ls, learning_rate:0.10, r2: 0.7833
n_estimators:100, loss:ls, learning_rate:0.20, r2: 0.8461
n_estimators:100, loss:ls, learning_rate:0.30, r2: 0.8299
n_estimators:100, loss:ls, learning_rate:0.40, r2: 0.8011
n_estimators:100, loss:ls, learning_rate:0.50, r2: 0.7359
n_estimators:100, loss:ls, learning_rate:0.60, r2: 0.7714
n_estimators:100, loss:ls, learning_rate:0.70, r2: 0.6857
n_estimators:100, loss:ls, learning_rate:0.80, r2: 0.7741
n_estimators:100, loss:ls, learning_rate:0.90, r2: 0.7213
n_estimators:100, loss:ls, learning_rate:1.00, r2: 0.5668
n_estimators:100, loss:lad, learning_rate:0.10, r2: 0.7420
n_estimators:100, loss:lad, learning_rate:0.20, r2: 0.8070
n_estimators:100, loss:lad, learning_rate:0.30, r2: 0.7795
n_estimators:100, loss:lad, learning_rate:0.40, r2: 0.7834
n_estimators:100, loss:lad, learning_rate:0.50, r2: 0.8223
n_estimators:100, loss:lad, learning_rate:0.60, r2: 0.7918
n_estimators:100, loss:lad, learning_rate:0.70, r2: 0.7206
n_estimators:100, loss:lad, learning_rate:0.80, r2: 0.7338
n_estimators:100, loss:lad, learning_rate:0.90, r2: 0.6832
n_estimators:100, loss:lad, learning_rate:1.00, r2: 0.6891
n_estimators:100, loss:huber, learning_rate:0.10, r2: 0.8063
n_estimators:100, loss:huber, learning_rate:0.20, r2: 0.8199
n_estimators:100, loss:huber, learning_rate:0.30, r2: 0.8149
n_estimators:100, loss:huber, learning_rate:0.40, r2: 0.8575
n_estimators:100, loss:huber, learning_rate:0.50, r2: 0.8687
n_estimators:100, loss:huber, learning_rate:0.60, r2: 0.7969
n_estimators:100, loss:huber, learning_rate:0.70, r2: 0.8283
n_estimators:100, loss:huber, learning_rate:0.80, r2: 0.6596
n_estimators:100, loss:huber, learning_rate:0.90, r2: 0.7267
n_estimators:100, loss:huber, learning_rate:1.00, r2: 0.7902
n_estimators:100, loss:quantile, learning_rate:0.10, r2: 0.7818
n_estimators:100, loss:quantile, learning_rate:0.20, r2: 0.7708
n_estimators:100, loss:quantile, learning_rate:0.30, r2: 0.7324
n_estimators:100, loss:quantile, learning_rate:0.40, r2: 0.7203
n_estimators:100, loss:quantile, learning_rate:0.50, r2: 0.7022
n_estimators:100, loss:quantile, learning_rate:0.60, r2: 0.6812
n_estimators:100, loss:quantile, learning_rate:0.70, r2: 0.6050
n_estimators:100, loss:quantile, learning_rate:0.80, r2: 0.6532
n_estimators:100, loss:quantile, learning_rate:0.90, r2: 0.5779
n_estimators:100, loss:quantile, learning_rate:1.00, r2: 0.3861
n_estimators:150, loss:ls, learning_rate:0.10, r2: 0.7919
n_estimators:150, loss:ls, learning_rate:0.20, r2: 0.8462
n_estimators:150, loss:ls, learning_rate:0.30, r2: 0.8344
n_estimators:150, loss:ls, learning_rate:0.40, r2: 0.8044
n_estimators:150, loss:ls, learning_rate:0.50, r2: 0.7386
n_estimators:150, loss:ls, learning_rate:0.60, r2: 0.7721
n_estimators:150, loss:ls, learning_rate:0.70, r2: 0.6837
n_estimators:150, loss:ls, learning_rate:0.80, r2: 0.7751
n_estimators:150, loss:ls, learning_rate:0.90, r2: 0.7218
n_estimators:150, loss:ls, learning_rate:1.00, r2: 0.5553
n_estimators:150, loss:lad, learning_rate:0.10, r2: 0.7709
n_estimators:150, loss:lad, learning_rate:0.20, r2: 0.8230
n_estimators:150, loss:lad, learning_rate:0.30, r2: 0.7806
n_estimators:150, loss:lad, learning_rate:0.40, r2: 0.8013
n_estimators:150, loss:lad, learning_rate:0.50, r2: 0.8267
n_estimators:150, loss:lad, learning_rate:0.60, r2: 0.7934
n_estimators:150, loss:lad, learning_rate:0.70, r2: 0.7262
n_estimators:150, loss:lad, learning_rate:0.80, r2: 0.7352
n_estimators:150, loss:lad, learning_rate:0.90, r2: 0.6833
n_estimators:150, loss:lad, learning_rate:1.00, r2: 0.6865
n_estimators:150, loss:huber, learning_rate:0.10, r2: 0.8180
n_estimators:150, loss:huber, learning_rate:0.20, r2: 0.8178
n_estimators:150, loss:huber, learning_rate:0.30, r2: 0.8259
n_estimators:150, loss:huber, learning_rate:0.40, r2: 0.8594
n_estimators:150, loss:huber, learning_rate:0.50, r2: 0.8678
n_estimators:150, loss:huber, learning_rate:0.60, r2: 0.7979
n_estimators:150, loss:huber, learning_rate:0.70, r2: 0.8309
n_estimators:150, loss:huber, learning_rate:0.80, r2: 0.6528
n_estimators:150, loss:huber, learning_rate:0.90, r2: 0.7284
n_estimators:150, loss:huber, learning_rate:1.00, r2: 0.8012
n_estimators:150, loss:quantile, learning_rate:0.10, r2: 0.7841
n_estimators:150, loss:quantile, learning_rate:0.20, r2: 0.7708
n_estimators:150, loss:quantile, learning_rate:0.30, r2: 0.7324
n_estimators:150, loss:quantile, learning_rate:0.40, r2: 0.7203
n_estimators:150, loss:quantile, learning_rate:0.50, r2: 0.7045
n_estimators:150, loss:quantile, learning_rate:0.60, r2: 0.7055
n_estimators:150, loss:quantile, learning_rate:0.70, r2: 0.6085
n_estimators:150, loss:quantile, learning_rate:0.80, r2: 0.6644
n_estimators:150, loss:quantile, learning_rate:0.90, r2: 0.5798
n_estimators:150, loss:quantile, learning_rate:1.00, r2: 0.4027
n_estimators:200, loss:ls, learning_rate:0.10, r2: 0.7938
n_estimators:200, loss:ls, learning_rate:0.20, r2: 0.8455
n_estimators:200, loss:ls, learning_rate:0.30, r2: 0.8353
n_estimators:200, loss:ls, learning_rate:0.40, r2: 0.8032
n_estimators:200, loss:ls, learning_rate:0.50, r2: 0.7374
n_estimators:200, loss:ls, learning_rate:0.60, r2: 0.7706
n_estimators:200, loss:ls, learning_rate:0.70, r2: 0.6836
n_estimators:200, loss:ls, learning_rate:0.80, r2: 0.7729
n_estimators:200, loss:ls, learning_rate:0.90, r2: 0.7227
n_estimators:200, loss:ls, learning_rate:1.00, r2: 0.5530
n_estimators:200, loss:lad, learning_rate:0.10, r2: 0.7814
n_estimators:200, loss:lad, learning_rate:0.20, r2: 0.8274
n_estimators:200, loss:lad, learning_rate:0.30, r2: 0.7815
n_estimators:200, loss:lad, learning_rate:0.40, r2: 0.8074
n_estimators:200, loss:lad, learning_rate:0.50, r2: 0.8266
n_estimators:200, loss:lad, learning_rate:0.60, r2: 0.7935
n_estimators:200, loss:lad, learning_rate:0.70, r2: 0.7272
n_estimators:200, loss:lad, learning_rate:0.80, r2: 0.7375
n_estimators:200, loss:lad, learning_rate:0.90, r2: 0.6825
n_estimators:200, loss:lad, learning_rate:1.00, r2: 0.6801
n_estimators:200, loss:huber, learning_rate:0.10, r2: 0.8238
n_estimators:200, loss:huber, learning_rate:0.20, r2: 0.8288
n_estimators:200, loss:huber, learning_rate:0.30, r2: 0.8264
n_estimators:200, loss:huber, learning_rate:0.40, r2: 0.8600
n_estimators:200, loss:huber, learning_rate:0.50, r2: 0.8675
n_estimators:200, loss:huber, learning_rate:0.60, r2: 0.8028
n_estimators:200, loss:huber, learning_rate:0.70, r2: 0.8301
n_estimators:200, loss:huber, learning_rate:0.80, r2: 0.6503
n_estimators:200, loss:huber, learning_rate:0.90, r2: 0.7346
n_estimators:200, loss:huber, learning_rate:1.00, r2: 0.8009
n_estimators:200, loss:quantile, learning_rate:0.10, r2: 0.7841
n_estimators:200, loss:quantile, learning_rate:0.20, r2: 0.7708
n_estimators:200, loss:quantile, learning_rate:0.30, r2: 0.7324
n_estimators:200, loss:quantile, learning_rate:0.40, r2: 0.7203
n_estimators:200, loss:quantile, learning_rate:0.50, r2: 0.7064
n_estimators:200, loss:quantile, learning_rate:0.60, r2: 0.7125
n_estimators:200, loss:quantile, learning_rate:0.70, r2: 0.6111
n_estimators:200, loss:quantile, learning_rate:0.80, r2: 0.6794
n_estimators:200, loss:quantile, learning_rate:0.90, r2: 0.5594
n_estimators:200, loss:quantile, learning_rate:1.00, r2: 0.3322
n_estimators:500, loss:ls, learning_rate:0.10, r2: 0.7954
n_estimators:500, loss:ls, learning_rate:0.20, r2: 0.8494
n_estimators:500, loss:ls, learning_rate:0.30, r2: 0.8365
n_estimators:500, loss:ls, learning_rate:0.40, r2: 0.8047
n_estimators:500, loss:ls, learning_rate:0.50, r2: 0.7365
n_estimators:500, loss:ls, learning_rate:0.60, r2: 0.7696
n_estimators:500, loss:ls, learning_rate:0.70, r2: 0.6831
n_estimators:500, loss:ls, learning_rate:0.80, r2: 0.7749
n_estimators:500, loss:ls, learning_rate:0.90, r2: 0.7229
n_estimators:500, loss:ls, learning_rate:1.00, r2: 0.5536
n_estimators:500, loss:lad, learning_rate:0.10, r2: 0.8136
n_estimators:500, loss:lad, learning_rate:0.20, r2: 0.8470
n_estimators:500, loss:lad, learning_rate:0.30, r2: 0.7917
n_estimators:500, loss:lad, learning_rate:0.40, r2: 0.8093
n_estimators:500, loss:lad, learning_rate:0.50, r2: 0.8248
n_estimators:500, loss:lad, learning_rate:0.60, r2: 0.7984
n_estimators:500, loss:lad, learning_rate:0.70, r2: 0.7235
n_estimators:500, loss:lad, learning_rate:0.80, r2: 0.7537
n_estimators:500, loss:lad, learning_rate:0.90, r2: 0.6900
n_estimators:500, loss:lad, learning_rate:1.00, r2: 0.6772
n_estimators:500, loss:huber, learning_rate:0.10, r2: 0.8365
n_estimators:500, loss:huber, learning_rate:0.20, r2: 0.8289
n_estimators:500, loss:huber, learning_rate:0.30, r2: 0.8255
n_estimators:500, loss:huber, learning_rate:0.40, r2: 0.8618
n_estimators:500, loss:huber, learning_rate:0.50, r2: 0.8669
n_estimators:500, loss:huber, learning_rate:0.60, r2: 0.8046
n_estimators:500, loss:huber, learning_rate:0.70, r2: 0.8324
n_estimators:500, loss:huber, learning_rate:0.80, r2: 0.6512
n_estimators:500, loss:huber, learning_rate:0.90, r2: 0.7330
n_estimators:500, loss:huber, learning_rate:1.00, r2: 0.8048
n_estimators:500, loss:quantile, learning_rate:0.10, r2: 0.7841
n_estimators:500, loss:quantile, learning_rate:0.20, r2: 0.7708
n_estimators:500, loss:quantile, learning_rate:0.30, r2: 0.7324
n_estimators:500, loss:quantile, learning_rate:0.40, r2: 0.7203
n_estimators:500, loss:quantile, learning_rate:0.50, r2: 0.7166
n_estimators:500, loss:quantile, learning_rate:0.60, r2: 0.7160
n_estimators:500, loss:quantile, learning_rate:0.70, r2: 0.6256
n_estimators:500, loss:quantile, learning_rate:0.80, r2: 0.6652
n_estimators:500, loss:quantile, learning_rate:0.90, r2: 0.5556
n_estimators:500, loss:quantile, learning_rate:1.00, r2: 0.3129

From these results, GBDT performs best with n_estimators=100, loss=huber, learning_rate=0.5, reaching r2 = 0.8687.
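
Incidentally, the n_estimators dimension of this grid does not require refitting: staged_predict replays the test-set prediction after each boosting stage from a single large fit. A minimal sketch using the best loss and learning rate found above:

from sklearn.metrics import r2_score

# fit once at the largest size, then score every intermediate stage
GBR_staged = GradientBoostingRegressor(loss="huber", learning_rate=0.5,
                                       n_estimators=500, random_state=10)
GBR_staged.fit(X_train, y_train)
stage_r2 = [r2_score(y_test, y_pred) for y_pred in GBR_staged.staged_predict(X_test)]
best_stage = int(np.argmax(stage_r2)) + 1
print("best n_estimators: %d, r2: %0.4f" % (best_stage, max(stage_r2)))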

Comparing the running times of the models

The tuned versions of the four models achieve fairly similar r2 scores, so running time is compared as an additional criterion:

%%time
# bagging regression
bagr = BaggingRegressor(base_estimator=RI, n_estimators=20, bootstrap_features=False, random_state=10)
bagr.fit(X_train,y_train)
print("r2:%0.4f" %(bagr.score(X_test,y_test)))
r2:0.8803
Wall time: 199 ms
%%time
# random forest
RF = RandomForestRegressor(n_estimators=20, criterion="mse", max_features="auto",random_state=10)
RF.fit(X_train, y_train)
print("r2: %0.4f" %(RF.score(X_test,y_test)))
r2: 0.8242
Wall time: 380 ms
%%time
# AdaBoost
ABR = AdaBoostRegressor(base_estimator=DTR, random_state=10, n_estimators=20, loss="square", learning_rate=0.3)
ABR.fit(X_train, y_train)
print("r2: %0.4f" %(ABR.score(X_test,y_test)))
r2: 0.8476
Wall time: 582 ms
%%time
# GBDT
GBR = GradientBoostingRegressor(loss="huber", learning_rate=0.5, n_estimators=100,random_state=10)
GBR.fit(X_train, y_train)
print("r2: %0.4f" %(GBR.score(X_test,y_test)))
r2: 0.8687
Wall time: 758 ms

Conclusion

Considering both the r2 score and the running time, the bagging regressor with a ridge base learner, n_estimators = 20, and bootstrap_features=False is the best model: it attains the highest r2 (0.8803) and the shortest training time of the four.
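
To close, a minimal sketch of refitting the chosen model and producing predictions, reporting RMSE alongside r2 as a more interpretable error measure (variables as defined above):

from sklearn.metrics import mean_squared_error

# refit the winning configuration: bagged ridge regression
final_model = BaggingRegressor(base_estimator=Ridge(), n_estimators=20,
                               bootstrap_features=False, random_state=10)
final_model.fit(X_train, y_train)

y_pred = final_model.predict(X_test)
rmse = mean_squared_error(y_test, y_pred) ** 0.5
print("r2: %0.4f, RMSE: %0.0f" % (final_model.score(X_test, y_test), rmse))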
