Question:

Error when using GridSearchCV but not without GridSearchCV - Python 3.6.7

益英逸
2023-03-14

I am running into a strange error: my code fails when I use GridSearchCV, but not when I run sklearn's MLPRegressor on its own.

The following code:

from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import train_test_split
from sklearn import preprocessing
import pandas as pd
import numpy as np

def str_to_num(arr):
    le = preprocessing.LabelEncoder()
    new_arr = le.fit_transform(arr)
    return new_arr

def compare_values(arr1, arr2):
    thediff = 0
    thediffs = []
    for thing1, thing2 in zip(arr1, arr2):
        thediff = abs(thing1 - thing2)
        thediffs.append(thediff)

    return thediffs

def print_to_file(filepath, arr):
    with open(filepath, 'w') as f:
        for item in arr:
            f.write("%s\n" % item)

data = pd.read_csv('data2.csv')

# create the labels, or field we are trying to estimate
label = data['TOTAL']
# remove the header
label = label[1:]

# create the data, or the data that is to be estimated
data = data.drop('TOTAL', axis=1)
data = data.drop('SERIALNUM', axis=1)
# remove the header
data = data[1:]

# # split into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(data, label, test_size = 0.2)

mlp = MLPRegressor(activation = 'relu', solver = 'lbfgs', verbose=False)
mlp.fit(X_train, y_train)
mlp_predictions = mlp.predict(X_test)
mlp_differences = compare_values(y_test, mlp_predictions)
mlp_Avg = np.average(mlp_differences)
print(mlp_Avg)

prints the following:

32.92041129078561 (yes, I know the average error is bad)
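The `compare_values` helper followed by `np.average` computes a mean absolute error. A minimal sketch on made-up toy arrays (not the question's data) showing that it agrees with scikit-learn's `mean_absolute_error`:

```python
import numpy as np
from sklearn.metrics import mean_absolute_error


def compare_values(arr1, arr2):
    # Element-wise absolute differences, like the question's helper
    return [abs(a - b) for a, b in zip(arr1, arr2)]


# Hypothetical toy targets and predictions
y_true = np.array([10.0, 20.0, 30.0, 40.0])
y_pred = np.array([12.0, 18.0, 33.0, 40.0])

manual_mae = np.average(compare_values(y_true, y_pred))
sk_mae = mean_absolute_error(y_true, y_pred)
print(manual_mae, sk_mae)  # both equal 1.75
```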

However, when I try to optimize the parameters, the same parameter settings produce an error:

from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn import preprocessing
import pandas as pd
import numpy as np


def str_to_num(arr):
    le = preprocessing.LabelEncoder()
    new_arr = le.fit_transform(arr)
    return new_arr

def compare_values(arr1, arr2):
    thediff = 0
    thediffs = []
    for thing1, thing2 in zip(arr1, arr2):
        thediff = abs(thing1 - thing2)
        thediffs.append(thediff)

    return thediffs

def print_to_file(filepath, arr):
    with open(filepath, 'w') as f:
        for item in arr:
            f.write("%s\n" % item)

data = pd.read_csv('data2.csv')

# create the labels, or field we are trying to estimate
label = data['TOTAL_DAYS_TO_COMPLETE']
# remove the header
label = label[1:]

# create the data, or the data that is to be estimated
data = data.drop('TOTAL_DAYS_TO_COMPLETE', axis=1)
data = data.drop('SERIALNUM', axis=1)
# remove the header
data = data[1:]

# # split into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(data, label, test_size = 0.2)

param_grid = {
    #'hidden_layer_sizes': [(1,),(2,),(3,),(10,),(15,),(20,),(25,)],
    'activation': ['identity', 'logistic', 'relu'],
    #'activation': ['relu'],
    'solver': ['lbfgs', 'sgd', 'adam'],
    #'solver': ['adam']
    #'alpha': [0.0001, 0.0005, 0.0009],
    #'learning_rate': ['constant', 'invscaling', 'adaptive'],
    #'learning_rate_init': [0.001, 0.01, 0.99],
    #'warm_start': [True, False]
    #'momentum': [0.1, 0.9, 0.99]
    # Solver-specific parameters not tuned yet
}

# Create a base model
mlp = MLPRegressor()
# Instantiate the grid search model
grid_search = GridSearchCV(estimator = mlp, param_grid = param_grid, 
                          cv = 3, n_jobs = -1, verbose = 2)
grid_search.fit(X_train, y_train)
print()
print(grid_search.best_params_)
print(grid_search.best_score_)
print()
print("Grid scores on development set: ")
print()
answers = grid_search.predict(X_test)
results = compare_values(answers, y_test)
print("Accuracy: ", np.average(results))
print()

The result is as follows:

Fitting 3 folds for each of 9 candidates, totalling 27 fits
[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers.
[CV] activation=identity, solver=lbfgs ...............................
[CV] activation=identity, solver=lbfgs ...............................
[CV] activation=identity, solver=sgd .................................
C:\Python367-64\lib\site-packages\sklearn\neural_network\_base.py:195: RuntimeWarning: overflow encountered in square
  return ((y_true - y_pred) ** 2).mean() / 2
[CV] activation=identity, solver=adam ................................
[CV] activation=identity, solver=lbfgs ...............................
[CV] activation=identity, solver=sgd .................................
[CV] activation=identity, solver=sgd .................................

!!! THIS IS WHERE IT STARTS FAILING !!!!

[CV] .................... activation=relu, solver=lbfgs, total=   0.5s

joblib.externals.loky.process_executor._RemoteTraceback:
"""
Traceback (most recent call last):
  File "C:\Python367-64\lib\site-packages\joblib\externals\loky\process_executor.py", line 418, in _process_worker
    r = call_item()
  File "C:\Python367-64\lib\site-packages\joblib\externals\loky\process_executor.py", line 272, in __call__
    return self.fn(*self.args, **self.kwargs)
  File "C:\Python367-64\lib\site-packages\joblib\_parallel_backends.py", line 567, in __call__
    return self.func(*args, **kwargs)
  File "C:\Python367-64\lib\site-packages\joblib\parallel.py", line 225, in __call__
    for func, args, kwargs in self.items]
  File "C:\Python367-64\lib\site-packages\joblib\parallel.py", line 225, in <listcomp>
    for func, args, kwargs in self.items]
  File "C:\Python367-64\lib\site-packages\sklearn\model_selection\_validation.py", line 554, in _fit_and_score
    test_scores = _score(estimator, X_test, y_test, scorer, is_multimetric)
  File "C:\Python367-64\lib\site-packages\sklearn\model_selection\_validation.py", line 597, in _score
    return _multimetric_score(estimator, X_test, y_test, scorer)
  File "C:\Python367-64\lib\site-packages\sklearn\model_selection\_validation.py", line 627, in _multimetric_score
    score = scorer(estimator, X_test, y_test)
  File "C:\Python367-64\lib\site-packages\sklearn\metrics\scorer.py", line 240, in _passthrough_scorer
    return estimator.score(*args, **kwargs)
  File "C:\Python367-64\lib\site-packages\sklearn\base.py", line 410, in score
    y_type, _, _ = _check_reg_targets(y, y_pred, None)
  File "C:\Python367-64\lib\site-packages\sklearn\metrics\regression.py", line 79, in _check_reg_targets
    y_pred = check_array(y_pred, ensure_2d=False)
  File "C:\Python367-64\lib\site-packages\sklearn\utils\validation.py", line 542, in check_array
    allow_nan=force_all_finite == 'allow-nan')
  File "C:\Python367-64\lib\site-packages\sklearn\utils\validation.py", line 56, in _assert_all_finite
    raise ValueError(msg_err.format(type_err, X.dtype))
ValueError: Input contains NaN, infinity or a value too large for dtype('float64').
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "mlp_optimizer.py", line 93, in <module>
    grid_search.fit(X_train, y_train)
  File "C:\Python367-64\lib\site-packages\sklearn\model_selection\_search.py", line 687, in fit
    self._run_search(evaluate_candidates)
  File "C:\Python367-64\lib\site-packages\sklearn\model_selection\_search.py", line 1148, in _run_search
    evaluate_candidates(ParameterGrid(self.param_grid))
  File "C:\Python367-64\lib\site-packages\sklearn\model_selection\_search.py", line 666, in evaluate_candidates
    cv.split(X, y, groups)))
  File "C:\Python367-64\lib\site-packages\joblib\parallel.py", line 934, in __call__
    self.retrieve()
  File "C:\Python367-64\lib\site-packages\joblib\parallel.py", line 833, in retrieve
    self._output.extend(job.get(timeout=self.timeout))
  File "C:\Python367-64\lib\site-packages\joblib\_parallel_backends.py", line 521, in wrap_future_result
    return future.result(timeout=timeout)
  File "C:\Python367-64\lib\concurrent\futures\_base.py", line 432, in result
    return self.__get_result()
  File "C:\Python367-64\lib\concurrent\futures\_base.py", line 384, in __get_result
    raise self._exception
ValueError: Input contains NaN, infinity or a value too large for dtype('float64').

Why does it work without GridSearchCV but fail when I use GridSearchCV?

1 answer

陶树
2023-03-14

The problem is with this line:

'solver': ['lbfgs', 'sgd', 'adam'],
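For reference, the active (uncommented) entries of the question's grid expand to 3 × 3 = 9 candidates, which matches the 9 candidates and 27 fits (with `cv=3`) reported in the log; three of those candidates use sgd. A quick check with scikit-learn's `ParameterGrid`:

```python
from sklearn.model_selection import ParameterGrid

# The active entries of the question's param_grid
param_grid = {
    'activation': ['identity', 'logistic', 'relu'],
    'solver': ['lbfgs', 'sgd', 'adam'],
}

# ParameterGrid yields one dict per parameter combination
candidates = list(ParameterGrid(param_grid))
print(len(candidates))  # 9 candidates; with cv=3 that is 27 fits

# The candidates that the answer's fix removes
sgd_candidates = [c for c in candidates if c['solver'] == 'sgd']
print(len(sgd_candidates))  # 3
```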

The sgd option requires certain parameters to be within specific thresholds, per the documentation.
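As for why the error only shows up under GridSearchCV: the traceback shows it is raised while scoring a fitted model, in `estimator.score`, which for a regressor computes R² and validates the predictions with `check_array`. If a fit diverges (the overflow `RuntimeWarning` from the squared loss in the log hints at this), the predictions contain NaN or infinity and that validation raises. The standalone script never calls `score`; it averages the absolute differences directly, so bad predictions would only turn the average into `nan`. A minimal illustration with hand-made predictions (hypothetical values, not the question's data):

```python
import numpy as np
from sklearn.metrics import r2_score

y_true = np.array([1.0, 2.0, 3.0])
# Hypothetical predictions from a model whose training diverged
y_pred = np.array([1.0, np.nan, np.inf])

# Manual averaging, as in the standalone script, silently yields nan
manual = np.average(np.abs(y_true - y_pred))
print(manual)  # nan

# R^2, the default score for regressors, validates its inputs and raises
raised = False
try:
    r2_score(y_true, y_pred)
except ValueError as exc:
    raised = True
    print("ValueError:", exc)
```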

Simply changing

'solver': ['lbfgs', 'sgd', 'adam'],

to

'solver': ['lbfgs', 'adam'],

fixed the problem.
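If you would rather keep sgd in the search, one common alternative (not part of the answer above and not tested on the question's data) is to standardize both the features and the target: scikit-learn's documentation notes that MLPs are sensitive to feature scaling, and sgd with the default `learning_rate_init` may diverge on large-valued inputs. A sketch on synthetic data from `make_regression`, standing in for `data2.csv`; the step names `scale` and `mlp` are arbitrary:

```python
import numpy as np
from sklearn.compose import TransformedTargetRegressor
from sklearn.datasets import make_regression
from sklearn.model_selection import GridSearchCV
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for data2.csv (assumption: numeric features only)
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)

# Scaling inside the pipeline means each CV fold is scaled
# with statistics from its own training split only
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("mlp", MLPRegressor(max_iter=500, random_state=0)),
])

# Also standardize the target; predictions are transformed back automatically
model = TransformedTargetRegressor(regressor=pipe, transformer=StandardScaler())

# Parameters of nested steps are addressed as <step>__<param>
param_grid = {
    "regressor__mlp__activation": ["identity", "logistic", "relu"],
    "regressor__mlp__solver": ["lbfgs", "sgd", "adam"],
}

grid_search = GridSearchCV(model, param_grid, cv=3,
                           scoring="neg_mean_absolute_error")
grid_search.fit(X, y)
print(grid_search.best_params_)
print(grid_search.best_score_)
```

The `scoring="neg_mean_absolute_error"` argument is optional; it makes `best_score_` the negated mean absolute error, which matches the metric the question computes by hand. Without it, GridSearchCV falls back to the estimator's R² score.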
