Question:

KeyError: "None of [Int64Index([...], dtype='int64')] are in the [columns]"

郑景胜

2023-03-14

I am trying to run k-fold cross-validation on a pipeline (StandardScaler, DecisionTreeClassifier).

First, I import the data.

data = pd.read_csv('train_strokes.csv')

Then I preprocess the DataFrame:

# Preprocessing data 
data.drop('id',axis=1,inplace=True)
data['age'] =data['age'].apply(lambda x : x if round(x) else np.nan) 
data['bmi'] = data['bmi'].apply(lambda bmi : bmi if 12< bmi <45 else np.nan)
data['gender'] = data['gender'].apply(lambda gender : gender if gender =='Female' or gender =='Male' else np.nan)
data.sort_values(['gender', 'age','bmi'], inplace=True) 
data['bmi'].ffill(inplace=True)
data.dropna(axis=0,inplace=True)
data.reset_index(drop=True, inplace=True)

# Convert categorical data to numeric values
enc = LabelEncoder()
data['gender'] = enc.fit_transform(data['gender'])
data['work_type'] = enc.fit_transform(data['work_type'])
data['Residence_type'] = enc.fit_transform(data['Residence_type'])
data['smoking_status'] = enc.fit_transform(data['smoking_status'])
data['ever_married'] = enc.fit_transform(data['ever_married'])

Then I separate the features from the target:

target = data['stroke']
feat = data.drop('stroke',axis=1)

and use SMOTE to balance the data:

sm = SMOTE(random_state = 1) 
feat, target = sm.fit_resample(feat, target) 
feat['age'] = feat['age'].apply(lambda x : round(x))
feat['hypertension'] = feat['hypertension'].apply(lambda x : round(x))
feat['heart_disease'] = feat['heart_disease'].apply(lambda x : round(x))
feat['ever_married'] = feat['ever_married'].apply(lambda x : round(x))
#split training and test
X_train, X_test, y_train, y_test = train_test_split(feat, target, test_size=0.3, random_state= 2)

This is the part that fails:

Kfold =KFold(n_splits=10)
pipeline = make_pipeline(StandardScaler(), DecisionTreeClassifier())
n_iter = 0
for train_idx, test_idx in Kfold.split(feat):
    pipeline.fit(X_train[train_idx], y_train[train_idx])
    score = pipeline.score(X_train[test_idx],y_train[test_idx])
    print('Fold #{} accuracy{}'.format(1,score))

Error output:

Traceback (most recent call last):
  File "/Users/merb/Documents/Dev/DataScience/TP.py", line 84, in <module>
    pipeline.fit(X_train[train_idx], y_train[train_idx])
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/pandas/core/frame.py", line 3030, in __getitem__
    indexer = self.loc._get_listlike_indexer(key, axis=1, raise_missing=True)[1]
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/pandas/core/indexing.py", line 1266, in _get_listlike_indexer
    self._validate_read_indexer(keyarr, indexer, axis, raise_missing=raise_missing)
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/pandas/core/indexing.py", line 1308, in _validate_read_indexer
    raise KeyError(f"None of [{key}] are in the [{axis_name}]")
KeyError: "None of [Int64Index([ 5893,  5894,  5895,  5896,  5897,  5898,  5899,  5900,  5901,\n             5902,\n            ...\n            58912, 58913, 58914, 58915, 58916, 58917, 58918, 58919, 58920,\n            58921],\n           dtype='int64', length=53029)] are in the [columns]"

1 answer

陆和泰

2023-03-14

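Indexing a DataFrame with square brackets and a list or array of integers, as in X_train[train_idx], makes pandas look for columns with those names, which is why the error mentions [columns]. To select rows by index label, use df.loc[index]. To select rows by integer position, which is what KFold.split returns, use df.iloc[index].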
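
To illustrate the difference, here is a minimal sketch on a toy DataFrame. The column values and the index labels 10 to 13 are made up for the example and are not taken from the question's dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame; the index labels deliberately do not start at 0.
df = pd.DataFrame(
    {"age": [30, 45, 60, 75], "bmi": [22.0, 27.5, 31.0, 24.5]},
    index=[10, 11, 12, 13],
)
positions = np.array([0, 2])

# Square brackets with an integer array are interpreted as column names,
# so this raises the same kind of KeyError as in the question.
try:
    df[positions]
    raised = False
except KeyError:
    raised = True

by_position = df.iloc[positions]  # rows at integer positions 0 and 2
by_label = df.loc[[10, 12]]       # rows whose index labels are 10 and 12
```

In this example both selections return the same two rows, because positions 0 and 2 happen to carry the labels 10 and 12.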

Beyond that, the pandas documentation page on indexing and selecting data is worth reading.
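
Applied to the loop in the question, the change might look like the sketch below. The data is a synthetic stand-in (random features, a made-up target, shuffled index labels to mimic what train_test_split leaves behind), not the stroke dataset. Two further adjustments are assumptions about what the asker intended: the split is taken over X_train rather than feat, because feat has more rows than X_train and its positions would run out of range with iloc, and the fold number comes from enumerate instead of the hard-coded 1.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import KFold
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for X_train / y_train (hypothetical data).
rng = np.random.default_rng(0)
X_train = pd.DataFrame(rng.normal(size=(100, 3)), columns=["a", "b", "c"])
y_train = (X_train["a"] > 0).astype(int)

# Shuffle the index labels so that labels and positions no longer coincide.
shuffled_labels = rng.permutation(100)
X_train.index = shuffled_labels
y_train.index = shuffled_labels

kfold = KFold(n_splits=10)
pipeline = make_pipeline(StandardScaler(), DecisionTreeClassifier(random_state=0))

scores = []
# Split the same frame that is indexed inside the loop, so positions stay in range.
for fold, (train_idx, test_idx) in enumerate(kfold.split(X_train), start=1):
    # KFold yields integer positions, so select rows with .iloc.
    pipeline.fit(X_train.iloc[train_idx], y_train.iloc[train_idx])
    score = pipeline.score(X_train.iloc[test_idx], y_train.iloc[test_idx])
    scores.append(score)
    print('Fold #{} accuracy {}'.format(fold, score))
```

If only the per-fold scores are needed, scikit-learn's cross_val_score(pipeline, X_train, y_train, cv=kfold) handles the row selection internally and avoids manual indexing.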

