Question:

KeyError: "None of [Int64Index([...], dtype='int64', length=9313)] are in the [columns]"

魏雅惠
2023-03-14

I have a DataFrame with 323 columns and 10348 rows. I want to split it with stratified k-fold using the code below:

import pandas as pd
from sklearn.model_selection import StratifiedKFold

df = pd.read_csv("path")
x = df.loc[:, ~df.columns.isin(['flag'])]
y = df['flag']
skf = StratifiedKFold(n_splits=5, random_state=None, shuffle=False)
for train_index, test_index in skf.split(x, y):
    print("TRAIN:", train_index, "TEST:", test_index)
    x_train, x_test = x[train_index], x[test_index]
    y_train, y_test = y[train_index], y[test_index]

But I get the following error:

KeyError: "None of [Int64Index([    0,     1,     2,     3,     4,     5,     6,     7,     8,\n               10,\n            ...\n            10338, 10339, 10340, 10341, 10342, 10343, 10344, 10345, 10346,\n            10347],\n           dtype='int64', length=9313)] are in the [columns]"

Can someone tell me why this error occurs and how to fix it?

3 answers

李和昶
2023-03-14

Try converting the pandas DataFrame to a NumPy array, as follows:

pd.DataFrame({"A": [1, 2], "B": [3, 4]}).to_numpy()

array([[1, 3],
       [2, 4]])
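
Applied to the question, this might look roughly like the sketch below. It assumes a small made-up frame in place of the CSV; the column names f1 and f2 and the data are hypothetical. NumPy arrays are indexed by position, so the fold indices can be used directly.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import StratifiedKFold

# Hypothetical stand-in for the CSV in the question
df = pd.DataFrame({
    "f1": np.arange(10),
    "f2": np.arange(10) * 2,
    "flag": [0, 1] * 5,
})

# Convert to NumPy so that x[...] and y[...] select rows by position
x = df.drop(columns="flag").to_numpy()
y = df["flag"].to_numpy()

skf = StratifiedKFold(n_splits=5)
fold_sizes = []
for train_index, test_index in skf.split(x, y):
    x_train, x_test = x[train_index], x[test_index]
    y_train, y_test = y[train_index], y[test_index]
    fold_sizes.append((len(x_train), len(x_test)))
```

Keep in mind that the column names are lost after to_numpy(), which may matter if later steps rely on them.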

曾泳
2023-03-14

You can also use df.take(indices_list, axis=0):

x_train, x_test = x.take(list(train_index), axis=0), x.take(list(test_index), axis=0)

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.take.html
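
As a small illustration that take works on positions rather than index labels, here is a sketch with a made-up frame; the column name f1, the index labels, and the positions are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical frame whose index labels differ from the row positions
x = pd.DataFrame({"f1": [10, 20, 30, 40]}, index=[100, 101, 102, 103])

# Positions like the ones skf.split() yields
test_index = np.array([1, 3])

# take() picks the rows at positions 1 and 3, whatever their labels are
x_test = x.take(test_index, axis=0)
```

Here x_test holds the rows labelled 101 and 103, since those sit at positions 1 and 3.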

韩烈
2023-03-14

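It looks like you have a DataFrame slicing problem rather than a problem with StratifiedKFold itself. I made up a df for this and solved it by using iloc to slice with the index arrays: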

from sklearn import model_selection

# The list of some column names in flag
flag = ["raw_sentence", "score"]
x = df.loc[:, ~df.columns.isin(flag)].copy()
y = df[flag].copy()
skf = model_selection.StratifiedKFold(n_splits=2, random_state=None, shuffle=False)
for train_index, test_index in skf.split(x, y):
    print("TRAIN:", train_index, "TEST:", test_index)
    x_train, x_test = x.iloc[list(train_index)], x.iloc[list(test_index)]

train_index and test_index being ndarrays got in the way a little here, so I converted them to lists.

You can refer to: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html
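
To make the cause of the error concrete, here is a minimal sketch, again on a made-up frame with hypothetical columns f1 and f2. Indexing a DataFrame with x[array] looks the values up as column labels, which is why the KeyError says they are not "in the [columns]", while .iloc selects rows by position. As far as I know, .iloc also accepts integer NumPy arrays as they are, so the list() conversion should be optional.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import StratifiedKFold

# Hypothetical stand-in for the data in the question
df = pd.DataFrame({
    "f1": np.arange(10),
    "f2": np.arange(10) * 2,
    "flag": [0, 1] * 5,
})
x = df.loc[:, ~df.columns.isin(["flag"])]
y = df["flag"]

skf = StratifiedKFold(n_splits=5)
train_index, test_index = next(skf.split(x, y))

# x[train_index] treats the integers as column labels, hence the KeyError
try:
    x[train_index]
    raised = False
except KeyError:
    raised = True

# .iloc selects rows by position and takes the ndarrays directly
x_train, x_test = x.iloc[train_index], x.iloc[test_index]
y_train, y_test = y.iloc[train_index], y.iloc[test_index]
```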
