Question:

How to use sklearn fit_transform with pandas and return a DataFrame instead of a NumPy array?

喻增

2023-03-14

I want to apply scaling (using StandardScaler() from sklearn.preprocessing) to a pandas DataFrame. The following code returns a NumPy array, so I lose all the column names and indices. This is not what I want.

from sklearn.preprocessing import StandardScaler

features = df[["col1", "col2", "col3", "col4"]]
autoscaler = StandardScaler()
features = autoscaler.fit_transform(features)

I also tried applying the scaler column by column:

features = features.apply(lambda x: autoscaler.fit_transform(x))

and, with each column reshaped to 2-D first:

features = features.apply(lambda x: autoscaler.fit_transform(x.reshape(-1, 1)))

But this gives:

Traceback (most recent call last):
  File "./analyse.py", line 91
    features = features.apply(lambda x: autoscaler.fit_transform(x.reshape(-1, 1)))
  File "/usr/lib/python3.5/site-packages/pandas/core/frame.py", line 3972, in apply
    return self._apply_standard(f, axis, reduce=reduce)
  File "", line 226, in __init__
    mgr = self._init_dict(data, index, columns, dtype=dtype)
  File "/usr/lib/python3.5/site-packages/pandas/core/frame.py", line 363, in _init_dict
    dtype=dtype)
  File "/usr/lib/python3.5/site-packages/pandas/core/frame.py", line 5163, in _arrays_to_mgr
    arrays = _homogenize(arrays, index, dtype)
  File "r/lib/python3.5/site-packages/pandas/core/series.py", line 2885, in _sanitize_array
    raise Exception("Data must be 1-dimensional")
Exception: Data must be 1-dimensional
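The exception comes from the shape of what each lambda returns: fit_transform always returns a 2-D array, here of shape (n, 1), while DataFrame.apply needs one 1-D result per column to rebuild a DataFrame. A minimal sketch of the column-wise attempt that flattens each result with ravel() follows; the sample frame is made up, scale_column is a hypothetical helper name, and a fresh StandardScaler is fitted per column so one column's statistics never overwrite another's.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Made-up sample data; the non-default index shows that the labels survive.
df = pd.DataFrame(
    {"col1": [1.0, 2.0, 3.0, 4.0], "col2": [10.0, 20.0, 10.0, 40.0]},
    index=[100, 101, 102, 103],
)


def scale_column(col: pd.Series) -> np.ndarray:
    # fit_transform wants 2-D input, so turn the column into an (n, 1) array...
    scaled = StandardScaler().fit_transform(col.to_numpy().reshape(-1, 1))
    # ...and flatten the (n, 1) result so apply() receives a 1-D array per column.
    return scaled.ravel()


# apply() calls scale_column once per column and keeps df's index and column labels.
features = df.apply(scale_column)
print(features)
```

Because StandardScaler standardizes each column independently, this produces the same numbers as scaling the whole frame in one call; the per-column loop only adds overhead.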

How can I apply scaling to a pandas DataFrame and leave the DataFrame intact? Without copying the data, if possible.

1 Answer

车子平

2023-03-14

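You can convert the DataFrame to a NumPy array with as_matrix(), scale it, and wrap the result back in a DataFrame with the original index and columns. An example on a random dataset:

Edit: Changed as_matrix() to values (it doesn't change the result), per the last sentence of the as_matrix() documentation:

Generally, it is recommended to use '.values'.

(as_matrix() was later deprecated in pandas 0.23 and removed in pandas 1.0; current pandas documentation recommends DataFrame.to_numpy() over .values.)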

import pandas as pd
import numpy as np  # for the random integer example

df = pd.DataFrame(np.random.randint(0, 100, size=(10, 4)),
                  index=range(10, 20),
                  columns=['col1', 'col2', 'col3', 'col4'],
                  dtype='float64')

In [14]: df.head(3)
Out[14]:
    col1  col2  col3  col4
10     3    38    86    65
11    98     3    66    68
12    88    46    35    68

from sklearn.preprocessing import StandardScaler
scaled_features = StandardScaler().fit_transform(df.values)

In [15]: scaled_features[:3,:] #lost the indices
Out[15]:
array([[-1.89007341,  0.05636005,  1.74514417,  0.46669562],
       [ 1.26558518, -1.35264122,  0.82178747,  0.59282958],
       [ 0.93341059,  0.37841748, -0.60941542,  0.59282958]])

scaled_features_df = pd.DataFrame(scaled_features, index=df.index, columns=df.columns)

In [17]: scaled_features_df.head(3)
Out[17]:
        col1      col2      col3      col4
10 -1.890073  0.056360  1.745144  0.466696
11  1.265585 -1.352641  0.821787  0.592830
12  0.933411  0.378417 -0.609415  0.592830
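The pattern above (scale the array, then rebuild the DataFrame with the original index and columns) can be wrapped in a small helper. The sketch below is hypothetical: scale_columns and its signature are illustrative, not part of pandas or scikit-learn. It scales only a subset of columns, as in the question's df[["col1", ...]], writes the values back under the original labels, and returns the fitted scaler so the same statistics can be reused on new data via scaler.transform.

```python
from typing import List, Tuple

import pandas as pd
from sklearn.preprocessing import StandardScaler


def scale_columns(df: pd.DataFrame, cols: List[str]) -> Tuple[pd.DataFrame, StandardScaler]:
    """Hypothetical helper: standardize `cols`, keeping the index and all column labels."""
    scaler = StandardScaler()
    # fit_transform accepts the DataFrame slice directly and returns a NumPy array.
    scaled = scaler.fit_transform(df[cols])
    out = df.copy()  # leave the caller's DataFrame unmodified
    # Rows of `scaled` are in df's row order, so assign them back by column name.
    out[cols] = scaled
    return out, scaler


df = pd.DataFrame(
    {"col1": [1.0, 2.0, 3.0], "col2": [4.0, 6.0, 8.0], "label": ["a", "b", "c"]},
    index=[10, 11, 12],
)
scaled_df, scaler = scale_columns(df, ["col1", "col2"])
print(scaled_df)
```

Dropping the df.copy() line writes into the original frame instead. Either way, scikit-learn still allocates a new array for the scaled values, so this keeps the DataFrame's labels intact but does not avoid copying the data entirely.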

Edit 2:

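I stumbled upon the sklearn-pandas package. It focuses on making scikit-learn easier to use with pandas. sklearn-pandas is especially useful when you need to apply more than one type of transformation to column subsets of the DataFrame, which is the more common scenario. It is documented, but this is how you would achieve the transformation we just performed: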

from sklearn.preprocessing import StandardScaler
from sklearn_pandas import DataFrameMapper

# One StandardScaler fitted on all four columns together
mapper = DataFrameMapper([(list(df.columns), StandardScaler())])
scaled_features = mapper.fit_transform(df.copy())
scaled_features_df = pd.DataFrame(scaled_features, index=df.index, columns=df.columns)

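Similar questions:

How to use the "IN" operator to return 0 instead of null

Question: I have three tables: text: the text in rows; trigram: the trigrams of all text rows; text_trigram: the trigrams a text row contains, an intermediate table. When I execute this command: the result comes out, but not the result I want: this is what I expect to have: Additionally, I would like to do the following: Is it possible? Or is using the operator wrong? Answer: I think the confusion here is that you assume the value is null, but actually there are no matching rows. Consider the following simplified version: If there are no rows with
How to make findAll() on a CrudRepository return a List instead of an Iterable [duplicate]

I want to write a findAll() method that returns a List of all Student objects. But CrudRepository only has Iterable findAll(). The goal is to put all students into a list and pass it to the API controller so that I can get all students with an HTTP GET. What is the best way to convert this method's result to a List? In my current code, the findAll method in the student service gives me the error "incompatible types: found Iterable, required List". Service / API controller / Repository
jOOQ: returning() not available without where()

I noticed that returning() is not available without where(). Is this intentional? This works: This doesn't work: Should I consider this a "hack"?
Apply a function to a pandas DataFrame that can return multiple rows

Question: I am trying to transform a DataFrame so that some rows are replicated a given number of times. For example: should be transformed into: This is the opposite of aggregating with the count function. Is there an easy way to achieve this in pandas (without using for loops or list comprehensions)? One possibility might be to allow a function to return multiple rows (similar to the method). However, I don't think this is currently possible in pandas. Answer: You can use groupby: so you get You can fix the index of the result as needed
Returns a memory address instead of the class name

I am trying to get two teams to play each other. When I call team1.play(team2), I call this; when the generated number i is less than 0.5, team2 should win, and if it is greater than 0.5, team1 should win. When team1 wins, it correctly shows the Knicks, but when team2 wins, it shows a memory address. How can I make it correctly say that the Nets won, instead of team@78987neu73?
Convert a pandas DataFrame to a NumPy array [duplicate]

Below is a DataFrame. It needs to be converted into a NumPy array in which one column becomes the index of the NumPy array and another becomes the corresponding value, and so on. How can this be done?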
