Convert an entire pandas DataFrame to integers (pandas 0.17.0)

薛弘壮

2023-03-14

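Question:

My question is very similar to this one, but I need to convert an entire DataFrame, not just a single Series. The to_numeric function works on only one Series at a time and is not a good replacement for the deprecated convert_objects command. Is there a way to get results similar to convert_objects(convert_numeric=True) in the new pandas version?

Thanks to Mike Müller for the example. df.apply(pd.to_numeric) works well if all values can be converted to integers. But what if my DataFrame contains strings that cannot be converted to integers? Example: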

df = pd.DataFrame({'ints': ['3', '5'], 'Words': ['Kobe', 'Bryant']})
df.dtypes
Out[59]: 
Words    object
ints     object
dtype: object

Then I can run the deprecated function and get:

df = df.convert_objects(convert_numeric=True)
df.dtypes
Out[60]: 
Words    object
ints      int64
dtype: object

Running the apply command raises an error, even when I wrap it in try/except.

Answer:

All columns convertible

You can apply the function to all columns:

df.apply(pd.to_numeric)

Example:

>>> df = pd.DataFrame({'a': ['1', '2'], 
                       'b': ['45.8', '73.9'],
                       'c': [10.5, 3.7]})

>>> df.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 2 entries, 0 to 1
Data columns (total 3 columns):
a    2 non-null object
b    2 non-null object
c    2 non-null float64
dtypes: float64(1), object(2)
memory usage: 64.0+ bytes

>>> df.apply(pd.to_numeric).info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 2 entries, 0 to 1
Data columns (total 3 columns):
a    2 non-null int64
b    2 non-null float64
c    2 non-null float64
dtypes: float64(2), int64(1)
memory usage: 64.0 bytes

Not all columns convertible

pd.to_numeric has the keyword argument errors:

Signature: pd.to_numeric(arg, errors='raise')
Docstring:
Convert argument to a numeric type.

Parameters
----------
arg : list, tuple or array of objects, or Series
errors : {'ignore', 'raise', 'coerce'}, default 'raise'
    - If 'raise', then invalid parsing will raise an exception
    - If 'coerce', then invalid parsing will be set as NaN
    - If 'ignore', then invalid parsing will return the input

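Setting errors to 'ignore' returns a column unchanged if it cannot be converted to a numeric type.

As Anton Protopopov pointed out, the most elegant way is to pass errors='ignore' as a keyword argument to apply(), which forwards it to pd.to_numeric: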
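For comparison, the other non-default setting, errors='coerce', replaces unparseable values with NaN instead of leaving the column alone. A minimal sketch, assuming small made-up data (the names mixed, coerced and coerced_df are illustrative only):

```python
import pandas as pd

# Hypothetical sample data: two numeric strings and one word.
mixed = pd.Series(['3', '5', 'Kobe'])

# 'coerce' turns anything that cannot be parsed into NaN,
# so the result is a float column.
coerced = pd.to_numeric(mixed, errors='coerce')
print(coerced)

# Applied to a whole DataFrame, a text-only column becomes all NaN.
df = pd.DataFrame({'ints': ['3', '5'], 'Words': ['Kobe', 'Bryant']})
coerced_df = df.apply(pd.to_numeric, errors='coerce')
print(coerced_df)
```

This is why 'coerce' is usually not a good fit for a frame that mixes numeric and text columns: the text is discarded rather than preserved.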

>>> df = pd.DataFrame({'ints': ['3', '5'], 'Words': ['Kobe', 'Bryant']})
>>> df.apply(pd.to_numeric, errors='ignore').info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 2 entries, 0 to 1
Data columns (total 2 columns):
Words    2 non-null object
ints     2 non-null int64
dtypes: int64(1), object(1)
memory usage: 48.0+ bytes

The approach I suggested previously, using partial from the functools module, is more verbose:

>>> from functools import partial
>>> df = pd.DataFrame({'ints': ['3', '5'], 
                       'Words': ['Kobe', 'Bryant']})
>>> df.apply(partial(pd.to_numeric, errors='ignore')).info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 2 entries, 0 to 1
Data columns (total 2 columns):
Words    2 non-null object
ints     2 non-null int64
dtypes: int64(1), object(1)
memory usage: 48.0+ bytes
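The question mentions that try/except did not help, and recent pandas releases (2.2, to my knowledge) deprecate errors='ignore' in favour of catching the exception explicitly. A minimal sketch of that idea, catching the error per column rather than around the whole apply call. The helper name to_numeric_or_keep is hypothetical and not part of pandas:

```python
import pandas as pd


def to_numeric_or_keep(column):
    """Hypothetical helper: return the column converted to a numeric dtype,
    or the original column if any value cannot be parsed."""
    try:
        return pd.to_numeric(column)
    except (ValueError, TypeError):
        # Conversion failed, so leave this column untouched.
        return column


df = pd.DataFrame({'ints': ['3', '5'], 'Words': ['Kobe', 'Bryant']})

# apply() calls the helper once per column.
converted = df.apply(to_numeric_or_keep)
print(converted.dtypes)
```

Because the try/except sits inside the function that apply() calls, a failure in one column does not stop the other columns from being converted.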
