旋转包含字符串的Pandas数据框-'没有要聚合的数字类型'错误

邵阳

2023-03-14

问题内容：

关于此错误，有很多问题，但是环顾四周之后，我仍然无法找到解决方案的想法。我正在尝试使用字符串旋转数据框架，以使一些行数据变为列，但到目前为止还没有解决。

我的df的形状

<class 'pandas.core.frame.DataFrame'>
Int64Index: 515932 entries, 0 to 515931
Data columns (total 5 columns):
id                 515932 non-null object
cc_contact_id      515932 non-null object
Network_Name       515932 non-null object
question           515932 non-null object
response_answer    515932 non-null object
dtypes: object(5)
memory usage: 23.6+ MB

样本格式

id  contact_id  question    response_answer
16  137519  2206    State   Ca
17  137520  2206    State   Ca
18  137521  2206    State   Ca
19  137522  2206    State   Ca
20  137523  2208    City    Lancaster
21  137524  2208    City    Lancaster
22  137525  2208    City    Lancaster
23  137526  2208    City    Lancaster
24  137527  2208    Trip_End Location   Home
25  137528  2208    Trip_End Location   Home
26  137529  2208    Trip_End Location   Home
27  137530  2208    Trip_End Location   Home

我想转向的是

id  contact_id      State   City       Trip_End Location
16  137519  2206    Ca      None       None None
20  137523  2208    None    Lancaster  None None
24  137527  2208    None    None       None Home
etc. etc.

当问题值将成为列，与 response_answer 它是的相应数列，并保留IDS

我尝试过的

unified_df = pd.DataFrame(unified_data, columns=target_table_headers, dtype=object)

pivot_table = unified_df.pivot_table('response_answer',['id','cc_contact_id'],'question')
# OR
pivot_table = unified_df.pivot_table('response_answer','question')

DataError：没有要聚合的数字类型

用字符串值旋转数据框的方法是什么？

问题答案：

aggfuncin的默认pivot_table值为np.sum，它不知道如何处理字符串，并且您还没有指出应该正确地使用什么索引。尝试类似的东西：

pivot_table = unified_df.pivot_table(index=['id', 'contact_id'],
                                     columns='question', 
                                     values='response_answer',
                                     aggfunc=lambda x: ' '.join(x))

显式地每id, contact_id对设置一行，并在上旋转一组response_answer值question。该aggfunc只是保证，如果你有多个答案中的原始数据，我们只是将它们连接起来用空格一起同样的问题。的语法pivot_table可能因您的熊猫版本而异。

这是一个简单的例子：

In [24]: import pandas as pd

In [25]: import random

In [26]: df = pd.DataFrame({'id':[100*random.randint(10, 50) for _ in range(100)], 'question': [str(random.randint(0,3)) for _ in range(100)], 'response': [str(random.randint(100,120)) for _ in range(100)]})

In [27]: df.head()
Out[27]:
     id question response
0  3100        1      116
1  4500        2      113
2  5000        1      120
3  3900        2      103
4  4300        0      117

In [28]: df.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 100 entries, 0 to 99
Data columns (total 3 columns):
id          100 non-null int64
question    100 non-null object
response    100 non-null object
dtypes: int64(1), object(2)
memory usage: 3.1+ KB

In [29]: df.pivot_table(index='id', columns='question', values='response', aggfunc=lambda x: ' '.join(x)).head()
Out[29]:
question        0        1    2        3
id
1000      110 120      NaN  100      NaN
1100          NaN  106 108  104      NaN
1200      104 113      119  NaN      101
1300          102      NaN  116  108 120
1400          NaN      NaN  116      NaN

旋转包含字符串的Pandas数据框-'没有要聚合的数字类型'错误

相关阅读

相关文章

相关问答

相关工具

相关文档