Question:

In pandas, how can I apply cut with different bins and labels to one column depending on the value of another column?

宰父衡

2023-05-27

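Problem description

I have a DataFrame in which I want to bin the 'value' column with pd.cut, using different bins and labels depending on the 'type' column, and store the resulting group labels. However, when the bins and labels differ in values or length between types, I cannot write the results into the same column of one DataFrame.

Background and what I have tried

I tried to solve this with apply, but could not get cut to work properly that way.

Relevant code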

import pandas as pd

# Create sample data
df = pd.DataFrame({
    'value': [5, 10, 15, 20, 25, 30],
    'type': [1, 1, 1, 2, 2, 2]
})

# Define two sets of bins and labels
bins1 = [0, 6, 20]
labels1 = [3, 2]
bins2 = [0, 20, 22, 50]
labels2 = [3, 2, 1]

# Cut the type=1 rows with bins1 and the type=2 rows with bins2
df.loc[df['type'] == 1, 'group'] = pd.cut(df.loc[df['type'] == 1, 'value'], bins=bins1, labels=labels1)
df.loc[df['type'] == 2, 'group'] = pd.cut(df.loc[df['type'] == 2, 'value'], bins=bins2, labels=labels2)

# Print the result
print(df)

The error message is as follows:

TypeError                                 Traceback (most recent call last)
Cell In[111], line 17
     15 # Cut the type=1 rows with bins1 and the type=2 rows with bins2
     16 df.loc[df['type'] == 1, 'group'] = pd.cut(df.loc[df['type'] == 1, 'value'], bins=bins1, labels=labels1)
---> 17 df.loc[df['type'] == 2, 'group'] = pd.cut(df.loc[df['type'] == 2, 'value'], bins=bins2, labels=labels2)
     19 # Print the result
     20 print(df)

File F:\Anaconda\lib\site-packages\pandas\core\indexing.py:818, in _LocationIndexer.__setitem__(self, key, value)
    815 self._has_valid_setitem_indexer(key)
    817 iloc = self if self.name == "iloc" else self.obj.iloc
--> 818 iloc._setitem_with_indexer(indexer, value, self.name)

File F:\Anaconda\lib\site-packages\pandas\core\indexing.py:1795, in _iLocIndexer._setitem_with_indexer(self, indexer, value, name)
   1792 # align and set the values
   1793 if take_split_path:
   1794     # We have to operate column-wise
-> 1795     self._setitem_with_indexer_split_path(indexer, value, name)
   1796 else:
   1797     self._setitem_single_block(indexer, value, name)

File F:\Anaconda\lib\site-packages\pandas\core\indexing.py:1838, in _iLocIndexer._setitem_with_indexer_split_path(self, indexer, value, name)
   1834     self._setitem_with_indexer_2d_value(indexer, value)
   1836 elif len(ilocs) == 1 and lplane_indexer == len(value) and not is_scalar(pi):
   1837     # We are setting multiple rows in a single column.
-> 1838     self._setitem_single_column(ilocs[0], value, pi)
   1840 elif len(ilocs) == 1 and 0 != lplane_indexer != len(value):
   1841     # We are trying to set N values into M entries of a single
   1842     #  column, which is invalid for N != M
   1843     # Exclude zero-len for e.g. boolean masking that is all-false
   1845     if len(value) == 1 and not is_integer(info_axis):
   1846         # This is a case like df.iloc[:3, [1]] = [0]
   1847         #  where we treat as df.iloc[:3, 1] = 0

File F:\Anaconda\lib\site-packages\pandas\core\indexing.py:1992, in _iLocIndexer._setitem_single_column(self, loc, value, plane_indexer)
   1988         value = value[pi]
   1989 else:
   1990     # set value into the column (first attempting to operate inplace, then
   1991     #  falling back to casting if necessary)
-> 1992     self.obj._mgr.column_setitem(loc, plane_indexer, value)
   1993     self.obj._clear_item_cache()
   1994     return

File F:\Anaconda\lib\site-packages\pandas\core\internals\managers.py:1391, in BlockManager.column_setitem(self, loc, idx, value, inplace)
   1389     col_mgr.setitem_inplace(idx, value)
   1390 else:
-> 1391     new_mgr = col_mgr.setitem((idx,), value)
   1392     self.iset(loc, new_mgr._block.values, inplace=True)

File F:\Anaconda\lib\site-packages\pandas\core\internals\managers.py:393, in BaseBlockManager.setitem(self, indexer, value)
    388 if _using_copy_on_write() and not self._has_no_reference(0):
    389     # if being referenced -> perform Copy-on-Write and clear the reference
    390     # this method is only called if there is a single block -> hardcoded 0
    391     self = self.copy()
--> 393 return self.apply("setitem", indexer=indexer, value=value)

File F:\Anaconda\lib\site-packages\pandas\core\internals\managers.py:352, in BaseBlockManager.apply(self, f, align_keys, ignore_failures, **kwargs)
    350         applied = b.apply(f, **kwargs)
    351     else:
--> 352         applied = getattr(b, f)(**kwargs)
    353 except (TypeError, NotImplementedError):
    354     if not ignore_failures:

File F:\Anaconda\lib\site-packages\pandas\core\internals\blocks.py:1417, in EABackedBlock.setitem(self, indexer, value)
   1414 check_setitem_lengths(indexer, value, values)
   1416 try:
-> 1417     values[indexer] = value
   1418 except (ValueError, TypeError) as err:
   1419     _catch_deprecated_value_error(err)

File F:\Anaconda\lib\site-packages\pandas\core\arrays\_mixins.py:266, in NDArrayBackedExtensionArray.__setitem__(self, key, value)
    264 def __setitem__(self, key, value) -> None:
    265     key = check_array_indexer(self, key)
--> 266     value = self._validate_setitem_value(value)
    267     self._ndarray[key] = value

File F:\Anaconda\lib\site-packages\pandas\core\arrays\categorical.py:1558, in Categorical._validate_setitem_value(self, value)
   1555 def _validate_setitem_value(self, value):
   1556     if not is_hashable(value):
   1557         # wrap scalars and hashable-listlikes in list
-> 1558         return self._validate_listlike(value)
   1559     else:
   1560         return self._validate_scalar(value)

File F:\Anaconda\lib\site-packages\pandas\core\arrays\categorical.py:2228, in Categorical._validate_listlike(self, value)
   2226 if isinstance(value, Categorical):
   2227     if not is_dtype_equal(self.dtype, value.dtype):
-> 2228         raise TypeError(
   2229             "Cannot set a Categorical with another, "
   2230             "without identical categories"
   2231         )
   2232     # is_dtype_equal implies categories_match_up_to_permutation
   2233     value = self._encode_with_my_categories(value)

TypeError: Cannot set a Categorical with another, without identical categories

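What result did you expect? What error did you actually see?

I expect the rows with type=1 to fall into two groups: 'group' should be 3 for values in [0, 6) and 2 for values in [6, 20). The rows with type=2 should fall into three groups, likewise following the result of cut().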
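A note on the interval notation: pd.cut uses right-closed intervals by default (right=True), so bins1 produces (0, 6] and (6, 20] rather than [0, 6) and [6, 20). For the sample data this makes no difference, but it does at the bin edges. A minimal sketch of the difference, using a made-up `edge` series that is not part of the original data:

```python
import pandas as pd

# Illustrative values sitting on or near the bin edges (not from the question's data)
edge = pd.Series([5, 6, 20])

# Default behaviour: intervals are (0, 6] and (6, 20]
default_cut = pd.cut(edge, bins=[0, 6, 20], labels=[3, 2])

# right=False: intervals are [0, 6) and [6, 20); 20 falls outside and becomes NaN
left_closed = pd.cut(edge, bins=[0, 6, 20], labels=[3, 2], right=False)

print(default_cut.tolist())   # [3, 3, 2]
print(left_closed.tolist())   # [3, 2, nan]
```

If the left-closed intervals described above are what is wanted, passing right=False to both cut calls should give that behaviour.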

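In the code above, .loc selects each subset. The first cut assignment succeeds (line 16 runs fine), but the second one fails (line 17 raises the error). The 'group' column seems to be locked to the first set of labels, so only labels of the same kind and length can be assigned afterwards. The column's dtype is category, and I suspect that is the real reason the same column cannot be binned using different rules.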
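That suspicion matches the last frame of the traceback, where Categorical._validate_listlike raises because the two dtypes are not equal. A small sketch that checks this directly, with the two cut results held in standalone variables `a` and `b` (names chosen here for illustration):

```python
import pandas as pd

# The same two cut calls as in the question, kept as separate Series
a = pd.cut(pd.Series([5, 10, 15]), bins=[0, 6, 20], labels=[3, 2])
b = pd.cut(pd.Series([20, 25, 30]), bins=[0, 20, 22, 50], labels=[3, 2, 1])

# Each result carries its own CategoricalDtype, defined by its labels
print(list(a.dtype.categories))  # [3, 2]
print(list(b.dtype.categories))  # [3, 2, 1]

# The dtypes differ, so values of `b` cannot be written into a column typed like `a`
same = a.dtype == b.dtype
print(same)  # False
```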

1 answer

曾云

2023-05-27

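# After the first assignment, convert 'group' from category to object so it can hold labels from another category set
df['group'] = df['group'].astype('object')
# The second assignment no longer has to match the first set of categories
df.loc[df['type'] == 2, 'group'] = pd.cut(df.loc[df['type'] == 2, 'value'], bins=bins2, labels=labels2)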
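When there are more than two types, the same idea can be written once instead of once per type. The following is only a sketch of one possible approach, not part of the answer above: `cut_rules` and `cut_by_type` are hypothetical names introduced here, and each cut result is converted to object before the pieces are combined, so no categorical dtype is ever shared between rule sets.

```python
import pandas as pd

df = pd.DataFrame({
    "value": [5, 10, 15, 20, 25, 30],
    "type": [1, 1, 1, 2, 2, 2],
})

# Hypothetical lookup table: type -> (bins, labels), taken from the question
cut_rules = {
    1: ([0, 6, 20], [3, 2]),
    2: ([0, 20, 22, 50], [3, 2, 1]),
}

def cut_by_type(values: pd.Series, type_key) -> pd.Series:
    """Cut one type's values with its own bins and labels."""
    bins, labels = cut_rules[type_key]
    cats = pd.cut(values, bins=bins, labels=labels)
    # Drop the categorical dtype so results from different rule sets can be combined
    return cats.astype(object)

# Cut each group separately; every piece keeps its original row index
parts = [cut_by_type(g["value"], key) for key, g in df.groupby("type")]

# Assignment aligns on the index, so each label lands on the row it came from
df["group"] = pd.concat(parts)

print(df["group"].tolist())  # [3, 2, 2, 3, 1, 1]
```

The resulting column has object dtype. If a categorical column is wanted afterwards, it can be cast once at the end, when the full set of labels is known.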
