Question:

Pandas: conditional aggregation

邹山

2023-03-14

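I am trying to implement the following filter on a pandas DataFrame:

I have four columns: A, B, A_prime and B_prime. If A or B is below a threshold C, A_prime and B_prime should be added together, the sum stored in the prime column that corresponds to the larger of A and B, and the other prime column set to 0.

How can I write this as an aggregation function?

Below is a working, but inefficiently written, example: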

import pandas as pd
import numpy as np

data = {
    "A":list(np.abs(np.random.randn(10))),
    "B":list(np.abs(np.random.randn(10))),
    "A_prime":list(np.abs(np.random.randn(10))),
    "B_prime":list(np.abs(np.random.randn(10)))
    
}

df = pd.DataFrame.from_dict(data)
C = 0.2

print("BEFORE:")
print(df)


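for index, row in df.iterrows():
    if row["A"] < C or row["B"] < C:
        total = row["A_prime"] + row["B_prime"]
        max_idx = np.argmax([row["A"], row["B"]])
        # Rows from iterrows() may be copies, so writing to `row` is not
        # guaranteed to update df; write the result back through df.at instead
        if max_idx == 0:
            df.at[index, "A_prime"] = total
            df.at[index, "B_prime"] = 0
        else:
            df.at[index, "B_prime"] = total
            df.at[index, "A_prime"] = 0
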
print("")
print("AFTER:")
print(df)

Output:

BEFORE:
          A         B   A_prime   B_prime
0  0.182445  0.924890  1.563398  0.562325
1  0.252587  0.273637  0.515395  0.538876
2  1.369412  1.985702  1.813962  1.643794
3  0.834666  0.143880  0.860673  0.372468
4  1.380012  0.715774  0.022681  0.892717
5  0.582497  0.477100  0.956821  1.134613
6  0.083045  0.322060  0.362513  1.386124
7  1.384267  0.251577  0.639843  0.458650
8  0.375456  0.412320  0.661661  0.086588
9  0.079226  0.385621  0.601451  0.837827

AFTER:
          A         B   A_prime   B_prime
0  0.182445  0.924890  0.000000  2.125723
1  0.252587  0.273637  0.515395  0.538876
2  1.369412  1.985702  1.813962  1.643794
3  0.834666  0.143880  1.233141  0.000000
4  1.380012  0.715774  0.022681  0.892717
5  0.582497  0.477100  0.956821  1.134613
6  0.083045  0.322060  0.000000  1.748638
7  1.384267  0.251577  0.639843  0.458650
8  0.375456  0.412320  0.661661  0.086588
9  0.079226  0.385621  0.000000  1.439278

3 answers

丌官积厚

2023-03-14

You can select the rows that meet the condition with a boolean mask and rewrite their values with `apply`.

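# Columns whose larger value decides where the sum goes, and the prime
# column each one controls (assumes df and C from the question)
check_max_cols = ["A", "B"]
prime_cols = ["A_prime", "B_prime"]
to_prime = dict(zip(check_max_cols, prime_cols))


def allocate_sum(row):
    row = row.copy()  # work on a copy instead of mutating apply's input
    # Prime column of the larger of A and B (first label wins on ties, like np.argmax)
    max_col = to_prime[row[check_max_cols].idxmax()]
    min_col = prime_cols[1] if max_col == prime_cols[0] else prime_cols[0]

    row[max_col] = row[prime_cols].sum()
    row[min_col] = 0
    return row


# Rows where A or B is below the threshold C
below_threshold = (df[["A", "B"]] < C).any(axis=1)

df.loc[below_threshold, :] = df.loc[below_threshold, :].apply(allocate_sum, axis=1)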
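
A quick way to check the `apply` approach is to run it on a small hand-made frame where the expected result is easy to work out by hand. The values below are made up for illustration, and `allocate_sum` here is a version of the answer's function that picks the side by the larger of A and B (as the question's loop does), repeated so the snippet runs on its own.

```python
import pandas as pd

C = 0.2
check_max_cols = ["A", "B"]          # columns compared to pick the winning side
prime_cols = ["A_prime", "B_prime"]  # columns that receive the sum or 0
to_prime = dict(zip(check_max_cols, prime_cols))

def allocate_sum(row):
    row = row.copy()  # avoid mutating the Series handed in by apply
    # idxmax returns the first label on ties, matching np.argmax in the question
    max_col = to_prime[row[check_max_cols].idxmax()]
    min_col = prime_cols[1] if max_col == prime_cols[0] else prime_cols[0]
    row[max_col] = row[prime_cols].sum()
    row[min_col] = 0
    return row

# Hand-made data (illustrative only)
df = pd.DataFrame({
    "A":       [0.1, 0.5, 0.9],
    "B":       [0.8, 0.6, 0.05],
    "A_prime": [1.0, 2.0, 3.0],
    "B_prime": [4.0, 5.0, 6.0],
})

below_threshold = (df[["A", "B"]] < C).any(axis=1)
df.loc[below_threshold, :] = df.loc[below_threshold, :].apply(allocate_sum, axis=1)
print(df)
```

Row 0 (A below C, B larger) should end up with the sum 5.0 in B_prime, row 1 is untouched because neither A nor B is below C, and row 2 (B below C, A larger) should end up with 9.0 in A_prime.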

祁奇略

2023-03-14

The most sophisticated way is to use `apply()`, as in the following example:

import pandas as pd
import numpy as np

data = {
    "A":list(np.abs(np.random.randn(10))),
    "B":list(np.abs(np.random.randn(10))),
    "A_prime":list(np.abs(np.random.randn(10))),
    "B_prime":list(np.abs(np.random.randn(10)))
    
}

df = pd.DataFrame.from_dict(data)
C = 0.2

def A_B_prime(row):
    A_prime_val = row["A_prime"]
    B_prime_val = row["B_prime"]
    if(row["A"] < C or row["B"] < C):
        max_idx = np.argmax([row["A"], row["B"]])
        if(max_idx==0):
            A_prime_val = row["A_prime"] + row["B_prime"]
            B_prime_val = 0
        else:
            B_prime_val = row["A_prime"] + row["B_prime"]
            A_prime_val = 0
    return A_prime_val, B_prime_val

df['A_prime'], df['B_prime'] = zip(*df.apply(A_B_prime, axis=1))

You can find some good insights on how to return multiple columns from a single `apply()` in this thread.
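
One option along those lines (a sketch, not necessarily what the linked thread recommends) is `DataFrame.apply(..., result_type="expand")`, which turns each returned tuple into columns so the `zip(*...)` step is not needed. The small frame is made-up data, and `A_B_prime` is the answer's function lightly condensed.

```python
import numpy as np
import pandas as pd

C = 0.2

def A_B_prime(row):
    # Same rule as the answer: move the sum to the side with the larger of A/B
    a_prime, b_prime = row["A_prime"], row["B_prime"]
    if row["A"] < C or row["B"] < C:
        total = a_prime + b_prime
        if np.argmax([row["A"], row["B"]]) == 0:
            a_prime, b_prime = total, 0.0
        else:
            a_prime, b_prime = 0.0, total
    return a_prime, b_prime

# Hand-made data (illustrative only)
df = pd.DataFrame({
    "A":       [0.1, 0.5, 0.9],
    "B":       [0.8, 0.6, 0.05],
    "A_prime": [1.0, 2.0, 3.0],
    "B_prime": [4.0, 5.0, 6.0],
})

# result_type="expand" spreads each returned tuple across new columns 0 and 1
expanded = df.apply(A_B_prime, axis=1, result_type="expand")
# Assign by position via .to_numpy() so the column labels 0/1 don't matter
df[["A_prime", "B_prime"]] = expanded.to_numpy()
print(df)
```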

章高爽

2023-03-14

Here is one way:

prime_cols = ["A_prime", "B_prime"]

# get the candidate sums
prime_sums = df[prime_cols].sum(axis=1)

# check which rows satisfy the `C` threshold
threshold_satisfied = df.A.lt(C) | df.B.lt(C)

# set the satisfying rows' values to sums for both columns
df.loc[threshold_satisfied, prime_cols] = prime_sums

# generate a 1-0 mask that will multiply the greater value by 1 and
# smaller value by 0 to "select" one of them and kill the other
# (`ge` sends ties to A, matching `np.argmax` in the question)
mask_A_side = df.A.ge(df.B)
the_mask = pd.concat([mask_A_side, ~mask_A_side], axis=1).set_axis(prime_cols, axis=1)

# multiply with the mask
df.loc[threshold_satisfied, prime_cols] *= the_mask

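For the rows that satisfy the threshold condition, it first puts the sum of the prime columns into both prime columns, and then zeroes out one of them by multiplying with the 1-0 mask.

This gives: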

>>> df

          A         B   A_prime   B_prime
0  0.182445  0.924890  0.000000  2.125723
1  0.252587  0.273637  0.515395  0.538876
2  1.369412  1.985702  1.813962  1.643794
3  0.834666  0.143880  1.233141  0.000000
4  1.380012  0.715774  0.022681  0.892717
5  0.582497  0.477100  0.956821  1.134613
6  0.083045  0.322060  0.000000  1.748637
7  1.384267  0.251577  0.639843  0.458650
8  0.375456  0.412320  0.661661  0.086588
9  0.079226  0.385621  0.000000  1.439278
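
For comparison, the same rule can also be written without `apply` or a multiplication mask, using `numpy.where` on whole columns. This is a sketch rather than one of the posted answers; `reallocate` is a hypothetical helper name, the frame is made-up data, and, like `np.argmax` in the question, it gives the sum to A_prime when A and B are equal.

```python
import numpy as np
import pandas as pd

def reallocate(df, C):
    """Return a copy of df with A_prime/B_prime reallocated per the question's rule."""
    out = df.copy()
    below = (out["A"] < C) | (out["B"] < C)   # rows the rule applies to
    a_wins = out["A"] >= out["B"]             # ties go to A, like np.argmax
    total = out["A_prime"] + out["B_prime"]   # computed before any column changes

    # Where the rule applies, the winning side gets the total and the other gets 0;
    # elsewhere the original values are kept.
    out["A_prime"] = np.where(below, np.where(a_wins, total, 0.0), out["A_prime"])
    out["B_prime"] = np.where(below, np.where(a_wins, 0.0, total), out["B_prime"])
    return out

# Hand-made data (illustrative only)
df = pd.DataFrame({
    "A":       [0.1, 0.5, 0.9],
    "B":       [0.8, 0.6, 0.05],
    "A_prime": [1.0, 2.0, 3.0],
    "B_prime": [4.0, 5.0, 6.0],
})
result = reallocate(df, C=0.2)
print(result)
```

Returning a copy keeps the input frame unchanged; if in-place updates are preferred, the two `np.where` results can be assigned to `df` directly.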
