Question:

How can I use Pandas to get the slope for every n days per group, relative to a conditional row?

闽涵蓄

2023-03-14

I have the following dataframe (sample):

import pandas as pd

n = 3

data = [['A', '2022-09-01', False, 2, -3], ['A', '2022-09-02', False, 1, -2], ['A', '2022-09-03', False, 1, -1], ['A', '2022-09-04', True, 3, 0], 
        ['A', '2022-09-05', False, 3, 1], ['A', '2022-09-06', False, 2, 2], ['A', '2022-09-07', False, 1, 3], ['A', '2022-09-07', False, 2, 3], 
        ['A', '2022-09-08', False, 4, 4], ['A', '2022-09-09', False, 2, 5],
        ['B', '2022-09-01', False, 2, -4], ['B', '2022-09-02', False, 2, -3], ['B', '2022-09-03', False, 4, -2], ['B', '2022-09-04', False, 2, -1], 
        ['B', '2022-09-05', True, 2, 0], ['B', '2022-09-06', False, 2, 1], ['B', '2022-09-07', False, 1, 2], ['B', '2022-09-08', False, 3, 3], 
        ['B', '2022-09-09', False, 3, 4], ['B', '2022-09-10', False, 2, 5]]
df = pd.DataFrame(data = data, columns = ['group', 'date', 'indicator', 'value', 'diff_days'])

   group        date  indicator  value  diff_days
0      A  2022-09-01      False      2         -3
1      A  2022-09-02      False      1         -2
2      A  2022-09-03      False      1         -1
3      A  2022-09-04       True      3          0
4      A  2022-09-05      False      3          1
5      A  2022-09-06      False      2          2
6      A  2022-09-07      False      1          3
7      A  2022-09-07      False      2          3
8      A  2022-09-08      False      4          4
9      A  2022-09-09      False      2          5
10     B  2022-09-01      False      2         -4
11     B  2022-09-02      False      2         -3
12     B  2022-09-03      False      4         -2
13     B  2022-09-04      False      2         -1
14     B  2022-09-05       True      2          0
15     B  2022-09-06      False      2          1
16     B  2022-09-07      False      1          2
17     B  2022-09-08      False      3          3
18     B  2022-09-09      False      3          4
19     B  2022-09-10      False      2          5

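I would like to calculate the slope of every n rows per group relative to the conditional row (indicator == True). This means it should return a column "slope" holding the slopes of the blocks before and after the conditional row, while the slope of the conditional row itself should be 0. Besides that, I would like to return a column called "id", which is a per-group block id identifying the slope blocks before (negative) or after (positive) the conditional row. Here is the desired output: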

data = [['A', '2022-09-01', False, 2, -3, -1, -0.5], ['A', '2022-09-02', False, 1, -2, -1, -0.5], ['A', '2022-09-03', False, 1, -1, -1, -0.5], ['A', '2022-09-04', True, 3, 0, 0, 0], 
        ['A', '2022-09-05', False, 3, 1, 1, -1], ['A', '2022-09-06', False, 2, 2, 1, -1], ['A', '2022-09-07', False, 1, 3, 1, -1], ['A', '2022-09-07', False, 2, 3, 2, 0], 
        ['A', '2022-09-08', False, 4, 4, 2, 0], ['A', '2022-09-09', False, 2, 5, 2, 0],
        ['B', '2022-09-01', False, 2, -4, -2], ['B', '2022-09-02', False, 2, -3, -1, 0], ['B', '2022-09-03', False, 4, -2, -1, 0], ['B', '2022-09-04', False, 2, -1, -1, 0], 
        ['B', '2022-09-05', True, 2, 0, 0, 0], ['B', '2022-09-06', False, 2, 1, 1, 0.5], ['B', '2022-09-07', False, 1, 2, 1, 0.5], ['B', '2022-09-08', False, 3, 3, 1, 0.5], 
        ['B', '2022-09-09', False, 3, 4, 2, -1], ['B', '2022-09-10', False, 2, 5, 2, -1]]
df_desired = pd.DataFrame(data = data, columns = ['group', 'date', 'indicator', 'value', 'diff_days', 'id', 'slope'])

   group        date  indicator  value  diff_days  id  slope
0      A  2022-09-01      False      2         -3  -1   -0.5
1      A  2022-09-02      False      1         -2  -1   -0.5
2      A  2022-09-03      False      1         -1  -1   -0.5
3      A  2022-09-04       True      3          0   0    0.0
4      A  2022-09-05      False      3          1   1   -1.0
5      A  2022-09-06      False      2          2   1   -1.0
6      A  2022-09-07      False      1          3   1   -1.0
7      A  2022-09-07      False      2          3   2    0.0
8      A  2022-09-08      False      4          4   2    0.0
9      A  2022-09-09      False      2          5   2    0.0
10     B  2022-09-01      False      2         -4  -2    NaN
11     B  2022-09-02      False      2         -3  -1    0.0
12     B  2022-09-03      False      4         -2  -1    0.0
13     B  2022-09-04      False      2         -1  -1    0.0
14     B  2022-09-05       True      2          0   0    0.0
15     B  2022-09-06      False      2          1   1    0.5
16     B  2022-09-07      False      1          2   1    0.5
17     B  2022-09-08      False      3          3   1    0.5
18     B  2022-09-09      False      3          4   2   -1.0
19     B  2022-09-10      False      2          5   2   -1.0

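Here is some explanation for group A:

Rows 0, 1 and 2 are the first block before the conditional row (id=-1), with slope(x=[-3,-2,-1], y=[2,1,1]) = -0.5
Rows 4, 5 and 6 are the first block after the conditional row (row 3) (id=1), with slope(x=[1,2,3], y=[3,2,1]) = -1
Rows 7, 8 and 9 are the second block after the conditional row (row 3) (id=2), with slope(x=[3,4,5], y=[2,4,2]) = 0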
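
The three group A slopes quoted here can be reproduced with an ordinary least-squares line fit. The sketch below assumes that "slope" means the degree-1 coefficient returned by `numpy.polyfit`; the helper name `fit_slope` is made up for illustration and does not appear in the question.

```python
import numpy as np

def fit_slope(x, y):
    """Least-squares slope of y against x (degree-1 polynomial fit)."""
    return float(np.polyfit(x, y, 1)[0])

# The three blocks of group A, with diff_days as x and value as y
before = fit_slope([-3, -2, -1], [2, 1, 1])   # id = -1
after_1 = fit_slope([1, 2, 3], [3, 2, 1])     # id = 1
after_2 = fit_slope([3, 4, 5], [2, 4, 2])     # id = 2

print(before, after_1, after_2)
```

Because of floating-point rounding the printed values may differ from -0.5, -1 and 0 in the last digits.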

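So I was wondering if anyone knows whether it is possible to calculate the slope for every n days relative to a conditional row using Pandas?

2 answers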

程鸿煊

2023-03-14

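The main idea could be:

create an individual index for each group;
align zero with the marked (conditional) rows;
replace the index with its floor division by n;
shift the positive indexes one step forward and increment them by 1 to distinguish them from the zero point.

After this, we can use the obtained indexes as an additional grouper to calculate the slopes: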

# create individual indexing for each group
ids = df.groupby('group')['indicator'].cumcount()

# find positions of the condition rows in the group indexes
offset = ids.where(df.indicator).groupby(df.group).first()

# shift the group indexes so that condition rows are indexed by zero
ids = ids.groupby(df.group).transform(lambda x: x - offset[x.name])

# transform the group indexes to their floor division by n,
# shift those which were positive by one position forward
# and increment their values by 1
n = 3
ids = (ids//n).mask(ids>0, (ids//n).shift().add(1))

# assign obtained ids to a new column
df['id'] = ids

# calculate slopes for each `group,id` pair
# (`slope` is the helper defined further below; it has to exist before this line runs)
grouped_slopes = df.groupby(['group','id']).apply(lambda g: slope(g.diff_days, g.value)).rename('slope')

# add slopes to the data (`join` requires the Series to have a name)
df = df.join(grouped_slopes, on=['group','id'])

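As for the slope calculation, we can use a ready-made formula or write our own. Either way, we should also handle the case where a block contains only one item: return 0 for the zero point (the conditional row) and nan for a single-element tail: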

from typing import Literal

def slope(x, y, engine: Literal['numpy', 'scipy']='numpy'):
    from numpy import polyfit
    from scipy.stats import linregress

    match engine:
        case 'numpy':
            func = lambda x, y: polyfit(x, y, 1)[0]
        case 'scipy':
            func = lambda x, y: linregress(x, y).slope
        case other:
            raise ValueError(f'Wrong {engine=}')

    if len(x) > 1:
        return func(x, y)
    if len(x) == 1 and x.iloc[0] == 0:
        return 0
    return float('nan')
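
As a quick sanity check of the three cases described above (a regular block, the lone conditional row, and a single-element tail), here is a minimal sketch. `slope_np` is a hypothetical numpy-only restatement of the helper, written without `match` so that it also runs on Python versions before 3.10 and without SciPy; it is not part of the original answer.

```python
import numpy as np
import pandas as pd

def slope_np(x: pd.Series, y: pd.Series) -> float:
    """Hypothetical numpy-only variant of the slope helper."""
    if len(x) > 1:
        # regular block: least-squares slope of y against x
        return float(np.polyfit(x, y, 1)[0])
    if len(x) == 1 and x.iloc[0] == 0:
        # the conditional row sits alone in its block at diff_days == 0
        return 0.0
    # any other single-row block has no defined slope
    return float('nan')

# Blocks taken from group B of the sample data
full_block = slope_np(pd.Series([1, 2, 3]), pd.Series([2, 1, 3]))  # id = 1
condition_row = slope_np(pd.Series([0]), pd.Series([2]))           # id = 0
lone_tail = slope_np(pd.Series([-4]), pd.Series([2]))              # id = -2

print(full_block, condition_row, lone_tail)
```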

颛孙飞

2023-03-14

This does the job, but I don't know whether there is a better, more pandas-like way of doing it.

import math

groups=['A','B']
# index label of the condition row of each group (assumes a default 0..N-1 index)
indexs=[]
for i in groups:
    indexs.append(df.loc[(df['group'] == i) & (df['indicator'] == True)].index[0])
id3=[]
for i in groups:
    # row offsets relative to the condition row of this group
    id2=df.loc[(df['group'] == i)].index[:]-indexs[groups.index(i)]
    for j in id2:
        if j < 0:
            id3.append(math.floor(j/n))
        elif j>=0:
            id3.append(math.ceil(j/n))

df['id']=id3

SlopeList=[]
for i in groups:
    idum=[]
    for number in df['id'].loc[(df['group']==i)]:
        #unique values in list.
        if number not in idum:
            idum.append(number)
    for k in idum:
        grady=df['value'].loc[(df['group'] == i) & (df['id'] == k)]
        gradx=df['diff_days'].loc[(df['group'] == i) & (df['id'] == k)]

        # assumption: slope(x, y) is a helper like the one in the other answer,
        # so diff_days goes first (x) and value second (y)
        Xm=slope(gradx, grady) #average slope
        for m in range(0,len(gradx)): #create a suitably sized list with the average slope value.
            SlopeList.append(Xm)

df['slope']=SlopeList

P.S. I have not unit tested this code, so please check it before using it.
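
For comparison, the same id rule (floor for negative offsets, ceil for non-negative ones) can be written without explicit loops as sign(j) * ceil(|j| / n). The following is only a sketch under a few assumptions: every group contains exactly one `indicator == True` row, the rows are already ordered within each group, and the slope is a least-squares fit via `numpy.polyfit`. The `date` column is left out because it is not needed, and the names `pos`, `anchor`, `rel` and `block_slope` are made up for illustration.

```python
import numpy as np
import pandas as pd

n = 3
# Sample data from the question, without the unused `date` column
df = pd.DataFrame({
    'group': ['A'] * 10 + ['B'] * 10,
    'indicator': [i == 3 for i in range(10)] + [i == 4 for i in range(10)],
    'value': [2, 1, 1, 3, 3, 2, 1, 2, 4, 2] + [2, 2, 4, 2, 2, 2, 1, 3, 3, 2],
    'diff_days': [-3, -2, -1, 0, 1, 2, 3, 3, 4, 5] + list(range(-4, 6)),
})

# Row position inside each group and its offset from the condition row
pos = df.groupby('group').cumcount()
anchor = pos.where(df['indicator']).groupby(df['group']).transform('first')
rel = pos - anchor

# floor(j / n) for j < 0 and ceil(j / n) for j >= 0, in one expression
df['id'] = (np.sign(rel) * np.ceil(rel.abs() / n)).astype(int)

def block_slope(block: pd.DataFrame) -> float:
    """Slope of one (group, id) block; 0 for the condition row, NaN for a lone row."""
    if len(block) > 1:
        return float(np.polyfit(block['diff_days'], block['value'], 1)[0])
    return 0.0 if block['indicator'].iloc[0] else float('nan')

slopes = (
    df.groupby(['group', 'id'])[['diff_days', 'value', 'indicator']]
      .apply(block_slope)
      .rename('slope')
)
df = df.join(slopes, on=['group', 'id'])
print(df)
```

If a group has no conditional row, `anchor` is NaN for that group and the `astype(int)` step raises, so that case would need separate handling.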
