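Question:

Find the first occurrence (row) of a value that is x amount greater/less than the current value (iterating over every row of a data frame)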

闻人嘉木
2023-03-14

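I have been doing my best, but I have not gotten there yet. I am trying to iterate over the values in a vector (df$sample) and, for each one, find the first subsequent occurrence of a value that is 20% lower than the current value. I want to find this for every row (sample) and write the date of the value found into a new column.

Here is my df: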

    date       sample
591 2020-02-14 0.008470
590 2020-02-15 0.008460
589 2020-02-16 0.007681
588 2020-02-17 0.007144
587 2020-02-18 0.007262
586 2020-02-19 0.007300
585 2020-02-20 0.006604
584 2020-02-21 0.006843
583 2020-02-22 0.006687
582 2020-02-23 0.006991
581 2020-02-24 0.007333
580 2020-02-25 0.006738
579 2020-02-26 0.006279
...

I have tried using Position() or which(). I thought I could wrap either of them in a for loop, but my attempts are not quite right.

for(i in length(df)){

df$conc20 <- Position(function(x) x < df$sample[i]*0.80, df$sample)
}

for(i in length(df)){

df$conc20 <- min(which(df$sample < df$sample[i]*0.8))

}

I even found a dplyr example that comes close to what I am looking for.

Ideally:

    date       sample   conc20
591 2020-02-14 0.008470 2020-02-25
590 2020-02-15 0.008460 ...
589 2020-02-16 0.007681
588 2020-02-17 0.007144
587 2020-02-18 0.007262
...

I am happy to provide any clarification. I really appreciate your help!

3 answers

冀望
2023-03-14

If I understand correctly, this can be solved with a non-equi self-join that uses two helper columns:

library(data.table)
setDT(df)[, rn := .I][, threshold := 0.8 * sample][
  , conc20 := df[df, on = .(rn > rn, sample < threshold), mult = "first", x.date]][
    , c("rn", "threshold") := NULL][]
          date   sample     conc20
 1: 2020-02-14 0.008470 2020-02-20
 2: 2020-02-15 0.008460 2020-02-20
 3: 2020-02-16 0.007681 2020-02-27
 4: 2020-02-17 0.007144 2020-02-27
 5: 2020-02-18 0.007262 2020-02-27
 6: 2020-02-19 0.007300 2020-02-27
 7: 2020-02-20 0.006604       <NA>
 8: 2020-02-21 0.006843 2020-02-27
 9: 2020-02-22 0.006687 2020-02-27
10: 2020-02-23 0.006991 2020-02-27
11: 2020-02-24 0.007333 2020-02-27
12: 2020-02-25 0.006738 2020-02-27
13: 2020-02-26 0.006279       <NA>
14: 2020-02-27 0.005300       <NA>

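The first condition in the on= clause ensures that only subsequent rows are considered; the second condition looks for sample values below the threshold (sample < threshold). mult = "first" keeps only the first match for each row.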

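The result is appended by reference as an additional column conc20, i.e., without copying the whole dataset. Finally, the two helper columns are removed by reference.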

Note that data.table chaining is used.
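
As a cross-check of what the join computes, here is a minimal base R sketch of the same lookup. It assumes the 14-row data set used in this answer; `first_below()` and `df_check` are hypothetical names introduced only for illustration, and the loop is quadratic, so it is meant for understanding rather than for large data.

```r
# Hypothetical helper (not part of data.table): for each position i, return the
# index of the first later value that is below factor * values[i], or NA.
first_below <- function(values, factor = 0.8) {
  n <- length(values)
  vapply(seq_len(n), function(i) {
    later <- seq_len(n - i) + i                 # indices after i (empty for the last row)
    hit <- later[values[later] < factor * values[i]]
    if (length(hit) > 0L) hit[1L] else NA_integer_
  }, integer(1))
}

# Same 14 values as the sample data in this answer
df_check <- data.frame(
  date = as.Date("2020-02-14") + 0:13,
  sample = c(0.008470, 0.008460, 0.007681, 0.007144, 0.007262, 0.007300,
             0.006604, 0.006843, 0.006687, 0.006991, 0.007333, 0.006738,
             0.006279, 0.005300)
)

# Indexing the date column with NA yields NA, so unmatched rows stay missing
df_check$conc20 <- df_check$date[first_below(df_check$sample)]
```

For the sample data this gives 2020-02-20 for the first two rows and NA for rows 7, 13 and 14, in line with the data.table output shown above.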

For demonstration, the result of the non-equi self-join can be shown with all helper columns included:

setDT(df)[, rn := .I][, threshold := 0.8 * sample][
  df, on = .(rn > rn, sample < threshold), mult = "first"]
          date    sample rn threshold     i.date i.sample
 1: 2020-02-20 0.0067760  1 0.0052832 2020-02-14 0.008470
 2: 2020-02-20 0.0067680  2 0.0052832 2020-02-15 0.008460
 3: 2020-02-27 0.0061448  3 0.0042400 2020-02-16 0.007681
 4: 2020-02-27 0.0057152  4 0.0042400 2020-02-17 0.007144
 5: 2020-02-27 0.0058096  5 0.0042400 2020-02-18 0.007262
 6: 2020-02-27 0.0058400  6 0.0042400 2020-02-19 0.007300
 7:       <NA> 0.0052832  7        NA 2020-02-20 0.006604
 8: 2020-02-27 0.0054744  8 0.0042400 2020-02-21 0.006843
 9: 2020-02-27 0.0053496  9 0.0042400 2020-02-22 0.006687
10: 2020-02-27 0.0055928 10 0.0042400 2020-02-23 0.006991
11: 2020-02-27 0.0058664 11 0.0042400 2020-02-24 0.007333
12: 2020-02-27 0.0053904 12 0.0042400 2020-02-25 0.006738
13:       <NA> 0.0050232 13        NA 2020-02-26 0.006279
14:       <NA> 0.0042400 14        NA 2020-02-27 0.005300

Data:

library(data.table)
df <- fread("
i   date       sample
591 2020-02-14 0.008470
590 2020-02-15 0.008460
589 2020-02-16 0.007681
588 2020-02-17 0.007144
587 2020-02-18 0.007262
586 2020-02-19 0.007300
585 2020-02-20 0.006604
584 2020-02-21 0.006843
583 2020-02-22 0.006687
582 2020-02-23 0.006991
581 2020-02-24 0.007333
580 2020-02-25 0.006738
579 2020-02-26 0.006279
580 2020-02-27 0.005300
", drop = 1L)

史淇
2023-03-14

It is rather messy, but this should do the trick:

library(dplyr)
df<- read.csv( sep = " ",  text=
                 "row date sample
591 2020-02-14 0.008470
590 2020-02-15 0.008460
589 2020-02-16 0.007681
588 2020-02-17 0.007144
587 2020-02-18 0.007262
586 2020-02-19 0.007300
585 2020-02-20 0.006604
584 2020-02-21 0.006843
583 2020-02-22 0.006687
582 2020-02-23 0.006991
581 2020-02-24 0.007333
580 2020-02-25 0.006738
579 2020-02-26 0.006279"
)

x <- 1.05

df <- df %>%
  mutate(id =  1:n()) %>% 
  rowwise %>% 
  mutate(greater_row = 
           first(which(sample*x <
                         df$sample[id:nrow(df)]) + 
                   id-1))
df$greater_row <- df$date[df$greater_row]

This should let you set x to whatever factor you want.
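
The snippet above looks for later values that are larger than sample * x. For the original question (a value 20% lower), the comparison has to point the other way; with x below 1 and positive values, keeping the same comparison would make every row match itself. Below is a minimal base R sketch with the direction made explicit, using Position() as the question attempted. `first_crossing()`, its `above` argument, and `smp` are hypothetical names for illustration, not part of dplyr.

```r
# Hypothetical helper: index of the first later value that is above (or below)
# values[i] * x, or NA when there is none.
first_crossing <- function(values, x, above = TRUE) {
  n <- length(values)
  vapply(seq_len(n), function(i) {
    later <- values[seq_len(n - i) + i]         # values strictly after position i
    target <- values[i] * x
    pos <- Position(function(v) if (above) v > target else v < target, later)
    if (is.na(pos)) NA_integer_ else i + pos    # shift back to an index in `values`
  }, integer(1))
}

# The 13 sample values from the question
smp <- c(0.008470, 0.008460, 0.007681, 0.007144, 0.007262, 0.007300,
         0.006604, 0.006843, 0.006687, 0.006991, 0.007333, 0.006738,
         0.006279)

up_5pct    <- first_crossing(smp, 1.05, above = TRUE)   # first later value 5% higher
down_20pct <- first_crossing(smp, 0.80, above = FALSE)  # first later value 20% lower
```

The returned indices can then be mapped to dates in the same way as above, for example df$date[down_20pct].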

邵飞鸿
2023-03-14

Edited answer

df<- read.csv( sep = " ",  text=
                 "row date sample
591 2020-02-14 0.008470
590 2020-02-15 0.008460
589 2020-02-16 0.007681
588 2020-02-17 0.007144
587 2020-02-18 0.007262
586 2020-02-19 0.007300
585 2020-02-20 0.006604
584 2020-02-21 0.006843
583 2020-02-22 0.006687
582 2020-02-23 0.006991
581 2020-02-24 0.007333
580 2020-02-25 0.006738
579 2020-02-26 0.006279"
)
df$date=as.Date(as.character(df$date))
df   

#there is no row 20% below, so I am just using 2% below 
# and multiplying 0.98 instead of 0.8

# Finding cross-over before current row    
f_crossover_before<- function(  i  ){
  cutoff= 0.98* df$sample[i]
  res<- max(which( df$sample[1:i]<= cutoff), -1)
  ifelse ( (res>0) , res , NA )  # sapply cannot return dates !
}

# Finding cross-over after  current row   
f_crossover_after<- function(  i  ){
  cutoff<- 0.98* df$sample[i]
  res<- min( i+which( df$sample[(i+1):nrow(df)]<= cutoff), 
        .Machine$integer.max )
  ifelse ( (res<.Machine$integer.max) , res , NA )
}



# A column for  comparison. Only for visual inspection 
df$cutoff<- df$sample*0.98 


df$crossover_before<- sapply( seq_along(df$sample) ,  FUN = f_crossover_before )
df$crossover_before<- df$date[df$crossover_before]

df$crossover_after<- sapply( seq_along(df$sample) ,  FUN = f_crossover_after)
df$crossover_after<- df$date[df$crossover_after]




#View(df)

Output:

#   row       date   sample     cutoff crossover_before crossover_after
#1  591 2020-02-14 0.008470 0.00830060             <NA>      2020-02-16
#2  590 2020-02-15 0.008460 0.00829080             <NA>      2020-02-16
#3  589 2020-02-16 0.007681 0.00752738             <NA>      2020-02-17
#4  588 2020-02-17 0.007144 0.00700112             <NA>      2020-02-20
#5  587 2020-02-18 0.007262 0.00711676             <NA>      2020-02-20
#6  586 2020-02-19 0.007300 0.00715400       2020-02-17      2020-02-20
#7  585 2020-02-20 0.006604 0.00647192             <NA>      2020-02-26
#8  584 2020-02-21 0.006843 0.00670614       2020-02-20      2020-02-22
#9  583 2020-02-22 0.006687 0.00655326             <NA>      2020-02-26
#10 582 2020-02-23 0.006991 0.00685118       2020-02-22      2020-02-25
#11 581 2020-02-24 0.007333 0.00718634       2020-02-23      2020-02-25
#12 580 2020-02-25 0.006738 0.00660324             <NA>      2020-02-26
#13 579 2020-02-26 0.006279 0.00615342             <NA>            <NA>
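
One detail worth checking in f_crossover_after: for the last row, (i+1):nrow(df) does not produce an empty range, because the colon operator counts downwards when the start exceeds the end. With positive values and a factor below 1 the result is still NA here, but that relies on the data. A small sketch of the behaviour and of an index expression that is empty for the last row (the names `n`, `down` and `safe` are illustrative only):

```r
n <- 13L                      # number of rows in the sample data
i <- n                        # the last row

down <- (i + 1):n             # counts down: 14, 13 (14 is out of range, 13 is row i itself)
safe <- seq_len(n - i) + i    # empty for the last row; (i+1), ..., n otherwise

length(down)                  # 2
length(safe)                  # 0
```

Using the second form inside which() would restrict the search to rows strictly after the current one in every case.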