Question:

dplyr mutate: add new column values only after a different column == a specific value

梁丘钊
2023-03-14

I have a dataframe with the following structure (a summarized example, not the actual data):

dput(df1)
structure(list(MedID = c(111, 111, 111, 111, 111, 111, 222, 222,
222, 222, 222), Service = structure(c(1L, 1L, 2L, 1L, 1L, 3L,
3L, 2L, 1L, 1L, 3L), .Label = c("Acute care", "Ext care",
"Outpt care"), class = "factor"), AdmitDate = structure(c(16832,
16861, 16892, 16922, 16953, 16983, 17181, 17212, 17240, 17271,
17301), class = "Date"), Flag = c(0, 0, 99, 0, 0, 0, 0, 99, 0, 0,
0)), .Names = c("MedID", "Service", "AdmitDate", "Flag"),
row.names = c(NA, -11L), class = "data.frame")
> df1
    MedID  Service   AdmitDate  Flag
1    111 Acute care 2016-02-01    0
2    111 Acute care 2016-03-01    0
3    111   Ext care 2016-04-01   99
4    111 Acute care 2016-05-01    0
5    111 Acute care 2016-06-01    0
6    111 Outpt care 2016-07-01    0
7    222 Outpt care 2017-01-15    0
8    222   Ext care 2017-02-15   99
9    222 Acute care 2017-03-15    0
10   222 Acute care 2017-04-15    0
11   222 Outpt care 2017-05-15    0

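I want to use dplyr, group_by(MedID) and mutate to add a column to a new data frame (call it Flag2 in df2) so that, within each patient (MedID), df2$Flag2 == 1 from the row where df1$Flag == 99 onward, i.e. on that row and on every later row of the same MedID; all other rows get df2$Flag2 == 0. I can code this as needed when df1$Flag == 99 is in the first row of a MedID, but otherwise my code either puts 1 in df2$Flag2 only on the rows where df1$Flag == 99, or puts 1 in every row of any MedID that has df1$Flag == 99. The desired output is: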

dput(df2)
structure(list(MedID = c(111, 111, 111, 111, 111, 111, 222, 222,
222, 222, 222), Service = structure(c(1L, 1L, 2L, 1L, 1L, 3L,
3L, 2L, 1L, 1L, 3L), .Label = c("Acute care", "Ext care",
"Outpt care"), class = "factor"), AdmitDate = structure(c(16832,
16861, 16892, 16922, 16953, 16983, 17181, 17212, 17240, 17271,
17301), class = "Date"), Flag = c(0, 0, 99, 0, 0, 0, 0, 99, 0, 0,
0), Flag2 = c(0, 0, 1, 1, 1, 1, 0, 1, 1, 1, 1)), .Names = c("MedID",
"Service", "AdmitDate", "Flag", "Flag2"), row.names = c(NA, -11L),
class = "data.frame")
> df2
    MedID    Service  AdmitDate Flag Flag2
1    111 Acute care 2016-02-01    0     0
2    111 Acute care 2016-03-01    0     0
3    111   Ext care 2016-04-01   99     1
4    111 Acute care 2016-05-01    0     1
5    111 Acute care 2016-06-01    0     1
6    111 Outpt care 2016-07-01    0     1
7    222 Outpt care 2017-01-15    0     0
8    222   Ext care 2017-02-15   99     1
9    222 Acute care 2017-03-15    0     1
10   222 Acute care 2017-04-15    0     1
11   222 Outpt care 2017-05-15    0     1

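Here is a snippet of my code. It is incomplete because it doesn't work correctly. Do I need to nest mutate inside a for loop? That seems like mixing two styles of R coding. (Note: df1$Flag can be 99 at most once per MedID, which I think should make this easier.)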

  df2 <- df1 %>%
    group_by(MedID) %>%
    mutate(Flag2 = ifelse(df1$Flag == 99, 1, 0))
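A minimal sketch of what goes wrong in the attempt above, and one possible fix. Inside a grouped `mutate()`, `df1$Flag` reaches outside the pipeline and returns the whole 11-row column instead of the current MedID's rows (recent dplyr versions typically reject this with a size error), and `ifelse(Flag == 99, 1, 0)` only marks the flagged row itself. The fix shown here is a running count with `cumsum()`, an alternative technique that is not taken from the answer below: `cumsum(Flag == 99)` becomes positive on the flagged row and stays positive afterwards. The sketch rebuilds only the `MedID` and `Flag` columns and assumes rows are already in date order within each MedID.

```r
library(dplyr)

# Only the columns the logic needs, rebuilt from the question's df1
df1 <- data.frame(
  MedID = rep(c(111, 222), times = c(6, 5)),
  Flag  = c(0, 0, 99, 0, 0, 0, 0, 99, 0, 0, 0)
)

df2 <- df1 %>%
  group_by(MedID) %>%
  # Use the bare column name `Flag` so it is evaluated per MedID group.
  # cumsum(Flag == 99) counts the 99s seen so far within the group, so it is
  # 0 before the flagged row and >= 1 on that row and on every row after it.
  mutate(Flag2 = as.integer(cumsum(Flag == 99) > 0)) %>%
  ungroup()

print(df2$Flag2)
# [1] 0 0 1 1 1 1 0 1 1 1 1
```

Because the count never decreases, this would also keep working if a MedID ever had more than one 99 row.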

1 answer

华佐
2023-03-14

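One solution is to use fill from tidyr. First add Flag2, setting it to 1 on rows where Flag == 99 and to NA on all other rows.

Then fill the Flag2 column downward. Finally, replace all remaining NAs with 0.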

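  library(tidyverse)  # loads dplyr (group_by, mutate) and tidyr (fill)

  df1 %>%
    group_by(MedID) %>%
    # 1 on the row where Flag == 99, NA on every other row
    mutate(Flag2 = ifelse(Flag == 99, 1L, NA)) %>%
    # carry the 1 down to later rows; fill() works within each MedID group
    fill(Flag2) %>%
    # rows before the 99 are still NA, so turn them into 0
    mutate(Flag2 = ifelse(is.na(Flag2), 0L, Flag2))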
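The `group_by(MedID)` step matters because `fill()` on a grouped data frame carries values down only within each group. A minimal sketch, using only the `MedID` and `Flag` columns from the question (the names `marked`, `grouped` and `ungrouped` are illustrative), runs the same three steps with and without grouping:

```r
library(dplyr)
library(tidyr)

df1 <- data.frame(
  MedID = rep(c(111, 222), times = c(6, 5)),
  Flag  = c(0, 0, 99, 0, 0, 0, 0, 99, 0, 0, 0)
)

# Step 1: 1 on the flagged row, NA everywhere else
marked <- df1 %>% mutate(Flag2 = ifelse(Flag == 99, 1L, NA))

# Grouped fill: values are carried down only within the same MedID
grouped <- marked %>%
  group_by(MedID) %>%
  fill(Flag2) %>%
  ungroup() %>%
  mutate(Flag2 = ifelse(is.na(Flag2), 0L, Flag2))

# Ungrouped fill: row 7 (the first row of MedID 222) wrongly inherits
# the 1 carried down from the last row of MedID 111
ungrouped <- marked %>%
  fill(Flag2) %>%
  mutate(Flag2 = ifelse(is.na(Flag2), 0L, Flag2))

print(grouped$Flag2)
# [1] 0 0 1 1 1 1 0 1 1 1 1
print(ungrouped$Flag2)
# [1] 0 0 1 1 1 1 1 1 1 1 1
```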

Although the OP didn't mention it, if AdmitDate is what determines which rows come after the Flag == 99 row, add an arrange step to the code above.

  df1 %>%
    group_by(MedID) %>%
    mutate(Flag2 = ifelse(Flag == 99, 1L, NA)) %>%
    # sort by date within each MedID so that fill() moves forward in time
    arrange(AdmitDate, .by_group = TRUE) %>%
    fill(Flag2) %>%
    mutate(Flag2 = ifelse(is.na(Flag2), 0L, Flag2))
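To see why the sort order matters, here is a minimal sketch using a hypothetical copy of df1 stored in reverse date order (`shuffled`), with the answer's pipeline wrapped in an illustrative helper, `flag_after_99()`. `fill()` follows the stored row order, so without sorting it flags the rows dated before each 99. This sketch sorts with `arrange(MedID, AdmitDate)` before the pipeline rather than inside it, which is equivalent for this purpose.

```r
library(dplyr)
library(tidyr)

df1 <- data.frame(
  MedID = rep(c(111, 222), times = c(6, 5)),
  AdmitDate = as.Date(c("2016-02-01", "2016-03-01", "2016-04-01", "2016-05-01",
                        "2016-06-01", "2016-07-01", "2017-01-15", "2017-02-15",
                        "2017-03-15", "2017-04-15", "2017-05-15")),
  Flag = c(0, 0, 99, 0, 0, 0, 0, 99, 0, 0, 0)
)

# Hypothetical scenario: the same rows, stored in reverse date order
shuffled <- df1[nrow(df1):1, ]

# Illustrative helper: the answer's fill-based pipeline
flag_after_99 <- function(d) {
  d %>%
    group_by(MedID) %>%
    mutate(Flag2 = ifelse(Flag == 99, 1L, NA)) %>%
    fill(Flag2) %>%
    mutate(Flag2 = ifelse(is.na(Flag2), 0L, Flag2)) %>%
    ungroup()
}

# No sort first: fill() runs in the stored (reversed) order.
# The result is re-sorted afterwards only so the two outputs line up.
no_sort <- flag_after_99(shuffled) %>% arrange(MedID, AdmitDate)

# Sorting by date within MedID first restores the intended result
sorted <- shuffled %>% arrange(MedID, AdmitDate) %>% flag_after_99()

print(no_sort$Flag2)
# [1] 1 1 1 0 0 0 1 1 0 0 0
print(sorted$Flag2)
# [1] 0 0 1 1 1 1 0 1 1 1 1
```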