Question:

Combining rows that meet a condition within each unique group_by combination, summarising with tidyverse

邓业
2023-03-14

I would like to accomplish the following task using tidyverse tools:

In the data frame below, for each unique combination of ("yr", "bin", "range"), I want to combine the rows whose "class" is "B182E" or "B182G" into a single row, as follows: keep the values of "yr", "bin", "range", and "miles" unchanged; create a new variable "class2" whose value becomes "B182E_B182G" (and which otherwise, when "class" is neither "B182E" nor "B182G", simply keeps the value of "class"); and update "new_count" and "old_count" by summing them over the combined rows. For example, "new_count" becomes row 5's "new_count" plus row 6's, i.e. 20 + 12 = 32, and "old_count" becomes row 5's "old_count" plus row 6's, i.e. 30 + 14 = 44.

Similarly, combine the rows whose "class" is "K760" or "K758" into a single row, setting the newly created "class2" to "K760_K758" and updating the counts in the same way as above.

All rows whose "class" does not match any of the values to be combined above remain unchanged.

How can I do this?

df <- structure(list(yr = c(2005, 2005, 2005, 2005, 2005, 2005, 2005, 
2005, 2005, 2005, 2005, 2005, 2005, 2005, 2005, 2005, 2005, 2005, 
2005, 2005, 2005, 2005, 2005, 2005, 2005, 2005, 2005, 2005, 2005, 
2005, 2005, 2005, 2005, 2005, 2005, 2005, 2005, 2005, 2005, 2005, 
2005, 2005, 2005, 2005, 2005, 2005, 2005, 2005, 2005, 2005, 2005, 
2005, 2005, 2005, 2005, 2005, 2005, 2005, 2005, 2005, 2005, 2005, 
2005, 2005, 2005, 2005, 2005, 2005, 2006, 2006, 2006, 2006, 2006, 
2006, 2006, 2006, 2006, 2006, 2006, 2006, 2006, 2006, 2006, 2006, 
2006, 2006, 2006, 2006, 2006, 2006, 2006, 2006, 2006, 2006, 2006, 
2006, 2006, 2006, 2006, 2006, 2006, 2006, 2006, 2006, 2006, 2006, 
2006, 2006, 2006, 2006, 2006, 2006, 2006, 2006, 2006, 2006, 2006, 
2006, 2006, 2006, 2006, 2006, 2006, 2006, 2006, 2006, 2006, 2006, 
2006, 2006, 2006, 2006, 2006, 2006, 2006, 2006), bin = c("2", 
"2", "2", "2", "2", "2", "2", "2", "2", "2", "2", "2", "2", "2", 
"2", "2", "2", "2", "2", "2", "2", "2", "2", "2", "2", "2", "2", 
"2", "2", "2", "2", "2", "2", "2", "1", "1", "1", "1", "1", "1", 
"1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", 
"1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", 
"1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", 
"1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", 
"1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "2", "2", "2", 
"2", "2", "2", "2", "2", "2", "2", "2", "2", "2", "2", "2", "2", 
"2", "2", "2", "2", "2", "2", "2", "2", "2", "2", "2", "2", "2", 
"2", "2", "2", "2", "2"), range = c("40-49", "40-49", "40-49", 
"40-49", "40-49", "40-49", "40-49", "40-49", "40-49", "40-49", 
"40-49", "40-49", "40-49", "40-49", "40-49", "40-49", "40-49", 
"50-59", "50-59", "50-59", "50-59", "50-59", "50-59", "50-59", 
"50-59", "50-59", "50-59", "50-59", "50-59", "50-59", "50-59", 
"50-59", "50-59", "50-59", "40-49", "40-49", "40-49", "40-49", 
"40-49", "40-49", "40-49", "40-49", "40-49", "40-49", "40-49", 
"40-49", "40-49", "40-49", "40-49", "40-49", "40-49", "50-59", 
"50-59", "50-59", "50-59", "50-59", "50-59", "50-59", "50-59", 
"50-59", "50-59", "50-59", "50-59", "50-59", "50-59", "50-59", 
"50-59", "50-59", "40-49", "40-49", "40-49", "40-49", "40-49", 
"40-49", "40-49", "40-49", "40-49", "40-49", "40-49", "40-49", 
"40-49", "40-49", "40-49", "40-49", "40-49", "50-59", "50-59", 
"50-59", "50-59", "50-59", "50-59", "50-59", "50-59", "50-59", 
"50-59", "50-59", "50-59", "50-59", "50-59", "50-59", "50-59", 
"50-59", "40-49", "40-49", "40-49", "40-49", "40-49", "40-49", 
"40-49", "40-49", "40-49", "40-49", "40-49", "40-49", "40-49", 
"40-49", "40-49", "40-49", "40-49", "50-59", "50-59", "50-59", 
"50-59", "50-59", "50-59", "50-59", "50-59", "50-59", "50-59", 
"50-59", "50-59", "50-59", "50-59", "50-59", "50-59", "50-59"
), miles = c(528496, 528496, 528496, 528496, 528496, 528496, 
528496, 528496, 528496, 528496, 528496, 528496, 528496, 528496, 
528496, 528496, 528496, 604761, 604761, 604761, 604761, 604761, 
604761, 604761, 604761, 604761, 604761, 604761, 604761, 604761, 
604761, 604761, 604761, 604761, 547232, 547232, 547232, 547232, 
547232, 547232, 547232, 547232, 547232, 547232, 547232, 547232, 
547232, 547232, 547232, 547232, 547232, 613025, 613025, 613025, 
613025, 613025, 613025, 613025, 613025, 613025, 613025, 613025, 
613025, 613025, 613025, 613025, 613025, 613025, 557216.5, 557216.5, 
557216.5, 557216.5, 557216.5, 557216.5, 557216.5, 557216.5, 557216.5, 
557216.5, 557216.5, 557216.5, 557216.5, 557216.5, 557216.5, 557216.5, 
557216.5, 604720.5, 604720.5, 604720.5, 604720.5, 604720.5, 604720.5, 
604720.5, 604720.5, 604720.5, 604720.5, 604720.5, 604720.5, 604720.5, 
604720.5, 604720.5, 604720.5, 604720.5, 537808.5, 537808.5, 537808.5, 
537808.5, 537808.5, 537808.5, 537808.5, 537808.5, 537808.5, 537808.5, 
537808.5, 537808.5, 537808.5, 537808.5, 537808.5, 537808.5, 537808.5, 
597005.5, 597005.5, 597005.5, 597005.5, 597005.5, 597005.5, 597005.5, 
597005.5, 597005.5, 597005.5, 597005.5, 597005.5, 597005.5, 597005.5, 
597005.5, 597005.5, 597005.5), class = c("B181", "B181E", "B181G", 
"B182", "B182E", "B182G", "C220", "C221", "C240", "K70", "K703", 
"K743", "K746", "K754", "K758", "K760", "K769", "B181", "B181E", 
"B181G", "B182", "B182E", "B182G", "C220", "C221", "C240", "K70", 
"K703", "K743", "K746", "K754", "K758", "K760", "K769", "B181", 
"B181E", "B181G", "B182", "B182E", "B182G", "C220", "C221", "C240", 
"K70", "K703", "K743", "K746", "K754", "K758", "K760", "K769", 
"B181", "B181E", "B181G", "B182", "B182E", "B182G", "C220", "C221", 
"C240", "K70", "K703", "K743", "K746", "K754", "K758", "K760", 
"K769", "B181", "B181E", "B181G", "B182", "B182E", "B182G", "C220", 
"C221", "C240", "K70", "K703", "K743", "K746", "K754", "K758", 
"K760", "K769", "B181", "B181E", "B181G", "B182", "B182E", "B182G", 
"C220", "C221", "C240", "K70", "K703", "K743", "K746", "K754", 
"K758", "K760", "K769", "B181", "B181E", "B181G", "B182", "B182E", 
"B182G", "C220", "C221", "C240", "K70", "K703", "K743", "K746", 
"K754", "K758", "K760", "K769", "B181", "B181E", "B181G", "B182", 
"B182E", "B182G", "C220", "C221", "C240", "K70", "K703", "K743", 
"K746", "K754", "K758", "K760", "K769"), new_count = c(84, 24, 
16, 226, 20, 12, NA, NA, 5, 60, 24, 24, 27, 16, NA, 22, 78, 58, 
1, 4, 168, 5, 22, 20, 18, 15, 122, 72, 37, 75, 26, 0, 53, 149, 
117, 12, 45, 567, NA, 29, 12, 7, NA, 149, 77, NA, 54, NA, NA, 
44, 99, 124, 0, 8, 453, NA, 55, 54, 9, 10, 307, 167, 5, 120, 
9, NA, 59, 115, 137, NA, 7, 481, NA, 29, 13, 6, NA, 134, 65, 
NA, 62, 7, NA, 57, 93, 135, 23, 7, 444, 5, 79, 61, 18, 13, 323, 
190, 5, 135, 12, 0, 57, 117, 71, 0, NA, 204, NA, 7, 6, NA, NA, 
52, 22, 13, 24, 17, NA, 23, 86, 47, NA, NA, 172, NA, 24, 16, 
9, 15, 128, 87, 33, 77, 31, NA, 65, 158), old_count = c(211, 
52, 37, 1095, 30, 14, 8, 6, 8, 111, 52, 82, 53, 37, 8, 27, 102, 
120, 3, 4, 865, 5, 32, 30, 23, 16, 282, 168, 230, 149, 50, NA, 
71, 214, 349, 54, 67, 2102, NA, 39, 15, 9, 8, 263, 147, 6, 102, 
11, NA, 65, 123, 293, NA, 9, 1785, 1, 73, 70, 15, 15, 621, 369, 
27, 222, 20, NA, 81, 155, 407, NA, 9, 2065, 5, 46, 19, 8, NA, 
263, 143, 7, 113, 17, NA, 76, 126, 335, NA, 11, 2067, 5, 110, 
89, 21, 19, 632, 383, 24, 225, 21, NA, 79, 151, 234, 0, 0, 1061, 
NA, 9, 11, 5, 5, 106, 52, 85, 52, 41, 6, 31, 118, 133, NA, NA, 
957, NA, 40, 29, 14, 20, 282, 180, 224, 137, 59, 6, 81, 212)), row.names = c(NA, 
-136L), class = "data.frame") 

1 Answer

龚安民
2023-03-14

You essentially want to make your rows unique by yr, bin, range, and class2. Create class2 first; after that, grouping and summarising is straightforward:

library(dplyr)

df %>%
  mutate(class2 = case_when(
    class %in% c("B182E", "B182G") ~ "B182E_B182G",
    class %in% c("K760", "K758") ~ "K760_K758",
    TRUE ~ class
  )) %>%
  group_by(yr, bin, range, class2) %>%
  summarise(
    new_count = sum(new_count, na.rm = TRUE),
    old_count = sum(old_count, na.rm = TRUE)
  )
#> # A tibble: 120 x 6
#> # Groups:   yr, bin, range [8]
#>       yr bin   range class2      new_count old_count
#>    <dbl> <chr> <chr> <chr>           <dbl>     <dbl>
#>  1  2005 1     40-49 B181              117       349
#>  2  2005 1     40-49 B181E              12        54
#>  3  2005 1     40-49 B181G              45        67
#>  4  2005 1     40-49 B182              567      2102
#>  5  2005 1     40-49 B182E_B182G        29        39
#>  6  2005 1     40-49 C220               12        15
#>  7  2005 1     40-49 C221                7         9
#>  8  2005 1     40-49 C240                0         8
#>  9  2005 1     40-49 K70               149       263
#> 10  2005 1     40-49 K703               77       147
#> # … with 110 more rows
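The combine-and-sum step itself is language-agnostic. As a minimal cross-check of the logic (not part of the original answer), here is a pure-Python sketch using only the standard library; the two toy rows below are just rows 5–6 of the example data frame, not the full df:

```python
from collections import defaultdict

# Toy rows: (yr, bin, range, miles, class, new_count, old_count),
# taken from rows 5-6 of the example data frame.
rows = [
    (2005, "2", "40-49", 528496, "B182E", 20, 30),
    (2005, "2", "40-49", 528496, "B182G", 12, 14),
]

def class2(cls):
    # Collapse the two class pairs into one label each;
    # any other class keeps its own value.
    if cls in ("B182E", "B182G"):
        return "B182E_B182G"
    if cls in ("K760", "K758"):
        return "K760_K758"
    return cls

# Group by (yr, bin, range, class2) and sum the two count columns,
# mirroring group_by() + summarise() in the dplyr answer.
sums = defaultdict(lambda: [0, 0])
for yr, bn, rng, miles, cls, new_count, old_count in rows:
    key = (yr, bn, rng, class2(cls))
    sums[key][0] += new_count
    sums[key][1] += old_count

merged = [key + tuple(counts) for key, counts in sums.items()]
print(merged)
# -> [(2005, '2', '40-49', 'B182E_B182G', 32, 44)]
```

The two rows collapse into one with new_count 20 + 12 = 32 and old_count 30 + 14 = 44, matching the question's worked example. (This sketch does not handle the NA counts that appear in the real data; the dplyr answer handles those via `na.rm = TRUE`.)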