我希望使用tidyverse工具来完成以下任务:
在下面的数据帧中,对于(“yr”,“bin”,“range”)的每个唯一行,我想通过执行以下操作将“class”具有值“B182E”或“B182G”的行合并为一行:保持“yr”,“bin”,“范围”和“英里”的值不变,创建一个新的变量“class2”,其值变为“B182E_B182G”,否则如果不是“B182E”或“B182G”,它将保留“类”的值, 而对于变量“new_count”和“old_count”,通过对它们各自的行求和来更新它们的值,例如,“new_count”成为第5行中的“new_count”加上第6行的“new_count”之和,即20 12 = 32,而“old_count”成为第5行中的“old_count”之和加上第6行中的“old_count”,即30 14 = 44。
类似地,将“class”的值为“K760”或“K758”的行合并成一行,并以与上面类似的方式将新创建的“class2”的值更新为“K760_K758”。
与上面要合并的“class”值不匹配的所有行保持不变。
例如,以第 5 行为例
我怎么能这样做呢?
df <- structure(list(yr = c(2005, 2005, 2005, 2005, 2005, 2005, 2005,
2005, 2005, 2005, 2005, 2005, 2005, 2005, 2005, 2005, 2005, 2005,
2005, 2005, 2005, 2005, 2005, 2005, 2005, 2005, 2005, 2005, 2005,
2005, 2005, 2005, 2005, 2005, 2005, 2005, 2005, 2005, 2005, 2005,
2005, 2005, 2005, 2005, 2005, 2005, 2005, 2005, 2005, 2005, 2005,
2005, 2005, 2005, 2005, 2005, 2005, 2005, 2005, 2005, 2005, 2005,
2005, 2005, 2005, 2005, 2005, 2005, 2006, 2006, 2006, 2006, 2006,
2006, 2006, 2006, 2006, 2006, 2006, 2006, 2006, 2006, 2006, 2006,
2006, 2006, 2006, 2006, 2006, 2006, 2006, 2006, 2006, 2006, 2006,
2006, 2006, 2006, 2006, 2006, 2006, 2006, 2006, 2006, 2006, 2006,
2006, 2006, 2006, 2006, 2006, 2006, 2006, 2006, 2006, 2006, 2006,
2006, 2006, 2006, 2006, 2006, 2006, 2006, 2006, 2006, 2006, 2006,
2006, 2006, 2006, 2006, 2006, 2006, 2006, 2006), bin = c("2",
"2", "2", "2", "2", "2", "2", "2", "2", "2", "2", "2", "2", "2",
"2", "2", "2", "2", "2", "2", "2", "2", "2", "2", "2", "2", "2",
"2", "2", "2", "2", "2", "2", "2", "1", "1", "1", "1", "1", "1",
"1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1",
"1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1",
"1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1",
"1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1",
"1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "2", "2", "2",
"2", "2", "2", "2", "2", "2", "2", "2", "2", "2", "2", "2", "2",
"2", "2", "2", "2", "2", "2", "2", "2", "2", "2", "2", "2", "2",
"2", "2", "2", "2", "2"), range = c("40-49", "40-49", "40-49",
"40-49", "40-49", "40-49", "40-49", "40-49", "40-49", "40-49",
"40-49", "40-49", "40-49", "40-49", "40-49", "40-49", "40-49",
"50-59", "50-59", "50-59", "50-59", "50-59", "50-59", "50-59",
"50-59", "50-59", "50-59", "50-59", "50-59", "50-59", "50-59",
"50-59", "50-59", "50-59", "40-49", "40-49", "40-49", "40-49",
"40-49", "40-49", "40-49", "40-49", "40-49", "40-49", "40-49",
"40-49", "40-49", "40-49", "40-49", "40-49", "40-49", "50-59",
"50-59", "50-59", "50-59", "50-59", "50-59", "50-59", "50-59",
"50-59", "50-59", "50-59", "50-59", "50-59", "50-59", "50-59",
"50-59", "50-59", "40-49", "40-49", "40-49", "40-49", "40-49",
"40-49", "40-49", "40-49", "40-49", "40-49", "40-49", "40-49",
"40-49", "40-49", "40-49", "40-49", "40-49", "50-59", "50-59",
"50-59", "50-59", "50-59", "50-59", "50-59", "50-59", "50-59",
"50-59", "50-59", "50-59", "50-59", "50-59", "50-59", "50-59",
"50-59", "40-49", "40-49", "40-49", "40-49", "40-49", "40-49",
"40-49", "40-49", "40-49", "40-49", "40-49", "40-49", "40-49",
"40-49", "40-49", "40-49", "40-49", "50-59", "50-59", "50-59",
"50-59", "50-59", "50-59", "50-59", "50-59", "50-59", "50-59",
"50-59", "50-59", "50-59", "50-59", "50-59", "50-59", "50-59"
), miles = c(528496, 528496, 528496, 528496, 528496, 528496,
528496, 528496, 528496, 528496, 528496, 528496, 528496, 528496,
528496, 528496, 528496, 604761, 604761, 604761, 604761, 604761,
604761, 604761, 604761, 604761, 604761, 604761, 604761, 604761,
604761, 604761, 604761, 604761, 547232, 547232, 547232, 547232,
547232, 547232, 547232, 547232, 547232, 547232, 547232, 547232,
547232, 547232, 547232, 547232, 547232, 613025, 613025, 613025,
613025, 613025, 613025, 613025, 613025, 613025, 613025, 613025,
613025, 613025, 613025, 613025, 613025, 613025, 557216.5, 557216.5,
557216.5, 557216.5, 557216.5, 557216.5, 557216.5, 557216.5, 557216.5,
557216.5, 557216.5, 557216.5, 557216.5, 557216.5, 557216.5, 557216.5,
557216.5, 604720.5, 604720.5, 604720.5, 604720.5, 604720.5, 604720.5,
604720.5, 604720.5, 604720.5, 604720.5, 604720.5, 604720.5, 604720.5,
604720.5, 604720.5, 604720.5, 604720.5, 537808.5, 537808.5, 537808.5,
537808.5, 537808.5, 537808.5, 537808.5, 537808.5, 537808.5, 537808.5,
537808.5, 537808.5, 537808.5, 537808.5, 537808.5, 537808.5, 537808.5,
597005.5, 597005.5, 597005.5, 597005.5, 597005.5, 597005.5, 597005.5,
597005.5, 597005.5, 597005.5, 597005.5, 597005.5, 597005.5, 597005.5,
597005.5, 597005.5, 597005.5), class = c("B181", "B181E", "B181G",
"B182", "B182E", "B182G", "C220", "C221", "C240", "K70", "K703",
"K743", "K746", "K754", "K758", "K760", "K769", "B181", "B181E",
"B181G", "B182", "B182E", "B182G", "C220", "C221", "C240", "K70",
"K703", "K743", "K746", "K754", "K758", "K760", "K769", "B181",
"B181E", "B181G", "B182", "B182E", "B182G", "C220", "C221", "C240",
"K70", "K703", "K743", "K746", "K754", "K758", "K760", "K769",
"B181", "B181E", "B181G", "B182", "B182E", "B182G", "C220", "C221",
"C240", "K70", "K703", "K743", "K746", "K754", "K758", "K760",
"K769", "B181", "B181E", "B181G", "B182", "B182E", "B182G", "C220",
"C221", "C240", "K70", "K703", "K743", "K746", "K754", "K758",
"K760", "K769", "B181", "B181E", "B181G", "B182", "B182E", "B182G",
"C220", "C221", "C240", "K70", "K703", "K743", "K746", "K754",
"K758", "K760", "K769", "B181", "B181E", "B181G", "B182", "B182E",
"B182G", "C220", "C221", "C240", "K70", "K703", "K743", "K746",
"K754", "K758", "K760", "K769", "B181", "B181E", "B181G", "B182",
"B182E", "B182G", "C220", "C221", "C240", "K70", "K703", "K743",
"K746", "K754", "K758", "K760", "K769"), new_count = c(84, 24,
16, 226, 20, 12, NA, NA, 5, 60, 24, 24, 27, 16, NA, 22, 78, 58,
1, 4, 168, 5, 22, 20, 18, 15, 122, 72, 37, 75, 26, 0, 53, 149,
117, 12, 45, 567, NA, 29, 12, 7, NA, 149, 77, NA, 54, NA, NA,
44, 99, 124, 0, 8, 453, NA, 55, 54, 9, 10, 307, 167, 5, 120,
9, NA, 59, 115, 137, NA, 7, 481, NA, 29, 13, 6, NA, 134, 65,
NA, 62, 7, NA, 57, 93, 135, 23, 7, 444, 5, 79, 61, 18, 13, 323,
190, 5, 135, 12, 0, 57, 117, 71, 0, NA, 204, NA, 7, 6, NA, NA,
52, 22, 13, 24, 17, NA, 23, 86, 47, NA, NA, 172, NA, 24, 16,
9, 15, 128, 87, 33, 77, 31, NA, 65, 158), old_count = c(211,
52, 37, 1095, 30, 14, 8, 6, 8, 111, 52, 82, 53, 37, 8, 27, 102,
120, 3, 4, 865, 5, 32, 30, 23, 16, 282, 168, 230, 149, 50, NA,
71, 214, 349, 54, 67, 2102, NA, 39, 15, 9, 8, 263, 147, 6, 102,
11, NA, 65, 123, 293, NA, 9, 1785, 1, 73, 70, 15, 15, 621, 369,
27, 222, 20, NA, 81, 155, 407, NA, 9, 2065, 5, 46, 19, 8, NA,
263, 143, 7, 113, 17, NA, 76, 126, 335, NA, 11, 2067, 5, 110,
89, 21, 19, 632, 383, 24, 225, 21, NA, 79, 151, 234, 0, 0, 1061,
NA, 9, 11, 5, 5, 106, 52, 85, 52, 41, 6, 31, 118, 133, NA, NA,
957, NA, 40, 29, 14, 20, 282, 180, 224, 137, 59, 6, 81, 212)), row.names = c(NA,
-136L), class = "data.frame")
您本质上希望通过yr
、bin
、range
、class 2
使您的行独一无二。首先创建class 2
,然后您可以轻松地分组和总结:
df %>%
mutate(class2 = case_when(
class %in% c("B181E", "B181G") ~ "B181E_B181G",
class %in% c("K760", "K758") ~ "K760_K758",
TRUE ~ class)
) %>%
group_by(yr, bin, range, class2) %>%
summarise(
new_count = sum(new_count, na.rm = TRUE),
old_count = sum(old_count, na.rm = TRUE)
)
#> # A tibble: 120 x 6
#> # Groups: yr, bin, range [8]
#> yr bin range class2 new_count old_count
#> <dbl> <chr> <chr> <chr> <dbl> <dbl>
#> 1 2005 1 40-49 B181 117 349
#> 2 2005 1 40-49 B181E_B181G 57 121
#> 3 2005 1 40-49 B182 567 2102
#> 4 2005 1 40-49 B182E 0 0
#> 5 2005 1 40-49 B182G 29 39
#> 6 2005 1 40-49 C220 12 15
#> 7 2005 1 40-49 C221 7 9
#> 8 2005 1 40-49 C240 0 8
#> 9 2005 1 40-49 K70 149 263
#> 10 2005 1 40-49 K703 77 147
#> # … with 110 more rows
问题内容: 我有一个非常简单的问题:是否可以根据满足的条件对检索到的行进行排序?例如,我有一个人桌,我想检索所有名称以“ I”开头,以“ ster”结尾或包含“ lo”的人,并根据满足这些条件的顺序对其进行排序。首先是与第一个条件匹配的行,然后是与第二个条件匹配的行,依此类推。( 不重复: 如果某行满足第一个条件,则在第二个条件下不应再次显示该行) 编辑: 我使用Visual C#,并且使用MS
我有一个有许多记录的表。它有以下结构: 表(col1、col2、col3、col4、col5、col6): 还有很多很多其他的行。 因此,我想要一张桌子: 表(col1,col2,col3,col4_1,col4_2,col4_3,col5_1,col5_2,col5_3,col6_1,col6_2,col6_3): 换句话说:表中的某些部分具有相同的,但不同的。每个相同的行数在1-3之间变化(事
问题内容: 我有以下DataFrame: 如您所见,列用作索引。我想获取该行的序数,在这种情况下应该是。 被测试的列可以是索引列(在这种情况下也可以)或常规列,例如,我可能想找到满足条件的行的索引。 问题答案: 您可以这样使用np.where: 返回的值是一个数组,因为一列中可能有多个具有特定索引或值的行。
问题内容: 我想要一种改进我的sql代码的好方法,当条件满足时,我必须使用内部联接。我目前正在复制代码: 我想以这种方式做到这一点: 编辑: 解决方案(由于@Damien_The_Unbeliever): 问题答案: 这应该(大约)执行相同的操作: 当然,这还意味着必须编写对其中的列的任何其他引用,以期望此类列为。
我想对MySQL表中的行求和,然后合并和更新。我正试图在一天内把所有的重复付款加起来。我将发布我现有的查询,但有一些限制。 示例: 预期结果: 我试过这个循环。 问题1.我一次只能更新一个专栏。 问题2。我以编程方式遍历行并更新值,因为如果没有重复的值,则行不会得到更新。我必须写另一个查询来更新。也可以删除现有查询。 问题3。它只适用于两次付款。 在laravel雄辩合并(和)多行和更新中是否有任
我有一个这样的数据框 我想把这些数据子集,得到美国、中国和印度这些年的国内生产总值。另一个问题是,假设我每年有200个国家的国内生产总值数据,而我只对50个国家感兴趣。如何子集数据?非常感谢!