I have the following data (stored as `dt`):
dt <- structure(list(group = c("A", "A", "A", "A", "A",
"A", "A", "A", "A", "A", "A", "A", "A",
"A", "A", "A", "A", "A", "A", "A", "A",
"A", "A", "A", "A", "A", "A", "A", "A",
"A", "A", "A", "A", "A", "A", "A", "A",
"A", "A", "A", "A", "A", "A", "A", "A",
"A", "A", "A", "A", "A", "A", "A", "A",
"A", "A", "A", "A", "A", "A", "A", "A",
"A", "A", "A", "A", "A", "A", "A", "A",
"B", "B", "B", "B", "B", "B", "B", "B",
"B", "B", "B", "B", "B", "B", "B", "B",
"B", "B", "B", "B", "B", "B", "B", "B",
"B", "B", "B", "B", "B", "B", "B", "B",
"B", "B", "B", "B", "B", "B", "B", "B",
"B", "B", "B", "B", "B", "B", "B", "B",
"B", "B", "B", "B", "B", "B", "B", "B",
"B", "B", "B", "B", "B", "B", "B", "B",
"B", "B", "B", "B", "B"),
V1 = c(6.38, 6.38, 6.38, 6.38, -1.53, -24.93, -24.93, -24.93, -24.93, -24.93,
-24.93, -24.93, -24.93, -24.93, -24.93, -24.93, -24.93, -24.93,
-24.93, -24.93, -24.93, -24.93, -24.93, -24.93, -24.93, -24.93,
-24.93, -24.93, -24.93, -24.93, -24.93, -24.93, -24.93, -24.93,
-24.93, -24.93, -24.93, -24.93, -24.93, -24.93, -24.93, -24.93,
-24.93, -24.93, -24.93, -24.93, -24.93, -24.93, -24.93, -24.93,
-24.93, -24.93, -24.93, -24.93, -24.93, -24.93, -24.93, -24.93,
-24.93, -24.93, -24.93, -24.93, -24.93, -24.93, -24.93, -24.93,
-24.93, -24.93, -24.93, -6.8, -6.8, -6.8, -6.8, -6.8, -1.71,
-1.71, -1.71, -1.71, -1.71, -1.06, -1.06, -1.06, -1.06, -1.06,
-1.06, -1.06, -1.06, -1.06, -1.06, -1.06, -1.06, -1.06, -1.06,
-1.06, -1.06, -1.06, -1.06, -1.06, -1.06, -1.06, -1.06, -1.06,
-1.06, -1.06, -1.06, -1.06, -1.06, -1.06, -1.06, -1.06, -1.06,
-1.06, -1.06, -1.06, -1.06, -1.06, -1.06, 8.42, 8.42, 8.42, 4.34,
4.34, 4.34, 4.34, 4.34, 4.34, 4.34, 4.34, 4.34, 4.34, 4.34, 4.34,
4.34, 4.34, 4.34, 4.34, 4.34, 4.34),
V2 = c(0.7, 0.7, 0.7, 0.7, 0.7, 0.7, 0.7, -0.11, -0.11, -11, -11, -11, -11, -11,
-11, -11, -11, -11, -11, 1.6, 1.6, 1.6, 1.6, -0.55, -0.55, -0.55,
-2.15, -2.15, -2.15, -2.15, -2.15, -2.15, -2.15, -2.15, -2.15,
-0.19, -0.19, -0.19, -0.19, -0.19, 2.63, 2.63, 2.63, 2.63, 2.63,
2.63, 2.63, 2.63, 2.63, 2.63, 2.63, 2.63, 2.63, 2.63, 2.63, 2.63,
-3.86, -3.86, 0.48, 0.48, 0.48, 0.48, 0.48, 0.48, 0.48, -1.38,
-1.38, -1.38, -1.38, -5.15, -11.58, -11.58, -11.58, -11.58, -11.58,
-11.58, -11.58, -11.58, -11.58, -0.46, -7.32, -7.32, -7.32, -7.32,
-7.32, -7.32, -7.32, -7.32, 2.67, 4.88, 4.88, 4.88, 4.88, 4.88,
4.88, 4.88, 4.88, -11.57, -11.57, -11.57, 1.67, 1.55, 1.55, 2.3,
2.3, 2.3, 2.3, 2.3, 2.3, 2.3, 2.3, -1.42, 21.88, 21.88, 21.88,
21.88, 21.88, 21.88, 21.88, 21.88, 21.88, 21.88, 21.88, 21.88,
21.88, 21.88, 21.88, 21.88, 21.88, 21.88, -0.59, -0.59, -0.59,
-0.59, -0.59, -1.87, -1.87, -1.87)), row.names = c(NA, -138L),
class = c("data.table", "data.frame"))
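I want to find, by group, the sum of the unique values in each column.
I tried the following, but it gives me the sum of all values in each column, not just the unique ones: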
library(data.table)
dt[, lapply(.SD, sum, na.rm = TRUE), by = group, .SDcols = c("V1", "V2")]
group V1 V2
1: A -1571.53 -88.67
2: B 20.55 245.64
However, I only want the sum of the unique values.
The result should look like this:
group V1 V2
1: A -269.38 -12.43
2: B -4.47 27.17
Thanks.
The sum of the unique values, with uniqueness determined independently for each column, can be computed like this:
dt[, .(sum(unique(V1)), sum(unique(V2))), group]
#> group V1 V2
#> 1: A -20.08 -13.83
#> 2: B 3.19 -5.01
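The expected result you gave is something different: the per-column sum taken over the unique rows (rows that are not exact duplicates of another row), i.e.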
unique(dt)[, .(sum(V1), sum(V2)), group]
#> group V1 V2
#> 1: A -269.38 -12.43
#> 2: B -4.47 27.17
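To make the difference between the two readings of "unique" concrete, here is a small sketch on a made-up toy table (the names `toy`, `per_column` and `per_row` are hypothetical, not from the question). In group A the value 1 appears three times in V1, but only two of those rows are exact duplicates, so the two approaches disagree on V1.

```r
library(data.table)

# Hypothetical toy data: group A has rows (1, 2), (1, 3), (1, 3);
# group B has the duplicated row (5, 4) twice.
toy <- data.table(
  group = c("A", "A", "A", "B", "B"),
  V1    = c(1, 1, 1, 5, 5),
  V2    = c(2, 3, 3, 4, 4)
)

# Reading 1: de-duplicate each column on its own within each group, then sum
per_column <- toy[, lapply(.SD, function(x) sum(unique(x))), by = group,
                  .SDcols = c("V1", "V2")]

# Reading 2: drop rows that are exact duplicates (all columns compared), then sum
per_row <- unique(toy)[, lapply(.SD, sum), by = group,
                       .SDcols = c("V1", "V2")]

print(per_column)  # A: V1 = 1, V2 = 5;  B: V1 = 5, V2 = 4
print(per_row)     # A: V1 = 2, V2 = 5;  B: V1 = 5, V2 = 4
```

For group A, `per_column` counts the value 1 in V1 once, while `per_row` counts it twice because it sits in two distinct rows, (1, 2) and (1, 3). V2 agrees in both versions because its only duplicate coincides with the duplicated row.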
The same thing, with the columns given as a vector of names:
unique(dt)[, lapply(.SD, sum), group, .SDcols = c('V1', 'V2')]
Or, if the first kind of uniqueness (per column) is what you want:
dt[, lapply(.SD, function(x) sum(unique(x))), group, .SDcols = c('V1', 'V2')]
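If data.table is not available, both definitions can be sketched in base R with `aggregate()`. This is a swapped-in base R alternative, not part of the answer above, shown on hypothetical toy data (`df`, `per_column` and `per_row` are illustrative names). One caveat: the formula method of `aggregate()` uses `na.action = na.omit` by default, so a row with an `NA` in V1 or V2 is dropped entirely, which is not the same as `sum(..., na.rm = TRUE)` applied column by column.

```r
# Hypothetical toy data with the same shape as the question's columns
df <- data.frame(
  group = c("A", "A", "A", "B", "B"),
  V1    = c(1, 1, 1, 5, 5),
  V2    = c(2, 3, 3, 4, 4)
)

# Per-column uniqueness: each column is de-duplicated separately within a group
per_column <- aggregate(cbind(V1, V2) ~ group, data = df,
                        FUN = function(x) sum(unique(x)))

# Row uniqueness: drop duplicated rows first, then take plain sums per group
per_row <- aggregate(cbind(V1, V2) ~ group, data = unique(df), FUN = sum)

print(per_column)  # A: V1 = 1, V2 = 5;  B: V1 = 5, V2 = 4
print(per_row)     # A: V1 = 2, V2 = 5;  B: V1 = 5, V2 = 4
```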