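I need help identifying the longest run of consecutive values (= 1) within groups of observations in R.
I have monthly rainfall data for towns. For each year, I need to identify the longest period in which monthly rainfall is above the annual mean (rain_above = 1). If a year has two periods of equal length, I want the one with the greater total rainfall.
Some sample data: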
# Build the data frame directly: wrapping the columns in cbind() would
# coerce every column (including rain) to character
df1 <- data.frame(town=c("A","A","A","A","A","A","A","A","A","A","A","A",
                         "A","A","A","A","A","A","A","A","A","A","A","A",
                         "B","B","B","B","B","B","B","B","B","B","B","B",
                         "B","B","B","B","B","B","B","B","B","B","B","B"),
                  year=c(2000,2000,2000,2000,2000,2000,2000,2000,2000,2000,2000,2000,
                         2001,2001,2001,2001,2001,2001,2001,2001,2001,2001,2001,2001,
                         2000,2000,2000,2000,2000,2000,2000,2000,2000,2000,2000,2000,
                         2001,2001,2001,2001,2001,2001,2001,2001,2001,2001,2001,2001),
                  month=c(1,2,3,4,5,6,7,8,9,10,11,12,
                          1,2,3,4,5,6,7,8,9,10,11,12,
                          1,2,3,4,5,6,7,8,9,10,11,12,
                          1,2,3,4,5,6,7,8,9,10,11,12),
                  rain_above=c(0,0,0,1,1,1,1,1,0,0,0,0,
                               0,0,0,0,1,1,1,1,1,0,0,0,
                               0,1,1,1,1,0,0,0,1,1,0,0,
                               1,1,1,0,0,0,1,1,1,0,0,0),
                  rain=c(4.5,4,5,7.1,7.7,8,7.4,7.9,5.1,4.9,4.6,4.4,
                         4.4,4,4.8,5.1,7.2,7.4,7.4,7.1,7.6,5.4,5.1,5,
                         7.3,11.3,11.5,11.6,11.1,6.5,6.4,6.2,9.9,10.2,5.4,5.5,
                         10.4,10.9,11.4,7.8,7.3,7.2,9.8,9.9,10,7.2,6.9,6.6))
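In df1, town A has a rainy season from month 4 to month 8 in 2000. This is the only period with rain_above = 1.
Town B has a rainy season from month 1 to month 3 in 2001. Although that year has two periods of equal length (3 months), the first one has the larger total rainfall.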
View(df1)
df1
town year month rain_above rain
1 A 2000 1 0 4.5
2 A 2000 2 0 4
3 A 2000 3 0 5
4 A 2000 4 1 7.1
5 A 2000 5 1 7.7
6 A 2000 6 1 8
7 A 2000 7 1 7.4
8 A 2000 8 1 7.9
9 A 2000 9 0 5.1
10 A 2000 10 0 4.9
11 A 2000 11 0 4.6
12 A 2000 12 0 4.4
13 A 2001 1 0 4.4
14 A 2001 2 0 4
15 A 2001 3 0 4.8
16 A 2001 4 0 5.1
17 A 2001 5 1 7.2
18 A 2001 6 1 7.4
19 A 2001 7 1 7.4
20 A 2001 8 1 7.1
21 A 2001 9 1 7.6
22 A 2001 10 0 5.4
23 A 2001 11 0 5.1
24 A 2001 12 0 5
25 B 2000 1 0 7.3
26 B 2000 2 1 11.3
27 B 2000 3 1 11.5
28 B 2000 4 1 11.6
29 B 2000 5 1 11.1
30 B 2000 6 0 6.5
31 B 2000 7 0 6.4
32 B 2000 8 0 6.2
33 B 2000 9 1 9.9
34 B 2000 10 1 10.2
35 B 2000 11 0 5.4
36 B 2000 12 0 5.5
37 B 2001 1 1 10.4
38 B 2001 2 1 10.9
39 B 2001 3 1 11.4
40 B 2001 4 0 7.8
41 B 2001 5 0 7.3
42 B 2001 6 0 7.2
43 B 2001 7 1 9.8
44 B 2001 8 1 9.9
45 B 2001 9 1 10
46 B 2001 10 0 7.2
47 B 2001 11 0 6.9
48 B 2001 12 0 6.6
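I want to create an indicator variable for the rainy season that equals 1 during the longest run of months with above-average rainfall and 0 otherwise: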
df1
town year month rain_above rain season
1 A 2000 1 0 4.5 0
2 A 2000 2 0 4 0
3 A 2000 3 0 5 0
4 A 2000 4 1 7.1 1
5 A 2000 5 1 7.7 1
6 A 2000 6 1 8 1
7 A 2000 7 1 7.4 1
8 A 2000 8 1 7.9 1
9 A 2000 9 0 5.1 0
10 A 2000 10 0 4.9 0
11 A 2000 11 0 4.6 0
12 A 2000 12 0 4.4 0
13 A 2001 1 0 4.4 0
14 A 2001 2 0 4 0
15 A 2001 3 0 4.8 0
16 A 2001 4 0 5.1 0
17 A 2001 5 1 7.2 1
18 A 2001 6 1 7.4 1
19 A 2001 7 1 7.4 1
20 A 2001 8 1 7.1 1
21 A 2001 9 1 7.6 1
22 A 2001 10 0 5.4 0
23 A 2001 11 0 5.1 0
24 A 2001 12 0 5 0
25 B 2000 1 0 7.3 0
26 B 2000 2 1 11.3 1
27 B 2000 3 1 11.5 1
28 B 2000 4 1 11.6 1
29 B 2000 5 1 11.1 1
30 B 2000 6 0 6.5 0
31 B 2000 7 0 6.4 0
32 B 2000 8 0 6.2 0
33 B 2000 9 1 9.9 0
34 B 2000 10 1 10.2 0
35 B 2000 11 0 5.4 0
36 B 2000 12 0 5.5 0
37 B 2001 1 1 10.4 1
38 B 2001 2 1 10.9 1
39 B 2001 3 1 11.4 1
40 B 2001 4 0 7.8 0
41 B 2001 5 0 7.3 0
42 B 2001 6 0 7.2 0
43 B 2001 7 1 9.8 0
44 B 2001 8 1 9.9 0
45 B 2001 9 1 10 0
46 B 2001 10 0 7.2 0
47 B 2001 11 0 6.9 0
48 B 2001 12 0 6.6 0
Any help is appreciated!
You can try rleid from data.table, as shown below
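library(data.table)
# Step 1: for each run of identical rain_above values within a town and year,
# store the run's total rainfall and its length
setDT(df1)[
  ,
  `:=`(sum_rain = sum(rain), grplen = .N),
  .(town, year, rleid(rain_above))
][
  # Step 2: within each town and year, keep the longest run of 1s only,
  # breaking ties on length by the larger rainfall total
  , rain_season := {
    longest <- rain_above == 1 & grplen == max(grplen[rain_above == 1])
    +(longest & sum_rain == max(sum_rain[longest]))
  },
  .(town, year)
][
  ,
  grplen := NULL  # drop the helper column
][]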
which gives
town year month rain_above rain sum_rain rain_season
1: A 2000 1 0 4.5 13.5 0
2: A 2000 2 0 4.0 13.5 0
3: A 2000 3 0 5.0 13.5 0
4: A 2000 4 1 7.1 38.1 1
5: A 2000 5 1 7.7 38.1 1
6: A 2000 6 1 8.0 38.1 1
7: A 2000 7 1 7.4 38.1 1
8: A 2000 8 1 7.9 38.1 1
9: A 2000 9 0 5.1 19.0 0
10: A 2000 10 0 4.9 19.0 0
11: A 2000 11 0 4.6 19.0 0
12: A 2000 12 0 4.4 19.0 0
13: A 2001 1 0 4.4 18.3 0
14: A 2001 2 0 4.0 18.3 0
15: A 2001 3 0 4.8 18.3 0
16: A 2001 4 0 5.1 18.3 0
17: A 2001 5 1 7.2 36.7 1
18: A 2001 6 1 7.4 36.7 1
19: A 2001 7 1 7.4 36.7 1
20: A 2001 8 1 7.1 36.7 1
21: A 2001 9 1 7.6 36.7 1
22: A 2001 10 0 5.4 15.5 0
23: A 2001 11 0 5.1 15.5 0
24: A 2001 12 0 5.0 15.5 0
25: B 2000 1 0 7.3 7.3 0
26: B 2000 2 1 11.3 45.5 1
27: B 2000 3 1 11.5 45.5 1
28: B 2000 4 1 11.6 45.5 1
29: B 2000 5 1 11.1 45.5 1
30: B 2000 6 0 6.5 19.1 0
31: B 2000 7 0 6.4 19.1 0
32: B 2000 8 0 6.2 19.1 0
33: B 2000 9 1 9.9 20.1 0
34: B 2000 10 1 10.2 20.1 0
35: B 2000 11 0 5.4 10.9 0
36: B 2000 12 0 5.5 10.9 0
37: B 2001 1 1 10.4 32.7 1
38: B 2001 2 1 10.9 32.7 1
39: B 2001 3 1 11.4 32.7 1
40: B 2001 4 0 7.8 22.3 0
41: B 2001 5 0 7.3 22.3 0
42: B 2001 6 0 7.2 22.3 0
43: B 2001 7 1 9.8 29.7 0
44: B 2001 8 1 9.9 29.7 0
45: B 2001 9 1 10.0 29.7 0
46: B 2001 10 0 7.2 20.7 0
47: B 2001 11 0 6.9 20.7 0
48: B 2001 12 0 6.6 20.7 0
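To see what the run grouping is doing, here is a minimal base R sketch for a single town and year (town B, 2001, values copied from the sample data). The names above, rain, run_id and run_sum are only illustrative, and run_id is built from rle on the assumption that it mirrors what rleid returns for a single vector.

```r
# rain_above and rain for town B, year 2001 (copied from the sample data)
above <- c(1, 1, 1, 0, 0, 0, 1, 1, 1, 0, 0, 0)
rain  <- c(10.4, 10.9, 11.4, 7.8, 7.3, 7.2, 9.8, 9.9, 10, 7.2, 6.9, 6.6)

# Run-length encoding: one entry per run of identical values
runs <- rle(above)

# One id per run, repeated over the rows of that run
run_id <- rep(seq_along(runs$lengths), runs$lengths)
print(run_id)
# 1 1 1 2 2 2 3 3 3 4 4 4

# Total rainfall per run, in run order
run_sum <- as.numeric(tapply(rain, run_id, sum))

# Candidate runs: value 1 and as long as the longest run of 1s
is_wet     <- runs$values == 1
candidates <- which(is_wet & runs$lengths == max(runs$lengths[is_wet]))
print(candidates)
# 1 3

# Summary table: one row per run
print(data.frame(run = seq_along(runs$lengths),
                 value = runs$values,
                 length = runs$lengths,
                 total_rain = run_sum))
```

Runs 1 and 3 tie on length here, so the rainfall total (32.7 against 29.7 in the sum_rain column above) is what decides between them.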
I couldn't find anything better than this. Use rle to get the longest run for each town and year, then use data.table::rleid and sum to check which of those consecutive seasons has the highest rainfall:
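library(dplyr)
df1 %>%
  group_by(town, year) %>%
  # flag the rows that belong to the longest run(s) of rain_above == 1
  mutate(rle = with(rle(rain_above),
                    rep(+(values == 1 & lengths == max(lengths[values == 1])), lengths))) %>%
  # give each stretch of flagged/unflagged rows its own id and total its rainfall
  group_by(gp = data.table::rleid(rle), .add = TRUE) %>%
  mutate(sum_rain = sum(rain)) %>%
  ungroup(gp) %>%
  # among the flagged runs, keep the one with the largest total
  mutate(rain_season = +(rle == 1 & sum_rain == max(sum_rain[rle == 1])))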
Output
# A tibble: 48 × 9
# Groups: town, year [4]
town year month rain_above rain rle gp sum_rain rain_season
<chr> <dbl> <dbl> <dbl> <dbl> <int> <int> <dbl> <int>
1 A 2000 1 0 4.5 0 1 13.5 0
2 A 2000 2 0 4 0 1 13.5 0
3 A 2000 3 0 5 0 1 13.5 0
4 A 2000 4 1 7.1 1 2 38.1 1
5 A 2000 5 1 7.7 1 2 38.1 1
6 A 2000 6 1 8 1 2 38.1 1
7 A 2000 7 1 7.4 1 2 38.1 1
8 A 2000 8 1 7.9 1 2 38.1 1
9 A 2000 9 0 5.1 0 3 19 0
10 A 2000 10 0 4.9 0 3 19 0
11 A 2000 11 0 4.6 0 3 19 0
12 A 2000 12 0 4.4 0 3 19 0
13 A 2001 1 0 4.4 0 3 18.3 0
14 A 2001 2 0 4 0 3 18.3 0
15 A 2001 3 0 4.8 0 3 18.3 0
16 A 2001 4 0 5.1 0 3 18.3 0
17 A 2001 5 1 7.2 1 4 36.7 1
18 A 2001 6 1 7.4 1 4 36.7 1
19 A 2001 7 1 7.4 1 4 36.7 1
20 A 2001 8 1 7.1 1 4 36.7 1
21 A 2001 9 1 7.6 1 4 36.7 1
22 A 2001 10 0 5.4 0 5 15.5 0
23 A 2001 11 0 5.1 0 5 15.5 0
24 A 2001 12 0 5 0 5 15.5 0
25 B 2000 1 0 7.3 0 5 7.3 0
26 B 2000 2 1 11.3 1 6 45.5 1
27 B 2000 3 1 11.5 1 6 45.5 1
28 B 2000 4 1 11.6 1 6 45.5 1
29 B 2000 5 1 11.1 1 6 45.5 1
30 B 2000 6 0 6.5 0 7 50.1 0
31 B 2000 7 0 6.4 0 7 50.1 0
32 B 2000 8 0 6.2 0 7 50.1 0
33 B 2000 9 1 9.9 0 7 50.1 0
34 B 2000 10 1 10.2 0 7 50.1 0
35 B 2000 11 0 5.4 0 7 50.1 0
36 B 2000 12 0 5.5 0 7 50.1 0
37 B 2001 1 1 10.4 1 8 32.7 1
38 B 2001 2 1 10.9 1 8 32.7 1
39 B 2001 3 1 11.4 1 8 32.7 1
40 B 2001 4 0 7.8 0 9 22.3 0
41 B 2001 5 0 7.3 0 9 22.3 0
42 B 2001 6 0 7.2 0 9 22.3 0
43 B 2001 7 1 9.8 1 10 29.7 0
44 B 2001 8 1 9.9 1 10 29.7 0
45 B 2001 9 1 10 1 10 29.7 0
46 B 2001 10 0 7.2 0 11 20.7 0
47 B 2001 11 0 6.9 0 11 20.7 0
48 B 2001 12 0 6.6 0 11 20.7 0
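If you would rather avoid extra packages, the same idea can be sketched in base R. flag_season is a hypothetical helper name, not part of any package. The sketch assumes the rows are already ordered by month within each town and year, and if two longest runs also tie on total rainfall it keeps the earlier one (my assumption, since the question does not say what should happen then).

```r
# Hypothetical helper: returns 1 for the rows of the chosen run, 0 elsewhere
flag_season <- function(above, rain) {
  runs    <- rle(above)
  run_id  <- rep(seq_along(runs$lengths), runs$lengths)
  run_sum <- as.numeric(tapply(rain, run_id, sum))

  wet <- which(runs$values == 1)
  if (length(wet) == 0) return(integer(length(above)))  # no month above the mean

  # Longest runs of 1s, then the one with the largest total rainfall
  longest <- wet[runs$lengths[wet] == max(runs$lengths[wet])]
  best    <- longest[which.max(run_sum[longest])]  # which.max keeps the first on ties
  as.integer(run_id == best)
}

# Town B rows from the sample data
df <- data.frame(
  town = "B",
  year = rep(c(2000, 2001), each = 12),
  month = rep(1:12, times = 2),
  rain_above = c(0,1,1,1,1,0,0,0,1,1,0,0,
                 1,1,1,0,0,0,1,1,1,0,0,0),
  rain = c(7.3,11.3,11.5,11.6,11.1,6.5,6.4,6.2,9.9,10.2,5.4,5.5,
           10.4,10.9,11.4,7.8,7.3,7.2,9.8,9.9,10,7.2,6.9,6.6)
)

# Apply the helper to each town/year block in turn
key <- paste(df$town, df$year)
df$season <- 0L
for (k in unique(key)) {
  idx <- key == k
  df$season[idx] <- flag_season(df$rain_above[idx], df$rain[idx])
}

print(df[df$season == 1, c("year", "month")])
```

On these rows the flagged months are 2 to 5 in 2000 and 1 to 3 in 2001, which matches the rain_season column in the outputs above.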