当前位置: 首页 > 工具软件 > Tabler > 使用案例 >

R语言data.table详解

柯琛
2023-12-01

@R语言学习

data.table用法解析

R语言中的data.table是一种数据处理时的高效率工具。在了解data.table之前先简单说明一下管道函数的使用方法。

管道函数的使用方法

常用的管道函数有以下三种:
%in%:表示包含于
%>%:表示向右传递
%$%:表示向右传递列,并可以直接按列操作

data.table使用方法

1. 首先将数据变成data.table格式
data = data%>% 
    as.data.table()
2. 将数据按照以下5个变量汇总
data = data%>% 
    as.data.table()%$% 
    .[,.(value = sum(value,na.rm = TRUE)),
      by = c("period","index","area","id1","id2")]
3. 按照某类别列的变量值筛选
data = data%$%
    .[index%in%c("UV","DB","XS"),]
4.改变类别列中变量名称
data = data%$%
    .[index_name == "DB",index_name := "DB_PV"]%$%
    .[index_name == "GZ",index_name := "GZ_UV"]%$%
    .[index_name == "XSLZ",index_name := "XSLZ_UV"]
5. 取某列全部值的前6个字母
data = data %$%
    .[,":="(id1 = substr(id1,1,6),
            id2 = substr(id2,1,6))]%$%
    .[,.(value = sum(value,na.rm = TRUE)),
      by = c("period","index","area","id1","id2")]
6. 筛选列
data = data %$%
    .[,.(period,id1,id2,area,index,value)]
7. 筛选行列
N <- rank.P[ID2%in%target,.(ID2, Month, prov_ID, Province, Index_name,Value_adj_per)]
8. 变更列名称
data = data%>%
    setnames(c("period","car_id1","car_id2","area","value"), c("Month","ID1","ID2","Province","Value")) 
9. 按照某列合并表格,并筛选某些列
data = data %>%
    merge(data2, by.x="Province",by.y="Province",all.x = TRUE)%$%
    .[,.(Month,Province,ID,index_name,ID2,Value)]
9.1 合并后出现列名重复现象
 new_data <- data%>%
    merge(data2,
          by.x = c("ID1","ID2","Month","prov_ID","Province","Index_name"),
          by.y = c("ID2","ID1","Month","prov_ID","Province","Index_name"),
          suffixes = c("_P","_N"))
10. 定义新列
new_data <- new_data[, ":="(Score = (Value_adj_per_P+Value_adj_per_N)/2)]%$%
    .[, Rank := row_number(-Score),
      by = c("ID2","Month","prov_ID","Province","Index_name")]%>%
    merge(phase, by = "Index_name")%$%
    .[, Index_name := NULL]

11.复制某列某类别并重命名为另外一个类别

P.N<- copy(P.N)%$%
    .[Phase == "留资",]%$%
    .[Phase == "留资",Phase := "最终排行"]
12.筛选某列中每个类别的前20名
 P <- rank.P[ID2%in%target,
              .SD[1:20],
              by = c("ID2","Month","prov_ID","Province","Index_name")]%>%
    na.omit()%$%
    .[,.(ID2, Month, prov_ID, Province, Index_name,Value_adj)]
 类似资料: