问题：

dplyr中的mutate_eacht/summarise_each：如何选择某些列并为变异列命名？

傅鸿波

2023-03-14

我对dplyr动词mutate_each有点困惑。

使用基本的mutate将一列数据转换为z-scores，并在data.frame中创建一个新列（这里的名称为z_score_data)非常简单：

newDF <- DF %>%
  select(one_column) %>%
  mutate(z_score_data = one_column - (mean(one_column) / sd(one_column))

但是，由于我有许多数据列需要转换，所以我可能应该使用mutate_eacht谓词。

newDF <- DF %>%
     mutate_each(funs(scale))

null

谢谢你的帮助。

涂选

2023-03-14

在dplyr开发版本0.4.3.9000（在撰写本文时）中，mutate_each和summarise_each中的命名已经简化，如新闻中所述：

对summarise_each()和mutate_each()的命名行为进行了调整，以便可以强制包含函数和变量名:summarise_each(mtcars,funs（mean=mean）,everything())

如果您想在mutate_each/summarise_each中只应用一个函数，并且想给这些列指定新的名称，这一点非常重要。

为了显示不同之处，下面是使用新命名功能的dplyr 0.4.3.9000的输出，与下面的选项A.2形成对比：

library(dplyr) # >= 0.4.3.9000
iris %>% mutate_each(funs(mysum = sum(.)), -Species) %>% head()
#  Sepal.Length Sepal.Width Petal.Length Petal.Width Species Sepal.Length_mysum Sepal.Width_mysum
#1          5.1         3.5          1.4         0.2  setosa              876.5             458.6
#2          4.9         3.0          1.4         0.2  setosa              876.5             458.6
#3          4.7         3.2          1.3         0.2  setosa              876.5             458.6
#4          4.6         3.1          1.5         0.2  setosa              876.5             458.6
#5          5.0         3.6          1.4         0.2  setosa              876.5             458.6
#6          5.4         3.9          1.7         0.4  setosa              876.5             458.6
#  Petal.Length_mysum Petal.Width_mysum
#1              563.7             179.9
#2              563.7             179.9
#3              563.7             179.9
#4              563.7             179.9
#5              563.7             179.9
#6              563.7             179.9

如果您不提供新的名称，并且只提供1个函数，那么dplyr将更改现有的列（就像它在以前的版本中所做的那样）：

iris %>% mutate_each(funs(sum), -Species) %>% head()
#  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#1        876.5       458.6        563.7       179.9  setosa
#2        876.5       458.6        563.7       179.9  setosa
#3        876.5       458.6        563.7       179.9  setosa
#4        876.5       458.6        563.7       179.9  setosa
#5        876.5       458.6        563.7       179.9  setosa
#6        876.5       458.6        563.7       179.9  setosa

我假设这个新功能将在下一个版本0.4.4中通过CRAN提供。

iris %>% mutate_each(funs(sum), -Species) %>% head()
#  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#1          876         459          564         180  setosa
#2          876         459          564         180  setosa
#3          876         459          564         180  setosa
#4          876         459          564         180  setosa
#5          876         459          564         180  setosa
#6          876         459          564         180  setosa

iris %>% mutate_each(funs(mysum = sum(.)), -Species) %>% head()
#  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#1          876         459          564         180  setosa
#2          876         459          564         180  setosa
#3          876         459          564         180  setosa
#4          876         459          564         180  setosa
#5          876         459          564         180  setosa
#6          876         459          564         180  setosa

iris %>% mutate_each(funs(sum), SLsum = Sepal.Length,SWsum = Sepal.Width,  -Species) %>% head()
#  Sepal.Length Sepal.Width Petal.Length Petal.Width Species SLsum SWsum
#1          5.1         3.5          1.4         0.2  setosa   876   459
#2          4.9         3.0          1.4         0.2  setosa   876   459
#3          4.7         3.2          1.3         0.2  setosa   876   459
#4          4.6         3.1          1.5         0.2  setosa   876   459
#5          5.0         3.6          1.4         0.2  setosa   876   459
#6          5.4         3.9          1.7         0.4  setosa   876   459

与选项A.1、A.2和A.3不同，dplyr将保持现有列不变，并在此方法中创建新列。新列的名称等于您预先创建的命名向量的名称（在本例中是vars)。

vars <- names(iris)[1:2]  # choose which columns should be mutated
vars <- setNames(vars, paste0(vars, "_sum")) # create new column names
iris %>% mutate_each_(funs(sum), vars) %>% head 
#  Sepal.Length Sepal.Width Petal.Length Petal.Width Species Sepal.Length_sum Sepal.Width_sum
#1          5.1         3.5          1.4         0.2  setosa            876.5           458.6
#2          4.9         3.0          1.4         0.2  setosa            876.5           458.6
#3          4.7         3.2          1.3         0.2  setosa            876.5           458.6
#4          4.6         3.1          1.5         0.2  setosa            876.5           458.6
#5          5.0         3.6          1.4         0.2  setosa            876.5           458.6
#6          5.4         3.9          1.7         0.4  setosa            876.5           458.6

情况2：移除原有立柱

如您所见，此方法保持现有列不变，并添加具有指定名称的新列。如果您不想保留原始列，而只保留新创建的列（以及其他列），则可以在之后添加select语句：

iris %>% mutate_each_(funs(sum), vars) %>% select(-one_of(vars)) %>% head
#  Petal.Length Petal.Width Species Sepal.Length_sum Sepal.Width_sum
#1          1.4         0.2  setosa            876.5           458.6
#2          1.4         0.2  setosa            876.5           458.6
#3          1.3         0.2  setosa            876.5           458.6
#4          1.5         0.2  setosa            876.5           458.6
#5          1.4         0.2  setosa            876.5           458.6
#6          1.7         0.4  setosa            876.5           458.6

iris %>% mutate_each(funs(sum, mean), -Species) %>% head()
#  Sepal.Length Sepal.Width Petal.Length Petal.Width Species Sepal.Length_sum Sepal.Width_sum Petal.Length_sum
#1          5.1         3.5          1.4         0.2  setosa              876             459              564
#2          4.9         3.0          1.4         0.2  setosa              876             459              564
#3          4.7         3.2          1.3         0.2  setosa              876             459              564
#4          4.6         3.1          1.5         0.2  setosa              876             459              564
#5          5.0         3.6          1.4         0.2  setosa              876             459              564
#6          5.4         3.9          1.7         0.4  setosa              876             459              564
#  Petal.Width_sum Sepal.Length_mean Sepal.Width_mean Petal.Length_mean Petal.Width_mean
#1             180              5.84             3.06              3.76              1.2
#2             180              5.84             3.06              3.76              1.2
#3             180              5.84             3.06              3.76              1.2
#4             180              5.84             3.06              3.76              1.2
#5             180              5.84             3.06              3.76              1.2
#6             180              5.84             3.06              3.76              1.2

iris %>% mutate_each(funs(MySum = sum(.), MyMean = mean(.)), -Species) %>% head()
#  Sepal.Length Sepal.Width Petal.Length Petal.Width Species Sepal.Length_MySum Sepal.Width_MySum Petal.Length_MySum
#1          5.1         3.5          1.4         0.2  setosa                876               459                564
#2          4.9         3.0          1.4         0.2  setosa                876               459                564
#3          4.7         3.2          1.3         0.2  setosa                876               459                564
#4          4.6         3.1          1.5         0.2  setosa                876               459                564
#5          5.0         3.6          1.4         0.2  setosa                876               459                564
#6          5.4         3.9          1.7         0.4  setosa                876               459                564
#  Petal.Width_MySum Sepal.Length_MyMean Sepal.Width_MyMean Petal.Length_MyMean Petal.Width_MyMean
#1               180                5.84               3.06                3.76                1.2
#2               180                5.84               3.06                3.76                1.2
#3               180                5.84               3.06                3.76                1.2
#4               180                5.84               3.06                3.76                1.2
#5               180                5.84               3.06                3.76                1.2
#6               180                5.84               3.06                3.76                1.2

您可以引用要突变（或省略）的列，给出它们的名称，如下面所示（突变sepal.length,但不是物种）：

iris %>% mutate_each(funs(sum), Sepal.Length, -Species) %>% head()

此外，您还可以使用特殊函数来选择要变异的列、以某个单词开头或包含某个单词的所有列等，例如：

iris %>% mutate_each(funs(sum), contains("Sepal"),  -Species) %>% head()

有关这些函数的更多信息，请参见？mutate_each和？select。

x <- c("Sepal.Width", "Sepal.Length") # vector of column names 
iris %>% mutate_each_(funs(sum), x) %>% head()

dplyr中的mutate_eacht/summarise_each：如何选择某些列并为变异列命名？

共有1个答案

相关问答

相关文章

相关阅读

相关工具

相关文档