Question:

R {dplyr}: `rename` or `mutate` data.frames in a `rowwise` list-column with different column names on the LHS

吴凯
2023-03-14

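I am working with list-columns of data.frames in {dplyr} 1.0.0, and I would like to know whether it is possible to rename() or mutate() columns in each data.frame without leaving the pipe when the nested data.frames are grouped in a rowwise fashion.

Why do I want to know/do this? As I understand {dplyr} 1.0.0, it recommends rowwise() over using {purrr}'s map family on list-columns. Below I first show what I did before {dplyr} 1.0.0, and then several examples targeting {dplyr} 1.0.0 (most of which do not work).

Although {rlang} supports glue strings on the left-hand side (LHS), which can be used when writing custom {dplyr} functions, dynamic names on the LHS of {dplyr} functions inside a rowwise tibble do not yet seem to be supported (at least my examples below do not work).

For rename I found a way using rename_with(), but I cannot figure out how to make it work with mutate.

I also do not understand most of the error messages I get. They more or less say that I am not using a string on the LHS before `:=`, but in rowwise mode the column I refer to (`new`) is actually a character vector of length == 1.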

library(dplyr, quietly = TRUE, warn.conflicts = FALSE)
library(purrr)

myiris <- iris %>% 
  nest_by(Species, .key = "mydat") %>% 
  ungroup %>% 
  mutate(new = letters[1:3])

# our data looks like this
# we want to use the strings in column `new` on the LHS of `rename` and `mutate`
myiris
#> # A tibble: 3 x 3
#>   Species                 mydat new  
#>   <fct>      <list<tbl_df[,4]>> <chr>
#> 1 setosa               [50 x 4] a    
#> 2 versicolor           [50 x 4] b    
#> 3 virginica            [50 x 4] c

# For reference: under dplyr < 1.0 I did the following:

# rename in pipe
# working
myiris %>% 
  mutate(mydat = map2(mydat, new,
                      ~ rename_at(.x, "Sepal.Length", function(z) paste(.y)))) %>% 
  pull(mydat)
#> [[1]]
#> # A tibble: 50 x 4
#>       a Sepal.Width Petal.Length Petal.Width
#>   <dbl>       <dbl>        <dbl>       <dbl>
#> 1   5.1         3.5          1.4         0.2
#> 2   4.9         3            1.4         0.2
#> 3   4.7         3.2          1.3         0.2
#> 4   4.6         3.1          1.5         0.2
#> # ... with 46 more rows
#> 
#> [[2]]
#> # A tibble: 50 x 4
#>       b Sepal.Width Petal.Length Petal.Width
#>   <dbl>       <dbl>        <dbl>       <dbl>
#> 1   7           3.2          4.7         1.4
#> 2   6.4         3.2          4.5         1.5
#> 3   6.9         3.1          4.9         1.5
#> 4   5.5         2.3          4           1.3
#> # ... with 46 more rows
#> 
#> [[3]]
#> # A tibble: 50 x 4
#>       c Sepal.Width Petal.Length Petal.Width
#>   <dbl>       <dbl>        <dbl>       <dbl>
#> 1   6.3         3.3          6           2.5
#> 2   5.8         2.7          5.1         1.9
#> 3   7.1         3            5.9         2.1
#> 4   6.3         2.9          5.6         1.8
#> # ... with 46 more rows

# mutate in pipe
# was never working even under dplyr < 1.0.0
myiris %>% 
  mutate(mydat = map2(mydat, new,
                      ~ mutate(.x, eval(.y) := .y))) %>% 
  pull(mydat)
#> Error: Problem with `mutate()` input `mydat`.
#> x The LHS of `:=` must be a string or a symbol
#> i Input `mydat` is `map2(mydat, new, ~mutate(.x, `:=`(eval(.y), .y)))`.

# mutate with custom function
# working
mymutate <- function(df, y) {
  mutate(df, !! y := y)
}

myiris %>% 
  mutate(mydat = map2(mydat, new,
                      ~ mymutate(.x, .y))) %>% 
  pull(mydat)
#> [[1]]
#> # A tibble: 50 x 5
#>   Sepal.Length Sepal.Width Petal.Length Petal.Width a    
#>          <dbl>       <dbl>        <dbl>       <dbl> <chr>
#> 1          5.1         3.5          1.4         0.2 a    
#> 2          4.9         3            1.4         0.2 a    
#> 3          4.7         3.2          1.3         0.2 a    
#> 4          4.6         3.1          1.5         0.2 a    
#> # ... with 46 more rows
#> 
#> [[2]]
#> # A tibble: 50 x 5
#>   Sepal.Length Sepal.Width Petal.Length Petal.Width b    
#>          <dbl>       <dbl>        <dbl>       <dbl> <chr>
#> 1          7           3.2          4.7         1.4 b    
#> 2          6.4         3.2          4.5         1.5 b    
#> 3          6.9         3.1          4.9         1.5 b    
#> 4          5.5         2.3          4           1.3 b    
#> # ... with 46 more rows
#> 
#> [[3]]
#> # A tibble: 50 x 5
#>   Sepal.Length Sepal.Width Petal.Length Petal.Width c    
#>          <dbl>       <dbl>        <dbl>       <dbl> <chr>
#> 1          6.3         3.3          6           2.5 c    
#> 2          5.8         2.7          5.1         1.9 c    
#> 3          7.1         3            5.9         2.1 c    
#> 4          6.3         2.9          5.6         1.8 c    
#> # ... with 46 more rows





# dplyr > 1.0.0
# objective: `rename()` or `mutate()` in pipe on list-column of data.frames 
#            while using different column names on LHS coming from another
#            column (here `new`)

myiris_row <- myiris %>% rowwise

# rename --------
# not working
myiris_row %>% 
  mutate(mydat = list(mydat %>% rename({{new}} := "Sepal.Length"))) 
#> Error: Problem with `mutate()` input `mydat`.
#> x The LHS of `:=` must be a string or a symbol
#> i Input `mydat` is `list(...)`.
#> i The error occured in row 1.

# not working
myiris_row %>% 
  mutate(mydat = list(mydat %>% rename(!! new := "Sepal.Length")))  
#> Error: Problem with `mutate()` input `mydat`.
#> x The LHS of `:=` must be a string or a symbol
#> i Input `mydat` is `list(...)`.
#> i The error occured in row 1.

# not working
myiris_row %>% 
  mutate(mydat = list(mydat %>% rename(!! sym(new) := "Sepal.Length")))  
#> Error: Only strings can be converted to symbols

# not working
myiris_row %>% 
  mutate(mydat = list(mydat %>% rename(all_of(new) := "Sepal.Length")))  
#> Error: Problem with `mutate()` input `mydat`.
#> x The LHS of `:=` must be a string or a symbol
#> i Input `mydat` is `list(mydat %>% rename(`:=`(all_of(new), "Sepal.Length")))`.
#> i The error occured in row 1.

# working, but only with `rename_with()`
myiris_row %>% 
  mutate(mydat = list(mydat %>% rename_with(~ new, "Sepal.Length")))  %>%
  pull(mydat)
#> [[1]]
#> # A tibble: 50 x 4
#>       a Sepal.Width Petal.Length Petal.Width
#>   <dbl>       <dbl>        <dbl>       <dbl>
#> 1   5.1         3.5          1.4         0.2
#> 2   4.9         3            1.4         0.2
#> 3   4.7         3.2          1.3         0.2
#> 4   4.6         3.1          1.5         0.2
#> # ... with 46 more rows
#> 
#> [[2]]
#> # A tibble: 50 x 4
#>       b Sepal.Width Petal.Length Petal.Width
#>   <dbl>       <dbl>        <dbl>       <dbl>
#> 1   7           3.2          4.7         1.4
#> 2   6.4         3.2          4.5         1.5
#> 3   6.9         3.1          4.9         1.5
#> 4   5.5         2.3          4           1.3
#> # ... with 46 more rows
#> 
#> [[3]]
#> # A tibble: 50 x 4
#>       c Sepal.Width Petal.Length Petal.Width
#>   <dbl>       <dbl>        <dbl>       <dbl>
#> 1   6.3         3.3          6           2.5
#> 2   5.8         2.7          5.1         1.9
#> 3   7.1         3            5.9         2.1
#> 4   6.3         2.9          5.6         1.8
#> # ... with 46 more rows


# mutate ------
# the values of the new column don't matter
# here we just use the same input as the name, to show that RHS evaluation is easier.

# not working
myiris_row %>% 
  mutate(mydat = list(mydat %>% mutate(!! new := new))) 
#> Error: Problem with `mutate()` input `mydat`.
#> x The LHS of `:=` must be a string or a symbol
#> i Input `mydat` is `list(...)`.
#> i The error occured in row 1.

# not working
myiris_row %>% 
  mutate(mydat = list(mydat %>% mutate(!! sym(new) := new))) 
#> Error: Only strings can be converted to symbols

# not working
myiris_row %>% 
  mutate(mydat = list(mydat %>% mutate(all_of(new) := new))) 
#> Error: Problem with `mutate()` input `mydat`.
#> x The LHS of `:=` must be a string or a symbol
#> i Input `mydat` is `list(mydat %>% mutate(`:=`(all_of(new), new)))`.
#> i The error occured in row 1.

# almost working (what's going on in the data[[1]] btw!)
myiris_row %>% 
  mutate(mydat = list(mydat %>% mutate("{{new}}" := new)))  %>%
  pull(mydat)
#> [[1]]
#> # A tibble: 50 x 5
#>   Sepal.Length Sepal.Width Petal.Length Petal.Width `promise_fn(3L)`
#>          <dbl>       <dbl>        <dbl>       <dbl> <chr>           
#> 1          5.1         3.5          1.4         0.2 a               
#> 2          4.9         3            1.4         0.2 a               
#> 3          4.7         3.2          1.3         0.2 a               
#> 4          4.6         3.1          1.5         0.2 a               
#> # ... with 46 more rows
#> 
#> [[2]]
#> # A tibble: 50 x 5
#>   Sepal.Length Sepal.Width Petal.Length Petal.Width `"b"`
#>          <dbl>       <dbl>        <dbl>       <dbl> <chr>
#> 1          7           3.2          4.7         1.4 b    
#> 2          6.4         3.2          4.5         1.5 b    
#> 3          6.9         3.1          4.9         1.5 b    
#> 4          5.5         2.3          4           1.3 b    
#> # ... with 46 more rows
#> 
#> [[3]]
#> # A tibble: 50 x 5
#>   Sepal.Length Sepal.Width Petal.Length Petal.Width `"c"`
#>          <dbl>       <dbl>        <dbl>       <dbl> <chr>
#> 1          6.3         3.3          6           2.5 c    
#> 2          5.8         2.7          5.1         1.9 c    
#> 3          7.1         3            5.9         2.1 c    
#> 4          6.3         2.9          5.6         1.8 c    
#> # ... with 46 more rows

Created on 2020-12-22 by the reprex package (v0.3.0)

1 answer

蒋昊天
2023-03-14

You can use quote() to protect your `!!` from the outer call, and then use `!!` again to unquote it in the nested call:

myiris_row %>% 
  mutate(mydat = list(mydat %>% mutate(!! quote(!!new) := new))) %>%
  pull(mydat)
#> [[1]]
#> # A tibble: 50 x 5
#>    Sepal.Length Sepal.Width Petal.Length Petal.Width a    
#>           <dbl>       <dbl>        <dbl>       <dbl> <chr>
#>  1          5.1         3.5          1.4         0.2 a    
#>  2          4.9         3            1.4         0.2 a    
#>  3          4.7         3.2          1.3         0.2 a    
#>  4          4.6         3.1          1.5         0.2 a    
#>  5          5           3.6          1.4         0.2 a    
#>  6          5.4         3.9          1.7         0.4 a    
#>  7          4.6         3.4          1.4         0.3 a    
#>  8          5           3.4          1.5         0.2 a    
#>  9          4.4         2.9          1.4         0.2 a    
#> 10          4.9         3.1          1.5         0.1 a    
#> # ... with 40 more rows
#> 
#> [[2]]
#> # A tibble: 50 x 5
#>    Sepal.Length Sepal.Width Petal.Length Petal.Width b    
#>           <dbl>       <dbl>        <dbl>       <dbl> <chr>
#>  1          7           3.2          4.7         1.4 b    
#>  2          6.4         3.2          4.5         1.5 b    
#>  3          6.9         3.1          4.9         1.5 b    
#>  4          5.5         2.3          4           1.3 b    
#>  5          6.5         2.8          4.6         1.5 b    
#>  6          5.7         2.8          4.5         1.3 b    
#>  7          6.3         3.3          4.7         1.6 b    
#>  8          4.9         2.4          3.3         1   b    
#>  9          6.6         2.9          4.6         1.3 b    
#> 10          5.2         2.7          3.9         1.4 b    
#> # ... with 40 more rows
#> 
#> [[3]]
#> # A tibble: 50 x 5
#>    Sepal.Length Sepal.Width Petal.Length Petal.Width c    
#>           <dbl>       <dbl>        <dbl>       <dbl> <chr>
#>  1          6.3         3.3          6           2.5 c    
#>  2          5.8         2.7          5.1         1.9 c    
#>  3          7.1         3            5.9         2.1 c    
#>  4          6.3         2.9          5.6         1.8 c    
#>  5          6.5         3            5.8         2.2 c    
#>  6          7.6         3            6.6         2.1 c    
#>  7          4.9         2.5          4.5         1.7 c    
#>  8          7.3         2.9          6.3         1.8 c    
#>  9          6.7         2.5          5.8         1.8 c    
#> 10          7.2         3.6          6.1         2.5 c    
#> # ... with 40 more rows
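
A minimal sketch of why the quote() trick seems to work, using only {rlang}. It assumes the outer mutate() behaves like the hypothetical `capture()` helper below: whichever function captures the expression first resolves every `!!` in it, including those nested inside inner calls. `capture()`, `inner()`, and the value assigned to `new` are made up for illustration. In the question there is no `new` variable in the calling environment, so the early lookup presumably finds the function methods::new() instead, which would explain "Only strings can be converted to symbols".

```r
library(rlang)

# Hypothetical stand-in for the outer mutate(): it captures (defuses) its
# argument, and rlang processes every `!!` found anywhere in that expression.
capture <- function(x) enexpr(x)

# A variable in the calling environment (illustrative value).
new <- "from_calling_env"

# Plain `!!new` is resolved at capture time, in the calling environment.
# `inner()` is never run; it only appears inside the captured expression.
early <- capture(list(inner(!!new := 1)))

# `quote(!!new)` returns the unevaluated call `!!new`; the outer `!!` injects
# that call as-is, so the inner function still has a `!!new` to resolve later.
late <- capture(list(inner(!!quote(!!new) := 1)))

# Pull out the LHS of `:=` from both captured expressions.
lhs_early <- early[[2]][[2]][[2]]
lhs_late  <- late[[2]][[2]][[2]]

is.character(lhs_early)  # the value was already injected by the outer capture
is.call(lhs_late)        # still an unevaluated `!!new`
```

In the rowwise pipeline, the surviving `!!new` is then resolved by the inner mutate(), at a point where `new` refers to the current row's value.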

# the same approach works for rename()
myiris_row %>% 
  mutate(mydat = list(mydat %>% rename(!! quote(!!new) := "Sepal.Length"))) %>%
  pull(mydat)
#> [[1]]
#> # A tibble: 50 x 4
#>        a Sepal.Width Petal.Length Petal.Width
#>    <dbl>       <dbl>        <dbl>       <dbl>
#>  1   5.1         3.5          1.4         0.2
#>  2   4.9         3            1.4         0.2
#>  3   4.7         3.2          1.3         0.2
#>  4   4.6         3.1          1.5         0.2
#>  5   5           3.6          1.4         0.2
#>  6   5.4         3.9          1.7         0.4
#>  7   4.6         3.4          1.4         0.3
#>  8   5           3.4          1.5         0.2
#>  9   4.4         2.9          1.4         0.2
#> 10   4.9         3.1          1.5         0.1
#> # ... with 40 more rows
#> 
#> [[2]]
#> # A tibble: 50 x 4
#>        b Sepal.Width Petal.Length Petal.Width
#>    <dbl>       <dbl>        <dbl>       <dbl>
#>  1   7           3.2          4.7         1.4
#>  2   6.4         3.2          4.5         1.5
#>  3   6.9         3.1          4.9         1.5
#>  4   5.5         2.3          4           1.3
#>  5   6.5         2.8          4.6         1.5
#>  6   5.7         2.8          4.5         1.3
#>  7   6.3         3.3          4.7         1.6
#>  8   4.9         2.4          3.3         1  
#>  9   6.6         2.9          4.6         1.3
#> 10   5.2         2.7          3.9         1.4
#> # ... with 40 more rows
#> 
#> [[3]]
#> # A tibble: 50 x 4
#>        c Sepal.Width Petal.Length Petal.Width
#>    <dbl>       <dbl>        <dbl>       <dbl>
#>  1   6.3         3.3          6           2.5
#>  2   5.8         2.7          5.1         1.9
#>  3   7.1         3            5.9         2.1
#>  4   6.3         2.9          5.6         1.8
#>  5   6.5         3            5.8         2.2
#>  6   7.6         3            6.6         2.1
#>  7   4.9         2.5          4.5         1.7
#>  8   7.3         2.9          6.3         1.8
#>  9   6.7         2.5          5.8         1.8
#> 10   7.2         3.6          6.1         2.5
#> # ... with 40 more rows
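
If the nested `!! quote(!!new)` feels too cryptic, another option is to move the `!!` into a small helper function, much like `mymutate()` in the question, and call that helper from the rowwise mutate(). The sketch below assumes the same `myiris_row` setup as the question; `add_named_col()` and `rename_col()` are hypothetical helper names, not part of {dplyr}.

```r
library(dplyr, warn.conflicts = FALSE)

# Same setup as in the question.
myiris_row <- iris %>%
  nest_by(Species, .key = "mydat") %>%
  ungroup() %>%
  mutate(new = letters[1:3]) %>%
  rowwise()

# Hypothetical helpers: `!!` is resolved inside the function body,
# where `name` is already a plain string for the current row.
add_named_col <- function(df, name) {
  mutate(df, !!name := name)
}
rename_col <- function(df, name, old) {
  rename(df, !!name := all_of(old))
}

# The outer expression contains no `!!`, so nothing is injected too early.
mutated <- myiris_row %>%
  mutate(mydat = list(add_named_col(mydat, new)))

renamed <- myiris_row %>%
  mutate(mydat = list(rename_col(mydat, new, "Sepal.Length")))

names(mutated$mydat[[1]])
names(renamed$mydat[[2]])
```

This does leave the pipe in the sense that the helper is defined separately, so it is a trade-off between readability and keeping everything inline.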