R: Subsetting rows and columns from data frames

范华清

2023-03-14

This article covers how to subset rows and columns of data frames in R, including usage tips and caveats.

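Example

Syntax for accessing rows and columns: [, [[, and $

This topic covers the most common syntax for accessing specific rows and columns of a data frame. These are:

Like a matrix, with single brackets data[rows, columns]

Using row and column numbers
Using column (and row) names

Like a list:

With single brackets data[columns] to get a data frame
With double brackets data[[one_column]] to get a vector

With $ for a single column: data$column_name

We will use the built-in mtcars data frame to illustrate.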

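Like a matrix: data[rows, columns]

With numeric indexes

Using the built-in data frame mtcars, we can extract rows and columns using [] brackets with a comma. Indices before the comma are rows: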

# get the first row
mtcars[1, ]
# get the first five rows
mtcars[1:5, ]

Similarly, indices after the comma are columns:

# get the first column
mtcars[, 1]
# get the first, third and fifth columns:
mtcars[, c(1, 3, 5)]

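As shown above, if either the rows or the columns are left blank, all of them are selected. mtcars[1, ] means the first row with all the columns.

With column (and row) names

So far, this is identical to how the rows and columns of a matrix are accessed. With data.frames, it is usually preferable to index columns by name rather than by position. This is done by passing a character vector of column names instead of a numeric vector of column numbers: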

# get the mpg column
mtcars[, "mpg"]
# get the mpg, cyl, and disp columns
mtcars[, c("mpg", "cyl", "disp")]

Though less common, row names can also be used:

mtcars["Mazda RX4", ]
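Row-name lookup is case-sensitive, so the name has to be spelled the way it is stored in rownames(mtcars). The sketch below illustrates this; "no such car" is a made-up name used only to show what happens when nothing matches.

```r
# Inspect the stored row names before indexing by them.
first_names <- rownames(mtcars)[1:3]

# A correctly spelled name returns that row as a one-row data frame.
rx4 <- mtcars["Mazda RX4", ]

# A name that matches no row does not raise an error:
# the result is a single row filled with NA.
missing_row <- mtcars["no such car", ]
all(is.na(missing_row))
```

Because a failed lookup is silent, checking the name against rownames(mtcars) first (for example with %in%) is a reasonable safeguard.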

Rows and columns together

The row and column arguments can be used together:

# first four rows of the mpg column
mtcars[1:4, "mpg"]

# 2nd and 5th row of the mpg, cyl, and disp columns
mtcars[c(2, 5), c("mpg", "cyl", "disp")]

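A warning about dimensions:

When using these methods, extracting multiple columns returns a data frame. However, extracting a single column returns, under the default options, a vector rather than a data frame.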

## multiple columns returns a data frame
class(mtcars[, c("mpg", "cyl")])
# [1] "data.frame"
## single column returns a vector
class(mtcars[, "mpg"])
# [1] "numeric"

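There are two ways around this. One is to treat the data frame as a list (see below); the other is to add the drop = FALSE argument. This tells R not to "drop the unused dimensions":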

class(mtcars[, "mpg", drop = FALSE])
# [1] "data.frame"

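Note that matrices work the same way: by default a single column or row is returned as a vector, but if you specify drop = FALSE it is kept as a one-column or one-row matrix.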
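The matrix behaviour described above can be checked with a small sketch. The matrix m is a hypothetical 2 x 3 example introduced here for illustration only.

```r
# A small illustrative matrix (not part of the original example data).
m <- matrix(1:6, nrow = 2)

# Default: a single column is simplified to a plain vector with no dimensions.
col_vec <- m[, 1]
dim(col_vec)

# drop = FALSE keeps the result as a matrix.
col_mat <- m[, 1, drop = FALSE]   # 2 rows, 1 column
row_mat <- m[1, , drop = FALSE]   # 1 row, 3 columns
dim(col_mat)
dim(row_mat)

# For comparison: a single row of a data frame stays a data frame,
# because its columns may hold different types.
class(mtcars[1, ])
```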

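Like a list

Data frames are essentially lists: a data frame is a list of column vectors, all of which must have the same length. Lists can be subset with single brackets [ to get a sub-list, or with double brackets [[ to get a single element.

With single brackets data[columns]

When you use single brackets without a comma, you get columns back, because a data frame is a list of columns.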
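A quick way to see the list structure is to apply ordinary list functions to mtcars. This is only an illustrative sketch; the variable names are introduced here.

```r
# A data frame passes the list test, and its length is its number of columns.
is.list(mtcars)
n_columns <- length(mtcars)

# List-oriented functions iterate over the columns.
col_classes <- sapply(mtcars, class)
col_lengths <- sapply(mtcars, length)

# Every column has the same length, which equals the number of rows.
all(col_lengths == nrow(mtcars))
```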

mtcars["mpg"]
mtcars[c("mpg", "cyl", "disp")]
my_columns <- c("mpg", "cyl", "hp")
mtcars[my_columns]

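Single brackets like a list vs. single brackets like a matrix

The difference between data[columns] and data[, columns] is that when the data.frame is treated as a list (no comma in the brackets), the returned object is always a data.frame. If you use a comma to treat the data.frame like a matrix, selecting a single column returns a vector, while selecting multiple columns returns a data.frame.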

## When selecting a single column
## like a list will return a data frame
class(mtcars["mpg"])
# [1] "data.frame"
## like a matrix will return a vector
class(mtcars[, "mpg"])
# [1] "numeric"

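With double brackets data[[one_column]]

To extract a single column as a vector while treating your data.frame as a list, use double brackets [[. This works for only one column at a time.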

# extract a single column by name as a vector 
mtcars[["mpg"]]

# extract a single column by name as a data frame (as above)
mtcars["mpg"]

Using $ to access columns

A single column can be extracted with the $ shortcut, without quoting the column name:

# get the column "mpg"
mtcars$mpg

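Columns accessed with $ are always vectors, not data frames.

Drawbacks of $ for accessing columns

The $ can be a convenient shortcut, especially if you are working in an environment (such as RStudio) that auto-completes the column name for you. However, $ has drawbacks as well: it uses non-standard evaluation to avoid the need for quotes, which means it does not work if your column name is stored in a variable.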

my_column <- "mpg"
# the below will not work
mtcars$my_column
# but these will work
mtcars[, my_column]  # vector
mtcars[my_column]    # one-column data frame
mtcars[[my_column]]  # vector

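Because of these concerns, $ is best used in interactive R sessions where your column names are constant. For programmatic use, for example when writing a general-purpose function that will be applied to different data sets with different column names, $ should be avoided.

Also note that $ uses partial matching of names by default when extracting from recursive objects (except environments):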
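As a minimal sketch of the programmatic case, the function below takes the column name as a string and uses [[ to look it up. column_mean is a hypothetical helper written for this illustration, not a base R function.

```r
# Hypothetical helper: mean of one column, chosen by name at run time.
column_mean <- function(data, column) {
  # Fail early with a clear error if the inputs are not usable.
  stopifnot(is.data.frame(data),
            is.character(column), length(column) == 1,
            column %in% names(data))
  # [[ accepts a variable holding the name; data$column would look for
  # a column literally called "column".
  mean(data[[column]], na.rm = TRUE)
}

# The same function works on data sets with different column names.
mpg_mean <- column_mean(mtcars, "mpg")
sepal_mean <- column_mean(iris, "Sepal.Length")
```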

# gives you the values of the "mpg" column,
# as "mtcars" has only one column whose name starts with "m"
mtcars$m
# gives you NULL,
# as "mtcars" has more than one column whose name starts with "d"
mtcars$d
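For comparison, [[ matches names exactly by default and only matches partially when asked to through its exact argument. The sketch below contrasts the two operators; the variable names are introduced here for illustration.

```r
# $ silently completes "m" to "mpg".
from_dollar <- mtcars$m

# [[ requires the full name by default, so "m" finds nothing.
from_exact <- mtcars[["m"]]

# Partial matching with [[ has to be requested explicitly.
from_partial <- mtcars[["m", exact = FALSE]]

identical(from_dollar, mtcars$mpg)
is.null(from_exact)
identical(from_partial, mtcars$mpg)
```

If silent completion is a concern, setting options(warnPartialMatchDollar = TRUE) asks R to issue a warning whenever $ matches a name partially.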

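Advanced indexing: negative and logical indices

Wherever numbers can be used as an index, we can also use negative numbers to omit certain indices, or a boolean (logical) vector to indicate exactly which items to keep.

Negative indices omit elements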

mtcars[1, ]   # first row
mtcars[ -1, ] # everything but the first row
mtcars[-(1:10), ] # everything except the first 10 rows
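Negative indices work on columns as well. They only accept positions, though, so excluding columns by name needs a different approach; the sketch below uses setdiff() on the column names as one possible option.

```r
# Drop the first column (mpg) by position.
without_first <- mtcars[, -1]

# Drop the first and third columns (mpg and disp).
without_two <- mtcars[, -c(1, 3)]

# Names cannot be negated, so build the set of names to keep instead.
keep <- setdiff(names(mtcars), c("mpg", "cyl"))
without_named <- mtcars[keep]

names(without_named)
```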

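Logical vectors indicate specific elements to keep

We can use a condition such as < to generate a logical vector, and then extract only the rows that satisfy the condition: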

# logical vector indicating TRUE when a row has mpg less than 15
# FALSE when a row has mpg >= 15
test <- mtcars$mpg < 15 

# extract these rows from the data frame 
mtcars[test, ]

We can also skip the step of saving the intermediate variable:

# extract all columns for rows where the value of cyl is 4.
mtcars[mtcars$cyl == 4, ]
# extract the cyl, mpg, and hp columns where the value of cyl is 4
mtcars[mtcars$cyl == 4, c("cyl", "mpg", "hp")]
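Conditions can be combined with & (and) and | (or). One caveat worth sketching: an NA in the logical vector produces a row of NAs in the result, and wrapping the condition in which() is one way to drop those rows. The scores data frame below is a hypothetical example made up for this illustration.

```r
# Combine two conditions: four-cylinder cars with mpg above 30.
efficient <- mtcars[mtcars$cyl == 4 & mtcars$mpg > 30, c("cyl", "mpg")]

# Hypothetical data with a missing value.
scores <- data.frame(id = 1:3, value = c(1, NA, 3))

# The comparison yields FALSE, NA, TRUE; the NA becomes an all-NA row.
with_na_row <- scores[scores$value > 1, ]

# which() keeps only the positions that are TRUE, so the NA row disappears.
clean <- scores[which(scores$value > 1), ]

nrow(with_na_row)
nrow(clean)
```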