Question:

Error when applying a specific function to all rows of a data frame

丌官运珧

2023-03-14

Apologies in advance if this has already been answered, but I have looked through all the questions related to ddply, sapply, and apply and can't for the life of me figure this out...

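I have written a function, countMonths, that takes the day, the month, and the total number of days in a billing cycle as arguments and returns the number of calendar months the billing cycle spans: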

countMonths <- function(day, month, cycle.days) {
  month.days <- c(31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31)
  if (month < 1 | month > 12 | floor(month) != month) {
    cat("Invalid month value, must be an integer from 1 to 12")
  } else if (day < 1 | day > month.days[month]) {
    cat("Invalid day value, must be between 1 and month.days[month]")
  } else if (cycle.days < 0) {
    cat("Invalid cycle.days value, must be >= 0")
  } else {
    nmonths <- 1
    day.ct <- cycle.days - day
    while (day.ct > 0) {
      nmonths <- nmonths + 1
      month <- ifelse(month == 1, 12, month - 1) # sets to previous month    
      day.ct  <- day.ct - month.days[month] # subtracts days of previous month
    }
    nmonths
  }
}

I want to apply this function to every row of a data.frame containing customer billing records, for example:

> head(cons2[-1],10)
   kwh cycle.days  read.date row.index year month day kwh.per.day
1  381         29 2010-09-02         1 2010     9   2   13.137931
2  280         32 2010-10-04         2 2010    10   4    8.750000
3  282         29 2010-11-02         3 2010    11   2    9.724138
4  330         34 2010-12-06         4 2010    12   6    9.705882
5  371         30 2011-01-05         5 2011     1   5   12.366667
6  405         30 2011-02-04         6 2011     2   4   13.500000
7  441         32 2011-03-08         7 2011     3   8   13.781250
8  290         29 2011-04-06         8 2011     4   6   10.000000
9  296         29 2011-05-05         9 2011     5   5   10.206897
10 378         32 2011-06-06        10 2011     6   6   11.812500

> dput(head(cons2[-1],10))
structure(list(kwh = c(381L, 280L, 282L, 330L, 371L, 405L, 441L, 
290L, 296L, 378L), cycle.days = c(29L, 32L, 29L, 34L, 30L, 30L, 
32L, 29L, 29L, 32L), read.date = structure(c(1283385600, 1286150400, 
1288656000, 1291593600, 1294185600, 1296777600, 1299542400, 1302048000, 
1304553600, 1307318400), class = c("POSIXct", "POSIXt"), tzone = "UTC"), 
    row.index = 1:10, year = c(2010, 2010, 2010, 2010, 2011, 
    2011, 2011, 2011, 2011, 2011), month = c(9, 10, 11, 12, 1, 
    2, 3, 4, 5, 6), day = c(2L, 4L, 2L, 6L, 5L, 4L, 8L, 6L, 5L, 
    6L), kwh.per.day = c(13.1379310344828, 8.75, 9.72413793103448, 
    9.70588235294118, 12.3666666666667, 13.5, 13.78125, 10, 10.2068965517241, 
    11.8125)), .Names = c("kwh", "cycle.days", "read.date", "row.index", 
"year", "month", "day", "kwh.per.day"), row.names = c(NA, 10L
), class = "data.frame")

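I have tried several options, but none of them works well. Specifically, I need to pass the value of a given variable for each row of the data frame as a scalar (or a length-1 vector), but the values are always passed as whole vectors: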

> cons2$tot.months <- countMonths(cons2$day, cons2$month, cons2$cycle.days)  
Warning messages:
1: In if (month < 1 | month > 12 | floor(month) != month) { :
  the condition has length > 1 and only the first element will be used
2: In if (day < 1 | day > month.days[month]) { :
  the condition has length > 1 and only the first element will be used
3: In if (cycle.days < 0) { :
  the condition has length > 1 and only the first element will be used
4: In while (day.ct > 0) { :
  the condition has length > 1 and only the first element will be used
5: In while (day.ct > 0) { :
  the condition has length > 1 and only the first element will be used

I was finally able to get the correct result using ddply, treating each row as its own group, but it takes a very long time:

cons2 <- ddply(cons2, .(account, year, month, day), transform,
               tot.months = countMonths(day, month, cycle.days)
)

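Is there a better way to apply this function to every row of my data frame? Or, as a related question, how do I pass the variables in a data frame as scalar arguments (the value in a given row) rather than as a vector of all the values of that variable? I would be especially grateful if someone could point out where I am going wrong conceptually.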

1 answer

贺栋

2023-03-14

To make your function work, you can use mapply, which applies the function in turn to each element of all the vectors passed to it. So you can do:

mapply(countMonths,cons2$day,cons2$month,cons2$cycle.days)
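As a minimal sketch of how the result could be stored back in the data frame: the snippet below rebuilds the first three rows of the sample data (the name `demo` is made up for illustration) and reuses countMonths from the question with its input checks left out. mapply hands one day, one month, and one cycle.days value to the function per call, which is the scalar-per-row behaviour the question asks for.

```r
# countMonths from the question, with the input checks left out for brevity
countMonths <- function(day, month, cycle.days) {
  month.days <- c(31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31)
  nmonths <- 1
  day.ct <- cycle.days - day
  while (day.ct > 0) {
    nmonths <- nmonths + 1
    month <- if (month == 1) 12 else month - 1  # step back one month
    day.ct <- day.ct - month.days[month]        # subtract that month's days
  }
  nmonths
}

# First three rows of the sample data; `demo` is an illustrative name
demo <- data.frame(day = c(2L, 4L, 2L), month = c(9, 10, 11),
                   cycle.days = c(29L, 32L, 29L))

# One call per row: each argument arrives as a single value
demo$tot.months <- mapply(countMonths, demo$day, demo$month, demo$cycle.days)
demo$tot.months  # 2 2 2
```

Because each call receives length-1 arguments, the if and while conditions inside the function see a single value and the warnings from the question do not appear.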

As I mentioned in the comments, there are simpler ways to do this. For example, I think this would work:

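# Convert POSIXct to Date so that subtracting cycle.days subtracts days
cons2$read.date <- as.Date(cons2$read.date)
# Month index: number of months elapsed since January 1900
monnb <- function(d) { lt <- as.POSIXlt(as.Date(d, origin = "1900-01-01")); lt$year * 12 + lt$mon }
# Difference in calendar months between two dates
mondf <- function(d1, d2) monnb(d2) - monnb(d1)
mondf(cons2$read.date - cons2$cycle.days, cons2$read.date) + 1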
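The conversion to Date in the first line matters because read.date is stored as POSIXct in the sample data, and subtracting a number from a POSIXct value moves it back by seconds rather than days. The sketch below illustrates this on two read dates taken from the sample data; the variable names end.date, start.date, and tot.months are chosen here for illustration only.

```r
# Two read dates from the sample data, stored as POSIXct like cons2$read.date
read.date <- as.POSIXct(c("2010-09-02", "2011-01-05"), tz = "UTC")
cycle.days <- c(29L, 30L)

# Subtracting a number from POSIXct moves the time back by that many seconds
format(read.date - cycle.days, "%Y-%m-%d %H:%M:%S")

# After as.Date(), the same subtraction moves back by whole days
end.date <- as.Date(read.date)
start.date <- end.date - cycle.days
format(start.date)  # "2010-08-04" "2010-12-06"

# Month index and month difference, following the approach in the answer
monnb <- function(d) { lt <- as.POSIXlt(d); lt$year * 12 + lt$mon }
mondf <- function(d1, d2) monnb(d2) - monnb(d1)
tot.months <- mondf(start.date, end.date) + 1
tot.months  # 2 2
```

Everything here operates on whole vectors, so no per-row loop or mapply call is needed.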

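Also, I notice that you try to catch all the conditions under which your function doesn't work, which is good! There is a very handy function called stopifnot that serves exactly this purpose: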

countMonths <- function(day, month, cycle.days) {
  month.days <- c(31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31)
  stopifnot(month >=1 & month <= 12 & floor(month)==month & cycle.days >=0 & day >= 1 & day <= month.days[month]) 
  nmonths <- 1
  day.ct <- cycle.days - day
  while (day.ct > 0) {
    nmonths <- nmonths + 1
    month <- ifelse(month == 1, 12, month - 1) # sets to previous month    
    day.ct  <- day.ct - month.days[month] # subtracts days of previous month
  }
  nmonths
}
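One practical difference from the cat version may be worth illustrating: cat only prints a message and returns NULL, so the caller cannot tell that the input was rejected, whereas stopifnot signals an error. The sketch below uses a hypothetical helper, checkInputs, that lists the conditions as separate arguments so that checking stops at the first one that fails. The helper name and the tryCatch wrapper are assumptions made for illustration, not part of the original answer.

```r
# Hypothetical helper: validates a single (day, month, cycle.days) triple
checkInputs <- function(day, month, cycle.days) {
  month.days <- c(31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31)
  stopifnot(
    month >= 1, month <= 12, floor(month) == month,
    cycle.days >= 0,
    day >= 1, day <= month.days[month]
  )
  invisible(TRUE)
}

# Valid input passes silently
ok <- checkInputs(day = 2, month = 9, cycle.days = 29)

# Invalid input raises an error that tryCatch can intercept
res <- tryCatch(
  checkInputs(day = 2, month = 13, cycle.days = 29),
  error = function(e) conditionMessage(e)
)
is.character(res)  # TRUE: the error message was captured
```

Listing the conditions separately also means the day check is never evaluated with an out-of-range month, since stopifnot stops at the first failing condition.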

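As for comments on your function, I think it works, but it doesn't take advantage of vectorized operations in R. The function I took from another answer is more flexible because it lets you feed it a whole vector of dates at once, rather than looping over each date in turn.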
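On the conceptual point raised in the question: the warnings appear because if and while expect a single TRUE or FALSE, while comparisons on data frame columns return one value per row (R 4.2.0 and later report this as an error instead of a warning). A small sketch with made-up values shows the difference; any and all collapse the per-row results into the single value that if needs.

```r
month <- c(9, 10, 13)  # made-up values; the last one is invalid

# Comparison operators work element by element
bad <- month < 1 | month > 12
bad  # FALSE FALSE TRUE

# if needs one logical value, so collapse the vector first
if (any(bad)) {
  msg <- paste("invalid month in row", which(bad))
} else {
  msg <- "all months valid"
}
msg  # "invalid month in row 3"
```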
