当前位置: 首页 > 知识库问答 >
问题:

Python——如何在CSV文件中使用循环找到同一列的两行之间的差异?

左丘恩
2023-03-14
 

 Date   Revenue
9-Jan   $943,690.00 
9-Feb   $1,062,565.00 
9-Mar   $210,079.00 
9-Apr   -$735,286.00
9-May   $842,933.00 
9-Jun   $358,691.00 
9-Jul   $914,953.00 
9-Aug   $723,427.00 
9-Sep   -$837,468.00
9-Oct   -$146,929.00
9-Nov   $831,730.00 
9-Dec   $917,752.00 
10-Jan  $800,038.00 
10-Feb  $1,117,103.00 
10-Mar  $181,220.00 
10-Apr  $120,968.00 
10-May  $844,012.00 
10-Jun  $307,468.00 
10-Jul  $502,341.00 

# This is what I did so far...

# Dependencies
import csv

# Files to load (Remember to change these)
file_to_load = "raw_data/budget_data_2.csv"


totalrev = 0
count = 0

# Read the csv and convert it into a list of dictionaries
with open(file_to_load) as revenue_data:
    reader = csv.reader(revenue_data)

    next(reader)  



    for row in reader:

        count += 1
        revenue = float(row[1])     
        totalrev += revenue

    for i in range(1,revenue):
         revenue_change = (revenue[i+1] - revenue[i])

avg_rev_change = sum(revenue_change)/count

print("avg rev change: ", avg_rev_change)         

print ("budget_data_1.csv")
print ("---------------------------------")
print ("Total Months: ", count)
print ("Total Revenue:", totalrev)





我在CSV文件中有上述数据。我在查找收入变化时遇到问题,即第1行-第0行,第2行-第1行等的收入。。。最后,我想要总收入变化的总和。我试过使用loop,但我猜这是个愚蠢的错误。请给我建议代码,这样我可以比较我的错误。我不熟悉python和编码。

共有2个答案

堵茂勋
2023-03-14
import csv

# Files to load (Remember to change these)
file_to_load = "raw_data/budget_data_2.csv"


# Read the csv and convert it into a list of dictionaries
with open(file_to_load) as revenue_data:
    reader = csv.reader(revenue_data)

    # use of next to skip first title row in csv file
    next(reader) 
    revenue = []
    date = []
    rev_change = []

    # in this loop I did sum of column 1 which is revenue in csv file and counted total months which is column 0 
    for row in reader:

        revenue.append(float(row[1]))
        date.append(row[0])

    print("Financial Analysis")
    print("-----------------------------------")
    print("Total Months:", len(date))
    print("Total Revenue: $", sum(revenue))


    #in this loop I did total of difference between all row of column "Revenue" and found total revnue change. Also found out max revenue change and min revenue change. 
    for i in range(1,len(revenue)):
        rev_change.append(revenue[i] - revenue[i-1])   
        avg_rev_change = sum(rev_change)/len(rev_change)

        max_rev_change = max(rev_change)

        min_rev_change = min(rev_change)

        max_rev_change_date = str(date[rev_change.index(max(rev_change))])
        min_rev_change_date = str(date[rev_change.index(min(rev_change))])


    print("Avereage Revenue Change: $", round(avg_rev_change))
    print("Greatest Increase in Revenue:", max_rev_change_date,"($", max_rev_change,")")
    print("Greatest Decrease in Revenue:", min_rev_change_date,"($", min_rev_change,")")


我得到的输出


Financial Analysis
-----------------------------------
Total Months: 86
Total Revenue: $ 36973911.0
Avereage Revenue Change: $ -5955
Greatest Increase in Revenue: Jun-2014 ($ 1645140.0 )
Greatest Decrease in Revenue: May-2014 ($ -1947745.0 )

吕淮晨
2023-03-14

目前还不清楚您是否可以使用第三方包,例如熊猫,但是熊猫非常擅长这些类型的操作。我建议您使用它的功能,而不是逐行迭代。

df是一只熊猫。数据帧对象。使用熊猫。阅读_csv将数据加载到数据框中。

>>> df
      Date        Revenue
0    9-Jan    $943,690.00
1    9-Feb  $1,062,565.00
2    9-Mar    $210,079.00
3    9-Apr   -$735,286.00
4    9-May    $842,933.00
5    9-Jun    $358,691.00
6    9-Jul    $914,953.00
7    9-Aug    $723,427.00
8    9-Sep   -$837,468.00
9    9-Oct   -$146,929.00
10   9-Nov    $831,730.00
11   9-Dec    $917,752.00
12  10-Jan    $800,038.00
13  10-Feb  $1,117,103.00
14  10-Mar    $181,220.00
15  10-Apr    $120,968.00
16  10-May    $844,012.00
17  10-Jun    $307,468.00
18  10-Jul    $502,341.00

# Remove the dollar sign and any other weird chars
>>> df['Revenue'] = [float(''.join(c for c in row if c in '.1234567890')) for row in df['Revenue']]

使用熊猫。系列shift将上个月的值与当月的值对齐,然后减去两个值:

>>> df['Diff'] = df['Revenue'] - df['Revenue'].shift(1)
>>> df
      Date    Revenue      Diff
0    9-Jan   943690.0       NaN
1    9-Feb  1062565.0  118875.0
2    9-Mar   210079.0 -852486.0
3    9-Apr   735286.0  525207.0
4    9-May   842933.0  107647.0
5    9-Jun   358691.0 -484242.0
6    9-Jul   914953.0  556262.0
7    9-Aug   723427.0 -191526.0
8    9-Sep   837468.0  114041.0
9    9-Oct   146929.0 -690539.0
10   9-Nov   831730.0  684801.0
11   9-Dec   917752.0   86022.0
12  10-Jan   800038.0 -117714.0
13  10-Feb  1117103.0  317065.0
14  10-Mar   181220.0 -935883.0
15  10-Apr   120968.0  -60252.0
16  10-May   844012.0  723044.0
17  10-Jun   307468.0 -536544.0
18  10-Jul   502341.0  194873.0
 类似资料:
  • 你好,我有一个java类,它统计csv文件中的所有行。但是现在我需要创建一个过滤器,这样它只计算一定数量的行,这取决于过滤器中的条件。 像这样的东西: 我现在正在学习java,这是相当困难的。 我现在的代码是: csv文件(它有10.000行,如下所示):`

  • 问题内容: 如何分辨两个datetime物体之间的时间差(以分钟为单位)? 问题答案: 从第一次减去以后的时间将创建一个仅保留时差的对象。在上面的示例中,它是0分钟,8秒和562000微秒。

  • 问题内容: 我有两个日期,格式如“ Ymd H:i:s”。我需要比较这两个日期并找出时差。 问题答案: 您可以将它们转换为时间戳,然后从那里开始: 除以3600是因为一小时内有3600秒,请避免使用很多小数位。

  • 问题内容: 我有一个存储一些时间戳的mysql数据库。假设表中所有的都是ID和时间戳。时间戳可能重复。 我想查找不重复的连续行之间的平均时间差(按时间)。有没有办法在SQL中做到这一点? 问题答案: 如果您的表是t,而您的时间戳列是ts,并且您想在几秒钟内得到答案: 对于大型桌子,这将是更快的里程,因为它没有n平方的JOIN 这使用了一个可爱的数学技巧,可以解决该问题。暂时忽略重复的问题。连续行之

  • 我是一个新手在python和我试图弄清楚,我如何区分2列表,如下所示 而且 问题是,这两个列表都有元素,我需要一个可靠的条件来满足,如果列表中有一个项是字符串而不是。列表也可能有7个,只有一个项目是字符串。

  • 我有两个DateTime对象,它们需要找出它们之间差异的持续时间, 我有以下代码,但不确定如何继续以达到以下预期结果: 示例: 代码: