Reading specific columns from a CSV file in Python

徐嘉勋

2023-03-14

Question:

I have a CSV file that looks like this:

+-----+-----+-----+-----+-----+-----+-----+-----+
| AAA | bbb | ccc | DDD | eee | FFF | GGG | hhh |
+-----+-----+-----+-----+-----+-----+-----+-----+
| 1   | 2   | 3   | 4   | 50  | 3   | 20  | 4   |
| 2   | 1   | 3   | 5   | 24  | 2   | 23  | 5   |
| 4   | 1   | 3   | 6   | 34  | 1   | 22  | 5   |
| 2   | 1   | 3   | 5   | 24  | 2   | 23  | 5   |
| 2   | 1   | 3   | 5   | 24  | 2   | 23  | 5   |
+-----+-----+-----+-----+-----+-----+-----+-----+

…

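How can I read only the columns "AAA, DDD, FFF, GGG" in Python and skip the header row? The output I want is a list of tuples like this: [(1,4,3,20), (2,5,2,23), (4,6,1,22)]. I plan to write this data to an SQL database later.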

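I looked at this post: Read specific columns from a csv file with csv module?. However, I don't think it helps in my case. My .csv file is large and has a great many columns, so I would like to tell Python the names of the columns I want and have it read just those columns, row by row.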

Answer:

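import csv
from collections import namedtuple


def read_csv(file, columns, type_name="Row"):
    # Build a namedtuple type for the requested columns; fall back to a plain
    # tuple if the names are not valid identifiers (namedtuple raises ValueError).
    try:
        make_row = namedtuple(type_name, columns)._make
    except ValueError:
        make_row = tuple
    rows = csv.reader(file)
    header = next(rows)  # the first row holds the column names
    # Position of each requested column in the header row
    mapping = [header.index(x) for x in columns]
    for row in rows:
        # Keep only the requested fields, in the requested order
        yield make_row(row[i] for i in mapping)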

Example:

>>> import csv
>>> from collections import namedtuple
>>> from io import StringIO
>>> def read_csv(file, columns, type_name="Row"):
...     try:
...         make_row = namedtuple(type_name, columns)._make
...     except ValueError:
...         make_row = tuple
...     rows = csv.reader(file)
...     header = next(rows)
...     mapping = [header.index(x) for x in columns]
...     for row in rows:
...         yield make_row(row[i] for i in mapping)
...
>>> testdata = """\
... AAA,bbb,ccc,DDD,eee,FFF,GGG,hhh
... 1,2,3,4,50,3,20,4
... 2,1,3,5,24,2,23,5
... 4,1,3,6,34,1,22,5
... 2,1,3,5,24,2,23,5
... 2,1,3,5,24,2,23,5
... """
>>> testfile = StringIO(testdata)
>>> for row in read_csv(testfile, "AAA GGG DDD".split()):
...     print(row)
...
Row(AAA='1', GGG='20', DDD='4')
Row(AAA='2', GGG='23', DDD='5')
Row(AAA='4', GGG='22', DDD='6')
Row(AAA='2', GGG='23', DDD='5')
Row(AAA='2', GGG='23', DDD='5')
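The read_csv generator returns every field as a string ('1', not 1). To get exactly the list of integer tuples asked for in the question and then load it into a database, here is a minimal sketch that swaps in csv.DictReader (which looks fields up by header name) instead of the header-index mapping. It assumes every value in the selected columns is an integer, and it uses SQLite through the standard sqlite3 module as a stand-in for "an SQL database". The helper name read_int_columns and the table name selected are hypothetical.

```python
import csv
import io
import sqlite3


def read_int_columns(file, columns):
    """Yield one tuple of ints per data row, taking only the named columns.

    Hypothetical helper: assumes every selected value parses as an int.
    csv.DictReader consumes the header row itself.
    """
    reader = csv.DictReader(file)
    fieldnames = reader.fieldnames or []  # fieldnames is None for an empty file
    missing = [c for c in columns if c not in fieldnames]
    if missing:
        raise KeyError(f"columns not in header: {missing}")
    for record in reader:
        yield tuple(int(record[c]) for c in columns)


data = """AAA,bbb,ccc,DDD,eee,FFF,GGG,hhh
1,2,3,4,50,3,20,4
2,1,3,5,24,2,23,5
4,1,3,6,34,1,22,5
"""
columns = ["AAA", "DDD", "FFF", "GGG"]
rows = list(read_int_columns(io.StringIO(data), columns))
print(rows)  # [(1, 4, 3, 20), (2, 5, 2, 23), (4, 6, 1, 22)]

# Load the tuples into SQLite (assumed target; the table name is hypothetical).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE selected (AAA INTEGER, DDD INTEGER, FFF INTEGER, GGG INTEGER)"
)
with conn:  # commits the transaction if the block succeeds
    # executemany accepts any iterable of tuples, so the generator could be
    # passed directly to avoid holding a large file's rows in memory.
    conn.executemany("INSERT INTO selected VALUES (?, ?, ?, ?)", rows)

stored = conn.execute("SELECT * FROM selected ORDER BY rowid").fetchall()
print(stored == rows)  # True
```

DictReader builds a dict for every row, which is likely somewhat slower than the index mapping used in read_csv; for very wide files the index approach may be the better choice.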
