Question:

Python Pandas - read a CSV file that contains multiple tables

夏兴平

2023-03-14

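I have a .csv file that contains multiple tables.

Using Pandas, what is the best strategy to get two DataFrames, inventory and HPBladeSystemRack, out of this one file?

The input .csv looks like this: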

Inventory       
System Name            IP Address    System Status
dg-enc05                             Normal
dg-enc05_vc_domain                   Unknown
dg-enc05-oa1           172.20.0.213  Normal

HP BladeSystem Rack         
System Name               Rack Name   Enclosure Name
dg-enc05                  BU40  
dg-enc05-oa1              BU40        dg-enc05
dg-enc05-oa2              BU40        dg-enc05

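The best approach I have come up with so far is to convert this .csv file into an Excel workbook (.xlsx), split the tables into separate sheets, and use:

import pandas as pd

inventory = pd.read_excel('path_to_file.xlsx', sheet_name='sheet1', skiprows=1)
HPBladeSystemRack = pd.read_excel('path_to_file.xlsx', sheet_name='sheet2', skiprows=2)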

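However:

This approach requires the xlrd module.
These log files have to be analysed in real time, so it would be much better to find a way to parse them directly as they come out of the logs.
The real logs contain many more tables than those two.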

3 answers

昌乐

2023-03-14

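Pandas doesn't seem ready to do this easily yet, so I ended up writing my own split_csv function. It only needs the table names, and it writes one .csv file per table, named after that table.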

import csv
import sys
from os.path import dirname  # gets parent folder in a path
from os.path import join  # concatenate paths

table_names = ["Inventory", "HP BladeSystem Rack", "Network Interface"]

def split_csv(csv_path, table_names):
    tables_infos = detect_tables_from_csv(csv_path, table_names)
    for table_info in tables_infos:
        split_csv_by_indexes(csv_path, table_info)

def split_csv_by_indexes(csv_path, table_info):
    title, start_index, end_index = table_info
    print(title, start_index, end_index)
    dir_ = dirname(csv_path)
    output_path = join(dir_, title) + ".csv"
    # Python 3: open CSV files in text mode with newline='' (not 'rb')
    with open(output_path, 'w', newline='') as output_file, \
            open(csv_path, newline='') as input_file:
        writer = csv.writer(output_file)
        reader = csv.reader(input_file)
        for i, line in enumerate(reader):
            if i < start_index:
                continue
            if i > end_index:
                break
            writer.writerow(line)

def detect_tables_from_csv(csv_path, table_names):
    """Return a list of (table_name, start_row, end_row) tuples, rows inclusive."""
    output = []
    start_index = None     # row where the current table's title was found
    previous_match = None  # name of the current table
    with open(csv_path, newline='') as csv_file:
        reader = csv.reader(csv_file)
        for idx, row in enumerate(reader):
            for col in row:
                match = [title for title in table_names if title in col]
                if match:
                    match = match[0]  # get the first matching element
                    if start_index is not None:
                        # the previous table ends on the row before this title
                        output.append((previous_match, start_index, idx - 1))
                    print("Found new table", col)
                    start_index = idx
                    previous_match = match
                    break  # one title per row is enough

    if start_index is not None:
        output.append((previous_match, start_index, idx))  # last table runs to EOF
    return output


if __name__ == '__main__':
    csv_path = 'switch_records.csv'
    try:
        split_csv(csv_path, table_names)
    except IOError as e:
        print("This file doesn't exist. Aborting.")
        print(e)
        sys.exit(1)
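
The split_csv answer above writes one intermediate file per table. Since the question says the logs have to be analysed in real time, an in-memory variant of the same idea may fit better: collect each table's lines while scanning, then hand every chunk to pd.read_csv through io.StringIO. This is a minimal sketch, not the answerer's code; split_tables is a hypothetical helper, and it assumes comma-separated fields and that a table starts at a line whose first field is exactly one of the known table names.

```python
import io

import pandas as pd


def split_tables(lines, table_names):
    """Return {table_name: DataFrame} from an iterable of CSV lines.

    Assumes each table starts at a line whose first comma-separated field is
    exactly one of table_names, followed by a header line and the data lines.
    """
    chunks = {}      # table name -> lines of that table (header + data)
    current = None   # table currently being collected
    for line in lines:
        line = line.rstrip("\n")
        first_field = line.split(",", 1)[0].strip()
        if first_field in table_names:
            current = first_field
            chunks[current] = []
        elif current is not None and line.strip():
            chunks[current].append(line)  # blank separator lines are skipped
    # Let pandas parse each chunk: header row, NaN for empty fields, dtypes.
    return {name: pd.read_csv(io.StringIO("\n".join(body)))
            for name, body in chunks.items()}


# Comma-separated copy of the question's sample (assumed format).
sample = """Inventory
System Name,IP Address,System Status
dg-enc05,,Normal
dg-enc05_vc_domain,,Unknown
dg-enc05-oa1,172.20.0.213,Normal

HP BladeSystem Rack
System Name,Rack Name,Enclosure Name
dg-enc05,BU40
dg-enc05-oa1,BU40,dg-enc05
dg-enc05-oa2,BU40,dg-enc05
"""
tables = split_tables(sample.splitlines(), ["Inventory", "HP BladeSystem Rack"])
print(tables["Inventory"])
print(tables["HP BladeSystem Rack"])
```

Because split_tables accepts any iterable of lines, an open file object can be passed instead of sample.splitlines(), and no intermediate files are written.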

祁嘉瑞

2023-03-14

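I assume you know the names of the tables you want to parse from the csv file. If so, you can retrieve the index position of each one and select the relevant slices accordingly. As a sketch, this could look like:

import pandas as pd

# header=None/names=range(3) let read_csv accept rows of different lengths
# (assumes at most 3 columns, as in the sample); column 0 then holds the table titles
df = pd.read_csv('path_to_file', header=None, names=range(3))
index_positions = []
for table in table_names:  # table_names must be listed in file order
    index_positions.append(df[df[0] == table].index.tolist()[0])

## Include end of table for last slice, omit for iteration below
index_positions.append(len(df))

tables = {}
for table_no, position in enumerate(index_positions[:-1]):
    # iloc's end is exclusive, so the next table's title row is not included
    tables[table_names[table_no]] = df.iloc[position:index_positions[table_no + 1]]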

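There are certainly more elegant solutions, but this gives you a dictionary with the table names as keys and the corresponding tables as values.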
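
The slicing approach above needs the table names up front, while the question notes that the real logs contain many more tables. If, as in the sample, every table is separated from the next by a blank line, one possible variant is to split on blank lines and take each block's first line as its title. That layout is an assumption about the log format, not something this answer states; split_on_blank_lines is a hypothetical helper, and fields are assumed to be comma-separated.

```python
import io
import itertools

import pandas as pd


def split_on_blank_lines(lines):
    """Split lines into {title: DataFrame}, assuming blank lines separate tables.

    Assumed layout (hypothetical, matching the question's sample):
    title line, header line, data lines, then a blank line before the next table.
    """
    tables = {}
    block = []
    # A trailing "" flushes the last table even without a final blank line.
    for line in itertools.chain(lines, [""]):
        line = line.rstrip("\n")
        if line.strip():
            block.append(line)
        elif block:
            title = block[0].split(",", 1)[0].strip()
            tables[title] = pd.read_csv(io.StringIO("\n".join(block[1:])))
            block = []
    return tables


# Comma-separated copy of the question's sample (assumed format).
sample = """Inventory
System Name,IP Address,System Status
dg-enc05,,Normal
dg-enc05_vc_domain,,Unknown
dg-enc05-oa1,172.20.0.213,Normal

HP BladeSystem Rack
System Name,Rack Name,Enclosure Name
dg-enc05,BU40
dg-enc05-oa1,BU40,dg-enc05
dg-enc05-oa2,BU40,dg-enc05
"""
tables = split_on_blank_lines(sample.splitlines())
print(list(tables))
```

The trade-off is that any blank line inside a table would start a new "table", so this only suits logs where blank lines appear exclusively between tables.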

呼延明朗

2023-03-14

If you know the table names in advance, something like this:

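import pandas as pd

df = pd.read_csv("jahmyst2.csv", header=None, names=range(3))  # accept rows of up to 3 fields
table_names = ["Inventory", "HP BladeSystem Rack", "Network Interface"]
groups = df[0].isin(table_names).cumsum()  # running count of title rows seen so far
tables = {g.iloc[0,0]: g.iloc[1:] for k,g in df.groupby(groups)}  # title -> remaining rows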

This should produce a dictionary with the table names as keys and the sub-tables as values.
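
To make the grouping step concrete: df[0].isin(table_names) is True only on the title rows, so its cumulative sum goes up by one at each title and stays constant for the rows beneath it, which labels every row with the number of the table it belongs to. The blank line between tables does not appear, because read_csv skips blank lines by default. A small check on a comma-separated copy of the sample data (an assumption; the question shows it column-aligned):

```python
import io

import pandas as pd

# Comma-separated copy of the question's sample (assumed format).
sample = """Inventory
System Name,IP Address,System Status
dg-enc05,,Normal
dg-enc05_vc_domain,,Unknown
dg-enc05-oa1,172.20.0.213,Normal

HP BladeSystem Rack
System Name,Rack Name,Enclosure Name
dg-enc05,BU40
dg-enc05-oa1,BU40,dg-enc05
dg-enc05-oa2,BU40,dg-enc05
"""
table_names = ["Inventory", "HP BladeSystem Rack"]

df = pd.read_csv(io.StringIO(sample), header=None, names=[0, 1, 2])
is_title = df[0].isin(table_names)  # True only on the title rows
groups = is_title.cumsum()          # 1 for the first table, 2 for the second, ...
print(pd.DataFrame({"first field": df[0], "is_title": is_title, "group": groups}))
```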

>>> list(tables)
['HP BladeSystem Rack', 'Inventory']
>>> for k,v in tables.items():
...     print("table:", k)
...     print(v)
...     print()
...     
table: HP BladeSystem Rack
              0          1               2
6   System Name  Rack Name  Enclosure Name
7      dg-enc05       BU40             NaN
8  dg-enc05-oa1       BU40        dg-enc05
9  dg-enc05-oa2       BU40        dg-enc05

table: Inventory
                    0             1              2
1         System Name    IP Address  System Status
2            dg-enc05           NaN         Normal
3  dg-enc05_vc_domain           NaN        Unknown
4        dg-enc05-oa1  172.20.0.213         Normal

Once you have these, you can set the column names from the first row, and so on.
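
One way to do that last step, sketched with a hypothetical promote_header helper: use each sub-table's first row as its column names, drop columns whose header cell is empty (assumed to be padding created by names=range(3) for tables narrower than the widest one), and reset the index. The input is again an assumed comma-separated copy of the question's sample.

```python
import io

import pandas as pd


def promote_header(table):
    """Use a sub-table's first row as column names and drop that row.

    Columns whose header cell is NaN are assumed to be padding and dropped.
    """
    header = table.iloc[0]
    keep = header.notna().to_numpy()
    body = table.iloc[1:, keep].copy()
    body.columns = header.iloc[keep].tolist()
    return body.reset_index(drop=True)


# Comma-separated copy of the question's sample (assumed format).
sample = """Inventory
System Name,IP Address,System Status
dg-enc05,,Normal
dg-enc05_vc_domain,,Unknown
dg-enc05-oa1,172.20.0.213,Normal

HP BladeSystem Rack
System Name,Rack Name,Enclosure Name
dg-enc05,BU40
dg-enc05-oa1,BU40,dg-enc05
dg-enc05-oa2,BU40,dg-enc05
"""
table_names = ["Inventory", "HP BladeSystem Rack"]

df = pd.read_csv(io.StringIO(sample), header=None, names=[0, 1, 2])
groups = df[0].isin(table_names).cumsum()
clean_tables = {g.iloc[0, 0]: promote_header(g.iloc[1:]) for _, g in df.groupby(groups)}
print(clean_tables["HP BladeSystem Rack"])
```

Because the whole file was read as one frame without a header, the values stay as strings (object dtype); pd.to_numeric can convert numeric columns afterwards if needed.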
