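I have a bunch of CSV data that I need to load into a MySQL database. Well, CSV-ish, perhaps. (Edit: actually, it looks like the format described in RFC 4180.)
Each row is a list of comma-separated, double-quoted strings. A double quote that appears inside a column value is escaped by doubling it. Backslashes are allowed and represent themselves.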
For example, the line:
"", "\wave\", """hello,"" said the vicar", "what are ""scare-quotes"" good for?", "I'm reading ""Bossypants"""
should, if rendered as JSON, be:
[ "", "\\wave\\", "\"hello,\" said the vicar", "what are \"scare-quotes\" good for?", "I'm reading \"Bossypants\"" ]
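As a cross-check of the intended parsing, here is a minimal Python sketch (not part of my actual setup) that reads the example line with the standard csv module and prints it as JSON. It assumes the third field opens with three quote characters, as the JSON form implies, and it passes skipinitialspace=True because the example has a space after each comma. The names LINE and parse_line are made up for this illustration.

```python
import csv
import json

# The example line, kept in a raw string so backslashes stay literal.
LINE = r'''"", "\wave\", """hello,"" said the vicar", "what are ""scare-quotes"" good for?", "I'm reading ""Bossypants"""'''


def parse_line(line: str) -> list[str]:
    """Parse one CSV line where quotes are doubled and backslash is an ordinary character."""
    # csv.reader defaults: doublequote=True and no escape character,
    # which matches the format described above.
    return next(csv.reader([line], skipinitialspace=True))


if __name__ == "__main__":
    print(json.dumps(parse_line(LINE)))
```

The csv module's defaults already treat a doubled quote as one literal quote and give backslash no special meaning, which is the behaviour I want from the database load as well.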
I'm trying to use LOAD DATA to read the CSV in, but I'm running into some weird behaviour.
For example, suppose I have a simple two-column table:
shell% mysql exampledb -e "describe person"
+-------+-----------+------+-----+---------+-------+
| Field | Type      | Null | Key | Default | Extra |
+-------+-----------+------+-----+---------+-------+
| ID    | int(11)   | YES  |     | NULL    |       |
| UID   | char(255) | YES  |     | NULL    |       |
+-------+-----------+------+-----+---------+-------+
shell%
If the first non-header line of my input file ends in "":
shell% cat temp-1.csv
"ID","UID"
"9",""
"0","Steve the Pirate"
"1","\Alpha"
"2","Hoban ""Wash"" Washburne"
"3","Pastor Veal"
"4","Tucker"
"10",""
"5","Simon"
"6","Sonny"
"7","Wat\"
then I can load every non-header line except the first:
mysql> DELETE FROM person;
Query OK, 0 rows affected (0.00 sec)
mysql> LOAD DATA
LOCAL INFILE 'temp-1.csv'
INTO TABLE person
FIELDS
TERMINATED BY ','
ENCLOSED BY '"'
ESCAPED BY '"'
LINES
TERMINATED BY '\n'
IGNORE 1 LINES
;
Query OK, 9 rows affected (0.00 sec)
Records: 9 Deleted: 0 Skipped: 0 Warnings: 0
mysql> SELECT * FROM person;
+------+------------------------+
| ID   | UID                    |
+------+------------------------+
|    0 | Steve the Pirate       |
|   10 |                        |
|    1 | \Alpha                 |
|    2 | Hoban "Wash" Washburne |
|    3 | Pastor Veal            |
|    4 | Tucker                 |
|    5 | Simon                  |
|    6 | Sonny                  |
|    7 | Wat\                   |
+------+------------------------+
9 rows in set (0.00 sec)
Or I can load all lines, including the header:
mysql> DELETE FROM person;
Query OK, 9 rows affected (0.00 sec)
mysql> LOAD DATA
LOCAL INFILE 'temp-1.csv'
INTO TABLE person
FIELDS
TERMINATED BY ','
ENCLOSED BY '"'
ESCAPED BY '"'
LINES
TERMINATED BY '\n'
IGNORE 0 LINES
;
Query OK, 11 rows affected, 1 warning (0.01 sec)
Records: 11 Deleted: 0 Skipped: 0 Warnings: 1
mysql> show warnings;
+---------+------+--------------------------------------------------------+
| Level   | Code | Message                                                |
+---------+------+--------------------------------------------------------+
| Warning | 1366 | Incorrect integer value: 'ID' for column 'ID' at row 1 |
+---------+------+--------------------------------------------------------+
1 row in set (0.00 sec)
mysql> SELECT * FROM person;
+------+------------------------+
| ID   | UID                    |
+------+------------------------+
|    0 | UID                    |
|    9 |                        |
|    0 | Steve the Pirate       |
|   10 |                        |
|    1 | \Alpha                 |
|    2 | Hoban "Wash" Washburne |
|    3 | Pastor Veal            |
|    4 | Tucker                 |
|    5 | Simon                  |
|    6 | Sonny                  |
|    7 | Wat\                   |
+------+------------------------+
11 rows in set (0.00 sec)
If no line in my input file ends in "":
shell% cat temp-2.csv
"ID","UID"
"0","Steve the Pirate"
"1","\Alpha"
"2","Hoban ""Wash"" Washburne"
"3","Pastor Veal"
"4","Tucker"
"5","Simon"
"6","Sonny"
"7","Wat\"
then I can either load no lines at all:
mysql> DELETE FROM person;
Query OK, 11 rows affected (0.00 sec)
mysql> LOAD DATA
LOCAL INFILE 'temp-2.csv'
INTO TABLE person
FIELDS
TERMINATED BY ','
ENCLOSED BY '"'
ESCAPED BY '"'
LINES
TERMINATED BY '\n'
IGNORE 1 LINES
;
Query OK, 0 rows affected (0.00 sec)
Records: 0 Deleted: 0 Skipped: 0 Warnings: 0
mysql> SELECT * FROM person;
Empty set (0.00 sec)
or I can load all lines, including the header:
mysql> DELETE FROM person;
Query OK, 0 rows affected (0.00 sec)
mysql> LOAD DATA
LOCAL INFILE 'temp-2.csv'
INTO TABLE person
FIELDS
TERMINATED BY ','
ENCLOSED BY '"'
ESCAPED BY '"'
LINES
TERMINATED BY '\n'
IGNORE 0 LINES
;
Query OK, 9 rows affected, 1 warning (0.03 sec)
Records: 9 Deleted: 0 Skipped: 0 Warnings: 1
mysql> show warnings;
+---------+------+--------------------------------------------------------+
| Level   | Code | Message                                                |
+---------+------+--------------------------------------------------------+
| Warning | 1366 | Incorrect integer value: 'ID' for column 'ID' at row 1 |
+---------+------+--------------------------------------------------------+
1 row in set (0.00 sec)
mysql> SELECT * FROM person;
+------+------------------------+
| ID   | UID                    |
+------+------------------------+
|    0 | UID                    |
|    0 | Steve the Pirate       |
|    1 | \Alpha                 |
|    2 | Hoban "Wash" Washburne |
|    3 | Pastor Veal            |
|    4 | Tucker                 |
|    5 | Simon                  |
|    6 | Sonny                  |
|    7 | Wat\                   |
+------+------------------------+
9 rows in set (0.00 sec)
So, now that I've found several approaches that do not quite work, how can I use LOAD DATA to import the data from these files into my database?
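According to the documentation for LOAD DATA, a doubled double quote is treated as a single double quote by default:
If the field begins with the ENCLOSED BY character, instances of that character are recognized as terminating a field value only if followed by the field or line TERMINATED BY sequence. To avoid ambiguity, occurrences of the ENCLOSED BY character within a field value can be doubled and are interpreted as a single instance of the character. For example, if ENCLOSED BY '"' is specified, quotation marks are handled as shown here:
"The ""BIG"" boss"  -> The "BIG" boss
The "BIG" boss      -> The "BIG" boss
The ""BIG"" boss    -> The ""BIG"" boss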
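The rule quoted above can be modelled for a single, already-split field. The following Python sketch only illustrates the three documented examples and is not MySQL's actual parser: decode_enclosed_field is a hypothetical helper, and it deliberately ignores field splitting and any ESCAPED BY character.

```python
def decode_enclosed_field(raw: str, enclosed_by: str = '"') -> str:
    """Apply the documented ENCLOSED BY rule to one raw field (simplified model)."""
    q = enclosed_by
    # Only a field that begins (and ends) with the enclosure character is
    # treated as enclosed; anything else is passed through unchanged.
    if len(raw) >= 2 and raw.startswith(q) and raw.endswith(q):
        inner = raw[1:-1]
        # Inside an enclosed field, a doubled enclosure character stands
        # for one literal instance of that character.
        return inner.replace(q + q, q)
    return raw


if __name__ == "__main__":
    for raw in ['"The ""BIG"" boss"', 'The "BIG" boss', 'The ""BIG"" boss']:
        print(raw, "->", decode_enclosed_field(raw))
```

In this model nothing is done with backslashes at all, which is the behaviour the data files need.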
So all I needed to do was stop \ from being interpreted as an escape character, by using ESCAPED BY '':
LOAD DATA
LOCAL INFILE 'temp-1.csv'
INTO TABLE person
FIELDS
TERMINATED BY ','
ENCLOSED BY '"'
ESCAPED BY ''
LINES
TERMINATED BY '\n'
IGNORE 1 LINES
;
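To check the result of the load, the rows one would expect in person can be computed independently. This is a minimal Python sketch, assuming the contents of temp-1.csv shown earlier and using Python's csv module (doubled quotes, no escape character by default) as a stand-in for the intended parsing. SAMPLE and expected_rows are illustrative names.

```python
import csv
import io

# Contents of temp-1.csv as listed earlier; a raw string keeps backslashes literal.
SAMPLE = r'''"ID","UID"
"9",""
"0","Steve the Pirate"
"1","\Alpha"
"2","Hoban ""Wash"" Washburne"
"3","Pastor Veal"
"4","Tucker"
"10",""
"5","Simon"
"6","Sonny"
"7","Wat\"
'''


def expected_rows(text: str) -> list[tuple[int, str]]:
    """Return the (ID, UID) pairs a correct load should produce, header skipped."""
    reader = csv.reader(io.StringIO(text, newline=""))
    next(reader)  # skip the header line, like IGNORE 1 LINES
    return [(int(row_id), uid) for row_id, uid in reader]


if __name__ == "__main__":
    for row in expected_rows(SAMPLE):
        print(row)
```

Comparing this list against SELECT * FROM person is a quick way to confirm that the empty UID values and the trailing backslash in Wat\ survived the load.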