Removing U+FEFF from the start of a file
Today, we encountered an error while trying to create some database seeds from a CSV. This CSV was originally generated by me using a Ruby script which piped the output to a file and saved as a CSV.
The CSV was checked in to Git and had been used for a while until we had to update parts of it by adding a new column and fixing some values.
While we don’t know the exact reason yet, my theory is that somehow, Excel for Mac (we are all using Macs) added some additional metadata to it even after saving the file as a CSV.
This in turn made anyone using the seed receive the following error:
CSV::MalformedCSVError: Illegal quoting in line 1.
I opened the CSV file and nothing looked suspicious. My first thought was that some left/right quotation marks were somehow mixed into the file instead of just the ‘normal’ double quotes ("). But upon further investigation, there was nothing out of the ordinary. This led me to just wipe out the whole file and actually type out the first row again.
I saved that file again and ran the migration:
CSV::MalformedCSVError: Illegal quoting in line 1.
What?!
Okay, this was driving me nuts. I opened up a new file, typed the exact single line again, and ran the migration. It worked. So what was in that file?!
Only one way to find out:
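# Copy the file to the clipboard, then paste it back out as plain text.
# (pbcopy prints nothing, so it is run as its own step rather than piped into pbpaste.)
pbcopy < companies.csv
pbpaste > temp.csv
rm companies.csv
mv temp.csv companies.csv
git diff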
So OSX has two very useful commands: pbcopy and pbpaste. Basically, anything piped to pbcopy goes onto your clipboard, and pbpaste writes whatever is on your clipboard to standard output (stdout). But it removes all formatting.
Very useful when you want to just copy some text from somewhere and you want to paste it into a WYSIWYG editor without all the formatting. Like when writing an email from Gmail, for example.
I then removed the original file and saved the new ‘unformatted’ file with the same file name so I could see the difference.
And we finally saw the invisible man:
A quick Google search told us that our friend U+FEFF was called a ZERO WIDTH NO-BREAK SPACE. Also, a quick trip to Wikipedia told us about the actual uses for U+FEFF, more commonly known as the byte order mark, or BOM.
Our friend U+FEFF can mean different things, but at the very start of a file it is basically a signal to a program about how to read the text: whether it is UTF-8 (more common), UTF-16, or even UTF-32, and for the last two, in which byte order.
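FEFF is the code point itself. In UTF-16 it is stored as the two bytes 0xFE 0xFF (big-endian) or 0xFF 0xFE (little-endian); in UTF-8 the same character is stored as the three bytes 0xEF, 0xBB, 0xBF.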
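To make the byte-level difference concrete, here is a minimal Ruby sketch that prints how the single code point U+FEFF is serialized in a few encodings. The helper name bom_bytes is made up for this illustration; it only relies on String#encode and String#bytes from core Ruby.

```ruby
# Show how the code point U+FEFF is stored in different encodings.
# `bom_bytes` is a hypothetical helper name used only for this illustration.
def bom_bytes(encoding)
  "\uFEFF".encode(encoding).bytes.map { |b| format("0x%02X", b) }
end

%w[UTF-8 UTF-16BE UTF-16LE UTF-32BE].each do |encoding|
  puts "#{encoding}: #{bom_bytes(encoding).join(' ')}"
end
```

The explicit BE/LE encoding names are used on purpose: they encode just the character, without Ruby adding a BOM of its own in front.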
From my understanding, when the CSV file was opened in Excel and saved, Excel made room for our invisible stowaway, U+FEFF. And at the very front of the file, to boot!
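Excel did some magic when saving. My first guess was that it saved the file as UTF-16 instead of UTF-8, but since the rest of the file still read fine as UTF-8, it more likely saved it as UTF-8 with a BOM in front. UTF-8 does not need a BOM, and most editors render U+FEFF as nothing at all, so visually the file was okay. But Ruby’s CSV parser read the file as UTF-8 without skipping Mr. U+FEFF, so the first field presumably started with an invisible character followed by an opening double quote, and a quote in the middle of a field is exactly what "Illegal quoting" complains about.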
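The failure can be reproduced without Excel. The sketch below is a minimal illustration, assuming quoted fields like the ones in our file; the column names and values are placeholders, and try_parse is a hypothetical helper that returns either the parsed rows or the error message.

```ruby
require "csv"

# A made-up two-line CSV; the column names and values are placeholders.
CLEAN = %("name","city"\n"Acme","Manila"\n)
WITH_BOM = "\uFEFF" + CLEAN

# Returns the parsed rows, or the error message if parsing fails.
# `try_parse` is a hypothetical helper name, not part of the CSV library.
def try_parse(text)
  CSV.parse(text)
rescue CSV::MalformedCSVError => e
  e.message
end

p try_parse(CLEAN)     # parses normally
p try_parse(WITH_BOM)  # the leading U+FEFF should trigger the quoting error

# Dropping a leading U+FEFF before parsing avoids the error.
p try_parse(WITH_BOM.delete_prefix("\uFEFF"))
```

Note that if the fields were not quoted, the parser would probably not complain at all; the BOM would silently become part of the first header name, which is arguably harder to spot.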
So lesson learned: don’t open (and save!) a CSV file in Excel if you want to feed it to Ruby’s CSV parser.
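If keeping Excel away from the file is not an option, another approach is to ask Ruby to skip the BOM while reading. The sketch below assumes the file is UTF-8 and uses the "bom|utf-8" encoding name that Ruby's file APIs accept; read_seed_csv is a hypothetical helper name, and the temporary file only simulates what Excel left behind.

```ruby
require "csv"
require "tempfile"

# Reads a CSV file, letting Ruby skip a UTF-8 BOM if one is present.
# `read_seed_csv` is a hypothetical helper name, not part of any library.
def read_seed_csv(path)
  CSV.read(path, headers: true, encoding: "bom|utf-8")
end

# Simulate a file saved with a BOM in front (contents are placeholders).
Tempfile.create(["companies", ".csv"]) do |file|
  file.write("\uFEFF" + %("name","city"\n"Acme","Manila"\n))
  file.flush

  table = read_seed_csv(file.path)
  p table.headers
  p table.first.to_h
end
```

The same encoding string is harmless for files that have no BOM, so it can be left in place for every seed file.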
If you do ever encounter an error like that, be sure to look for hidden characters not shown by your editor. If you still can’t see them and are using OSX, then pbcopy and pbpaste may help you out: in our case, they stripped out the formatting and the hidden character in addition to copying and pasting the text.
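For a check that does not depend on the clipboard, a small script can report where suspicious characters sit. This is a rough sketch: the list of code points is an illustrative assumption rather than a complete catalogue, and find_hidden_characters is a hypothetical helper name.

```ruby
# Invisible or easily missed code points worth flagging. This list is an
# illustrative assumption, not an exhaustive catalogue.
SUSPECTS = {
  "\uFEFF" => "BYTE ORDER MARK / ZERO WIDTH NO-BREAK SPACE",
  "\u200B" => "ZERO WIDTH SPACE",
  "\u00A0" => "NO-BREAK SPACE",
  "\u201C" => "LEFT DOUBLE QUOTATION MARK",
  "\u201D" => "RIGHT DOUBLE QUOTATION MARK"
}.freeze

# Returns [line_number, column, description] for each suspect character.
# `find_hidden_characters` is a hypothetical helper name.
def find_hidden_characters(text)
  hits = []
  text.each_line.with_index(1) do |line, line_number|
    line.each_char.with_index(1) do |char, column|
      hits << [line_number, column, SUSPECTS[char]] if SUSPECTS.key?(char)
    end
  end
  hits
end

# Placeholder content standing in for a file that Excel has touched.
sample = "\uFEFF" + %("name","city"\n"Acme","Manila"\n)
find_hidden_characters(sample).each do |line_number, column, description|
  puts "line #{line_number}, column #{column}: #{description}"
end
```

To run it against a real file, pass in something like File.read("companies.csv", encoding: "UTF-8"), which keeps any BOM in the string so it can be reported.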
Translated from: https://www.freecodecamp.org/news/a-quick-tale-about-feff-the-invisible-character-cd25cd4630e7/