I'm using Textract and I'm fairly new to Python. I'd like to load the file's contents as a unicode string rather than as UTF-8. Is there a way to do this?
I tried:
text = textract.process(file)
but this returns a UTF-8 string, and I'd prefer unicode. I also tried:
text = textract.process(file, encoding="unicode")
but that raises an error:
Traceback (most recent call last):
  File "/home/moha/dev/intellij-ws/pyqadi/tests/test_file2txt.py", line 11, in test_process
    str=f2t.to_txt(file)
  File "/home/moha/dev/intellij-ws/pyqadi/textsearcher/file2txt.py", line 10, in to_txt
    text = textract.process(file, encoding="unicode")
  File "/usr/local/lib/python2.7/dist-packages/textract/parsers/__init__.py", line 57, in process
    return parser.process(filename, encoding, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/textract/parsers/utils.py", line 46, in process
    return self.encode(unicode_string, encoding)
  File "/usr/local/lib/python2.7/dist-packages/textract/parsers/utils.py", line 31, in encode
    return text.encode(encoding, 'ignore')
LookupError: unknown encoding: unicode
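The traceback shows what goes wrong: textract passes the `encoding` argument straight to `text.encode(encoding, 'ignore')`, so it must name a codec such as UTF-8. "unicode" is a string *type*, not a codec, which is why the lookup fails. A minimal sketch of the usual workaround, assuming the default output is UTF-8 bytes (as the question reports), is to keep the default encoding and decode the result yourself, i.e. roughly `textract.process(file).decode("utf-8")`. In the sketch below, `to_unicode` and `encoding_is_known` are hypothetical helper names, and a literal byte string stands in for textract's output so it runs without textract installed.

```python
import codecs


def to_unicode(raw, encoding="utf-8"):
    """Decode bytes (e.g. the output of textract.process) into a text object.

    Assumption: `raw` was produced with the same codec passed as `encoding`.
    Works on Python 2.7 (returns `unicode`) and Python 3 (returns `str`).
    """
    if isinstance(raw, bytes):
        return raw.decode(encoding)
    return raw  # already a text object, nothing to decode


def encoding_is_known(name):
    """Return True if Python has a codec registered under `name`."""
    try:
        codecs.lookup(name)
        return True
    except LookupError:
        # Same error textract surfaces when given encoding="unicode"
        return False


# Hypothetical stand-in for textract.process(file): UTF-8 bytes for "café"
raw = b"caf\xc3\xa9"
text = to_unicode(raw)

print(type(text).__name__, len(raw), len(text))
print(encoding_is_known("utf-8"), encoding_is_known("unicode"))
```

The decoded `text` has one character per letter, while `raw` counts the two UTF-8 bytes of "é" separately. Decoding once at the boundary, right after the call to textract, keeps the rest of the code working with text objects only.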