Using textract in Python: how do I load a string as unicode with the Textract library?

何宏博

2023-12-01

I'm using Textract and am fairly new to Python. I'd like to load the file's text as a unicode string rather than as UTF-8 encoded text. Is there a way to do that?

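I tried

    text = textract.process(file)

but that returns a UTF-8 encoded string, and I would prefer unicode. I then tried

    text = textract.process(file, encoding="unicode")

but that raises an error: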

    Traceback (most recent call last):
      File "/home/moha/dev/intellij-ws/pyqadi/tests/test_file2txt.py", line 11, in test_process
        str=f2t.to_txt(file)
      File "/home/moha/dev/intellij-ws/pyqadi/textsearcher/file2txt.py", line 10, in to_txt
        text = textract.process(file, encoding="unicode")
      File "/usr/local/lib/python2.7/dist-packages/textract/parsers/__init__.py", line 57, in process
        return parser.process(filename, encoding, **kwargs)
      File "/usr/local/lib/python2.7/dist-packages/textract/parsers/utils.py", line 46, in process
        return self.encode(unicode_string, encoding)
      File "/usr/local/lib/python2.7/dist-packages/textract/parsers/utils.py", line 31, in encode
        return text.encode(encoding, 'ignore')
    LookupError: unknown encoding: unicode
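The last frames of the traceback suggest why this fails. textract's parser passes the `encoding` argument straight to `text.encode(encoding, 'ignore')`, so the argument has to be the name of a real codec (such as `utf-8`), and `unicode` is not one. A possible workaround, assuming you keep textract's default UTF-8 output, is to decode the returned bytes yourself, e.g. `textract.process(file).decode("utf-8")`. The sketch below is written for Python 3 and does not import textract. A bytes literal stands in for the value `textract.process()` would return, and the helpers `is_known_codec` and `to_unicode` are hypothetical names used only for illustration.

```python
import codecs


def is_known_codec(name):
    """Return True if Python's codec registry recognises `name` (hypothetical helper)."""
    try:
        codecs.lookup(name)
        return True
    except LookupError:
        # Same error that str.encode() raises in the traceback above.
        return False


def to_unicode(raw, encoding="utf-8"):
    """Decode encoded bytes (e.g. textract's output) into a text string.

    Hypothetical helper: assumes `raw` was encoded with `encoding`,
    which is UTF-8 unless you passed something else to textract.
    """
    return raw.decode(encoding)


# "unicode" is not a codec name, so passing it as `encoding` cannot work.
print(is_known_codec("unicode"))  # False
print(is_known_codec("utf-8"))    # True

# Stand-in for textract.process(file): UTF-8 encoded bytes.
raw = "h\u00e9llo w\u00f6rld".encode("utf-8")
text = to_unicode(raw)
print(type(text).__name__)  # str
```

In Python 2.7, which the traceback shows, calling `.decode("utf-8")` on the returned `str` likewise produces a `unicode` object.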