Following the guides below to do OCR in Python (recognizing text in PDF files):
https://pythontips.com/2016/02/25/ocr-on-pdf-files-using-python/
http://blog.topspeedsnail.com/archives/3571
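Running the code fails with this error:
wand.exceptions.PolicyError: not authorized `/tmp/xxx.pdf' @ .......
wand is not authorized to read the file. This is a configuration issue in ImageMagick, which wand calls under the hood. The fix is to edit /etc/ImageMagick-6/policy.xml: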
sudo vi /etc/ImageMagick-6/policy.xml
Find this line:
<policy domain="coder" rights="none" pattern="PDF" />
Change it to:
<policy domain="coder" rights="read|write" pattern="PDF" />
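The same edit can be sketched in Python instead of doing it by hand in vi. This is only a rough sketch: the function name allow_coder is made up for illustration, it works on the policy text as a string, and it uses a regular expression rather than an XML parser so that comments in the file are left alone. It assumes each policy entry is a self-closing tag on the form shown above.

```python
import re

# Matches one self-closing <policy ... /> tag.
_POLICY_TAG = re.compile(r"<policy\b[^>]*/>")
# Matches the rights="..." attribute inside a tag.
_RIGHTS_ATTR = re.compile(r'rights="[^"]*"')


def allow_coder(policy_text, pattern, rights="read|write"):
    """Return policy_text with the coder entry for `pattern` set to `rights`.

    Hypothetical helper, not part of wand or ImageMagick.
    """
    def fix(match):
        tag = match.group(0)
        # Only touch coder entries whose pattern matches exactly.
        if 'domain="coder"' in tag and f'pattern="{pattern}"' in tag:
            return _RIGHTS_ATTR.sub(f'rights="{rights}"', tag)
        return tag

    return _POLICY_TAG.sub(fix, policy_text)


if __name__ == "__main__":
    sample = (
        '<policy domain="coder" rights="none" pattern="PS" />\n'
        '<policy domain="coder" rights="none" pattern="PDF" />\n'
    )
    print(allow_coder(sample, "PDF"))
```

To apply it to the real file, read /etc/ImageMagick-6/policy.xml, pass its contents through the function, and write the result back. Writing to that path needs root, and keeping a backup copy first is a good idea.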
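Formats other than PDF can trigger the same error; change the corresponding entry in the same way. The relevant entries (with PDF already changed) look like this: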
<policy domain="cache" name="shared-secret" value="passphrase"/>
<policy domain="coder" rights="none" pattern="EPHEMERAL" />
<policy domain="coder" rights="none" pattern="URL" />
<policy domain="coder" rights="none" pattern="HTTPS" />
<policy domain="coder" rights="none" pattern="MVG" />
<policy domain="coder" rights="none" pattern="MSL" />
<policy domain="coder" rights="none" pattern="TEXT" />
<policy domain="coder" rights="none" pattern="SHOW" />
<policy domain="coder" rights="none" pattern="WIN" />
<policy domain="coder" rights="none" pattern="PLT" />
<policy domain="path" rights="none" pattern="@*" />
<!-- disable ghostscript format types -->
<policy domain="coder" rights="none" pattern="PS" />
<policy domain="coder" rights="none" pattern="EPS" />
<policy domain="coder" rights="read|write" pattern="PDF" />
<policy domain="coder" rights="none" pattern="XPS" />
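To see at a glance which formats are still blocked, the entries can be listed with a few lines of Python. This is a minimal sketch using only the standard library; the function name blocked_coders is made up for illustration, and the sample string wraps a few of the entries above in a policymap root element, which is what the real file uses.

```python
import xml.etree.ElementTree as ET


def blocked_coders(policy_xml):
    """Return the patterns of all coder policies whose rights are "none".

    Hypothetical helper, not part of wand or ImageMagick.
    """
    root = ET.fromstring(policy_xml)
    return [
        policy.get("pattern")
        for policy in root.iter("policy")
        if policy.get("domain") == "coder"
        and (policy.get("rights") or "").lower() == "none"
    ]


SAMPLE = """<policymap>
  <policy domain="path" rights="none" pattern="@*" />
  <!-- disable ghostscript format types -->
  <policy domain="coder" rights="none" pattern="PS" />
  <policy domain="coder" rights="none" pattern="EPS" />
  <policy domain="coder" rights="read|write" pattern="PDF" />
  <policy domain="coder" rights="none" pattern="XPS" />
</policymap>"""

if __name__ == "__main__":
    print(blocked_coders(SAMPLE))  # ['PS', 'EPS', 'XPS']
```

For the real file, ET.parse("/etc/ImageMagick-6/policy.xml").getroot() should give the same kind of root element to iterate over, assuming the file is at that path on your system.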
References:
https://stackoverflow.com/questions/53442110/convert-not-authorized-tieps-error-constitute-c-readimage-412
https://blog.csdn.net/m0_37566460/article/details/86007287