Elasticsearch Parse Exception error when attempting to index a PDF

鞠通

2023-03-14

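Question:

I'm new to Elasticsearch. We need to index thousands of PDF files, and I'm struggling to get even one of them indexed successfully.

I installed the attachment type plugin and got the response: Installed mapper-attachments.

I followed the "Attachment Type in Action" tutorial, but the process hangs and I don't know how to interpret the error message. I also tried the gist, which hangs in the same place.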

$ curl -X POST "localhost:9200/test/attachment/" -d json.file 
{"error":"ElasticSearchParseException[Failed to derive xcontent from (offset=0, length=9): [106, 115, 111, 110, 46, 102, 105, 108, 101]]","status":400}

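More details:

The json.file contains an embedded Base64 PDF file (as per the instructions). The first line of the file looks correct (to me, anyway): {"file":"JVBERi0xLjQNJeLjz9MNCjE1OCAwIG9iaiA8…

I'm not sure whether json.file is invalid or whether Elasticsearch is failing to parse the PDF.

Encoding: this is how we encode the PDF into json.file (as per the tutorial):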

coded=`cat fn6742.pdf | perl -MMIME::Base64 -ne 'print encode_base64($_)'`
json="{\"file\":\"${coded}\"}"
echo "$json" > json.file

Also tried:

coded=`openssl base64 -in fn6742.pdf`

Log:

[2012-06-07 12:32:16,742][DEBUG][action.index             ] [Bailey, Paul] [test][0], node[AHLHFKBWSsuPnTIRVhNcuw], [P], s[STARTED]: Failed to execute [index {[test][attachment][DauMB-vtTIaYGyKD4P8Y_w], source[json.file]}]
org.elasticsearch.ElasticSearchParseException: Failed to derive xcontent from (offset=0, length=9): [106, 115, 111, 110, 46, 102, 105, 108, 101]
    at org.elasticsearch.common.xcontent.XContentFactory.xContent(XContentFactory.java:147)
    at org.elasticsearch.common.xcontent.XContentHelper.createParser(XContentHelper.java:50)
    at org.elasticsearch.index.mapper.DocumentMapper.parse(DocumentMapper.java:451)
    at org.elasticsearch.index.mapper.DocumentMapper.parse(DocumentMapper.java:437)
    at org.elasticsearch.index.shard.service.InternalIndexShard.prepareCreate(InternalIndexShard.java:290)
    at org.elasticsearch.action.index.TransportIndexAction.shardOperationOnPrimary(TransportIndexAction.java:210)
    at org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction.performOnPrimary(TransportShardReplicationOperationAction.java:532)
    at org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction$1.run(TransportShardReplicationOperationAction.java:430)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:680)

I'm hoping someone can help me see what I'm missing or doing wrong.

Answer:

The following error points to the source of the problem.

Failed to derive xcontent from (offset=0, length=9): [106, 115, 111, 110, 46, 102, 105, 108, 101]

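The UTF-8 codes [106, 115, 111, …] show that you are trying to index the literal string "json.file" rather than the contents of the file. The reported length=9 matches the nine characters of that file name.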
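To check this reading of the error yourself, the byte values can be converted back into characters. The following is a minimal sketch in POSIX shell; the variable names `bytes` and `decoded` are made up for illustration.

```shell
# Byte values copied from the error message.
bytes="106 115 111 110 46 102 105 108 101"

decoded=""
for b in $bytes; do
  # Convert the decimal value to three octal digits, then let printf
  # expand the resulting \ddd escape into the character itself.
  decoded="$decoded$(printf "\\$(printf '%03o' "$b")")"
done

echo "$decoded"
```

The script prints the file name, which confirms that curl sent the argument of -d as the request body instead of reading the file.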

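To index the contents of the file, simply prefix the file name with the "@" character, which tells curl to read the request body from that file.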

curl -X POST "localhost:9200/test/attachment/" -d @json.file
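For completeness, the encoding step from the question and the corrected request can be sketched together. This is only an illustration under a few assumptions: `make_payload` is a hypothetical helper name, and a `base64` command-line tool is assumed to be installed (the question used perl and openssl instead). curl's -d option strips newlines when it reads from a file, so line-wrapped Base64 would probably also get through, but writing it on a single line keeps json.file valid JSON on its own.

```shell
# Hypothetical helper (not from the original post): wrap a file's Base64
# encoding in the {"file": "..."} document used in the question.
make_payload() {
  src="$1"   # file to encode, e.g. fn6742.pdf
  out="$2"   # JSON file to write, e.g. json.file
  # Encode, then drop line wraps so the Base64 text sits on a single line.
  coded=$(base64 < "$src" | tr -d '\n')
  printf '{"file":"%s"}\n' "$coded" > "$out"
}

# Usage, with the file names from the question:
#   make_payload fn6742.pdf json.file
#   curl -X POST "localhost:9200/test/attachment/" -d @json.file
```

The curl line is left as a comment because it needs a running Elasticsearch node with the attachment plugin installed.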
