Question:

Getting an aggregation by term frequency in Elasticsearch

寿高阳

2023-03-14

Here are my Elasticsearch queries:

===Create the index===

PUT /sample

===Insert data===

PUT /sample/docs/1
{"data": "And the world said, 'Disarm, disclose, or face serious consequences'—and therefore, we worked with the world, we worked to make sure that Saddam Hussein heard the message of the world."}
PUT /sample/docs/2
{"data": "Never give in — never, never, never, never, in nothing great or small, large or petty, never give in except to convictions of honour and good sense. Never yield to force; never yield to the apparently overwhelming might of the enemy"}

===Query to fetch the results===

POST sample/docs/_search
{
  "query": {
    "match": {
      "data": "never"
    }
  },
  "highlight": {
    "fields": {
      "data":{}
    }
  }
}

===Retrieved result===

...
        "highlight": {
          "data": [
            "<em>Never</em> give in — <em>never</em>, <em>never</em>, <em>never</em>, <em>never</em>, in nothing great or small, large or petty, <em>never</em> give",
            " in except to convictions of honour and good sense. <em>Never</em> yield to force; <em>never</em> yield to the apparently overwhelming might of the enemy"
          ]
        }

===Expected result===

I need the frequency of the searched term in each matching document, like this:

Doc Id: 2
Term Frequency :{
    "never": 8
}

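I have tried bucket aggregations, the terms aggregation, and several other aggregations, but none of them produced this result.

Thanks in advance for your help!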

1 answer

姚烨

2023-03-14

You should use term vectors, which report how often each term occurs in a document and can filter terms by that frequency.

https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-termvectors.html

In this case, your query would be:

GET /sample/docs/2/_termvectors
{
    "fields": ["data"],
    "term_statistics" : true,
    "field_statistics" : true,
    "positions": false,
    "offsets": false,
    "filter" : {
      "min_term_freq" : 8
    }
}
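Once the response comes back, the count for a term sits under `term_vectors.<field>.terms.<term>.term_freq`. Here is a minimal Python sketch of pulling it out, assuming the request was made against a stored document (document 2 here) and that the field uses the default analyzer, so "Never" and "never" are both indexed as `never`. The helper name `term_frequency` and the `sample_response` dict are hypothetical: the dict is a hand-written stand-in that keeps only the keys the helper reads, and it is not captured server output.

```python
from typing import Any, Dict, Optional


def term_frequency(response: Dict[str, Any], field: str, term: str) -> Optional[int]:
    """Return term_freq for `term` in `field`, or None if the term is absent."""
    # Walk term_vectors -> <field> -> terms, tolerating missing levels.
    terms = response.get("term_vectors", {}).get(field, {}).get("terms", {})
    entry = terms.get(term)
    return None if entry is None else entry["term_freq"]


# Hand-written stand-in for a _termvectors response for document 2.
# The value 8 is the count the question expects, not real server output.
sample_response = {
    "_id": "2",
    "found": True,
    "term_vectors": {
        "data": {
            "terms": {
                "never": {"term_freq": 8},
            }
        }
    },
}

result = {
    "Doc Id": sample_response["_id"],
    "Term Frequency": {"never": term_frequency(sample_response, "data", "never")},
}
print(result)
```

Returning `None` for a missing term keeps "term not present in the response" separate from a real count, which matters when a `filter` such as `min_term_freq` has dropped the term from the response.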
