Question:

Elasticsearch aggregation with a matching filter

公西岳

2023-03-14

I have documents that each contain a collection of nested documents:

{
  "_source": {
    ...
    "groups": [
      {
        "group_id": 100,
        "parent_group_id": 1,
        "title": "Wheel",
        "parent_group_title": "Parts"
      },
      {
        "group_id": 200,
        "parent_group_id": 2,
        "title": "Seat",
        "parent_group_title": "Parts"
      }
    ]
    ...
  }
}

The mapping looks like this:

{
  ...,

  "groups": {
    "type": "nested",
    "properties": {
      "group_id": {
        "type": "long"
      },
      "title": {
        "type": "text",
        "analyzer": "my_custom_analyzer",
        "term_vector": "with_positions_offsets",
        "fields": {
          "keyword": {
            "type": "keyword"
          }
        }
      },
      "parent_group_id": {
        "type": "long"
      },
      "parent_group_title": {
        "type": "text",
        "analyzer": "my_custom_analyzer",
        "term_vector": "with_positions_offsets",
        "fields": {
          "keyword": {
            "type": "keyword"
          }
        }
      }
    }
  },

  ...
}

What I want to run is the following aggregation:

{
  "query": {
    "bool": {
      "must": [
        {
          "nested": {
            "path": "groups",
            "query": {
              "match": {
                "groups.title": {
                  "query": "whe"
                }
              }
            }
          }
        }
      ]
    }
  },
  "size": 0,
  "aggs": {
    "filtered": {
      "filter": {
        "bool": {
          "must": [
            {
              "nested": {
                "path": "groups",
                "query": {
                  "match": {
                    "groups.title": {
                      "query": "whe"
                    }
                  }
                }
              }
            }
          ]
        }
      },
      "aggs": {
        "groups": {
          "nested": {
            "path": "groups"
          },
          "aggs": {
            "titles": {
              "terms": {
                "field": "groups.title.keyword",
                "size": 5
              },
              "aggs": {
                "parents": {
                  "terms": {
                    "field": "groups.parent_group_title.keyword",
                    "size": 3
                  }
                }
              }
            }
          }
        }
      }
    }
  }
}

With this query, I get a result that looks like this:

  "aggregations" : {
    "filtered" : {
      "doc_count" : ...,
      "groups" : {
        "doc_count" : ...,
        "titles" : {
          "doc_count_error_upper_bound" : ...,
          "sum_other_doc_count" : ...,
          "buckets" : [
            {
              "key" : "Seat",
              "doc_count" : 10,
              "parents" : {
                "doc_count_error_upper_bound" : 0,
                "sum_other_doc_count" : 10,
                "buckets" : [
                  {
                    "key" : "Parts",
                    "doc_count" : 6
                  },
                  {
                    "key" : "Other",
                    "doc_count" : 4
                  }
                ]
              }
            },
            {
              "key" : "Wheel",
              "doc_count" : 3,
              "parents" : {
                "doc_count_error_upper_bound" : 0,
                "sum_other_doc_count" : 3,
                "buckets" : [
                  {
                    "key" : "Parts",
                    "doc_count" : 2
                  },
                  {
                    "key" : "Other",
                    "doc_count" : 1
                  }
                ]
              }
            }
          ]
        }
      }
    }
  }

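But what I want is for only the bucket with the key Wheel to appear in the result buckets (or any other bucket whose key matches the search string whe).

I hope the question is clear enough. What am I doing wrong? Are there any suggestions, or changes I should make to the data structure or the query?

UPD: adding my_custom_analyzer for reference: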

{
  "my_custom_analyzer": {
    "type": "custom",
    "tokenizer": "ngram",
    "filter": [
      "lowercase",
      "asciifolding"
    ],
    "char_filter": [
      "html_strip"
    ],
    "min_gram": 2,
    "max_gram": 15,
    "token_chars": [
      "letter",
      "digit"
    ]
  }
}

1 answer

连正信

2023-03-14

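You probably want to filter on groups.title before running the terms aggregation, inside the nested context. That means you need neither the top-level query nor the filter-level query.

I don't have your my_custom_analyzer, so I used a basic match, but you get the gist: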

GET groups/_search
{
  "size": 0,
  "aggs": {
    "groups": {
      "nested": {
        "path": "groups"
      },
      "aggs": {
        "titles": {
          "filter": {
            "match": {
              "groups.title": {
                "query": "wheel"
              }
            }
          },
          "aggs": {
            "group_title_terms": {
              "terms": {
                "field": "groups.title.keyword",
                "size": 5
              },
              "aggs": {
                "parents": {
                  "terms": {
                    "field": "groups.parent_group_title.keyword",
                    "size": 3
                  }
                }
              }
            }
          }
        }
      }
    }
  }
}
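
To see why the order of the filter and the nested step matters, here is a rough Python model of the two aggregation shapes. It is only a sketch: the three toy documents, the function names, and the prefix test standing in for the match query are all assumptions made for illustration, not Elasticsearch behavior.

```python
from collections import Counter

# Toy data (hypothetical): each root document carries nested "groups" entries.
docs = [
    {"groups": [{"title": "Wheel"}, {"title": "Seat"}]},
    {"groups": [{"title": "Seat"}]},
    {"groups": [{"title": "Wheel"}]},
]

def matches(title, query):
    # Stand-in for the match query: a case-insensitive prefix test.
    return title.lower().startswith(query.lower())

def filter_then_nested(docs, query):
    # Shape used in the question: keep root documents that contain at least
    # one matching group, then count the titles of ALL groups inside them.
    kept = [d for d in docs if any(matches(g["title"], query) for g in d["groups"])]
    return Counter(g["title"] for d in kept for g in d["groups"])

def nested_then_filter(docs, query):
    # Shape used in the answer: step into the nested groups first,
    # keep only the groups that match, then count their titles.
    return Counter(
        g["title"] for d in docs for g in d["groups"] if matches(g["title"], query)
    )

root_level = filter_then_nested(docs, "whe")
nested_level = nested_then_filter(docs, "whe")
print(dict(root_level))
print(dict(nested_level))
```

In this model the first shape still counts Seat, because Seat sits next to Wheel in a document that passed the root-level filter. The second shape only ever sees groups that matched.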

Update:

There is a problem with your analyzer. Let's use _analyze to see how whe gets tokenized:

GET groups/_analyze
{
  "text": "whe",
  "analyzer": "my_custom_analyzer"
}

This yields:

{
  "tokens" : [
    {
      "token" : "w",
      "start_offset" : 0,
      "end_offset" : 1,
      "type" : "word",
      "position" : 0
    },
    {
      "token" : "wh",
      "start_offset" : 0,
      "end_offset" : 2,
      "type" : "word",
      "position" : 1
    },
    {
      "token" : "h",
      "start_offset" : 1,
      "end_offset" : 2,
      "type" : "word",
      "position" : 2
    },
    {
      "token" : "he",
      "start_offset" : 1,
      "end_offset" : 3,
      "type" : "word",
      "position" : 3
    },
    {
      "token" : "e",
      "start_offset" : 2,
      "end_offset" : 3,
      "type" : "word",
      "position" : 4
    }
  ]
}

I suspect Seat got matched because of the single-character token e.
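
A quick way to check that suspicion is to mimic the tokenizer in plain Python. This is a simplified sketch, not the real ngram tokenizer: the helper name is made up, and the gram sizes 1 and 2 are chosen because they reproduce the _analyze output above. Only lowercasing is modeled; html_strip, asciifolding and token_chars are ignored.

```python
def ngram_tokens(text, min_gram=1, max_gram=2):
    # Emit every substring of length min_gram..max_gram, position by position.
    text = text.lower()
    tokens = []
    for start in range(len(text)):
        for size in range(min_gram, max_gram + 1):
            if start + size <= len(text):
                tokens.append(text[start:start + size])
    return tokens

query_tokens = ngram_tokens("whe")
seat_tokens = ngram_tokens("Seat")

# Tokens that the query and the indexed title have in common.
shared = sorted(set(query_tokens) & set(seat_tokens))
print(query_tokens)
print(shared)
```

Since a match query by default needs only one of the query tokens to match, a single shared token is enough for Seat to be returned.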

My suggestion is to use edge_ngram instead of ngram, like so:

PUT groups
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_custom_analyzer": {
          "type": "custom",
          "tokenizer": "my_tokenizer",
          "filter": [
            "lowercase",
            "asciifolding"
          ],
          "char_filter": [
            "html_strip"
          ]
        }
      },
      "tokenizer": {
        "my_tokenizer": {
          "type": "edge_ngram",
          "min_gram": 2,
          "max_gram": 10,
          "token_chars": [
            "letter",
            "digit"
          ]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "groups": {
        "type": "nested",
        "properties": {
          "group_id": {
            "type": "long"
          },
          "title": {
            "type": "text",
            "analyzer": "my_custom_analyzer",
            "term_vector": "with_positions_offsets",
            "fields": {
              "keyword": {
                "type": "keyword"
              }
            }
          },
          "parent_group_id": {
            "type": "long"
          },
          "parent_group_title": {
            "type": "text",
            "analyzer": "my_custom_analyzer",
            "term_vector": "with_positions_offsets",
            "fields": {
              "keyword": {
                "type": "keyword"
              }
            }
          }
        }
      }
    }
  }
}

Apply the mapping, then reindex your documents.
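
As a sanity check on the new settings, the same kind of plain-Python model can be used for edge n-grams. Again this is only a sketch under assumptions: the helper name is hypothetical, the sizes 2 and 10 mirror min_gram and max_gram from the settings above, and only lowercasing is modeled.

```python
def edge_ngram_tokens(text, min_gram=2, max_gram=10):
    # Emit only prefixes of the text, from min_gram up to max_gram characters.
    text = text.lower()
    longest = min(len(text), max_gram)
    return [text[:size] for size in range(min_gram, longest + 1)]

query_tokens = edge_ngram_tokens("whe")
wheel_tokens = edge_ngram_tokens("Wheel")
seat_tokens = edge_ngram_tokens("Seat")

# Overlap between the query tokens and each indexed title.
wheel_overlap = sorted(set(query_tokens) & set(wheel_tokens))
seat_overlap = sorted(set(query_tokens) & set(seat_tokens))
print(wheel_overlap)
print(seat_overlap)
```

In this model whe shares the prefixes wh and whe with Wheel and nothing with Seat. Note that any other title starting with wh would also share a token with the query, because the query text is split into prefixes as well.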
