Elasticsearch distinct filter values

壤驷凯

2023-03-14

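Question:

I have a large document store in Elasticsearch and would like to retrieve the distinct filter values to display in an HTML drop-down list.

An example would be something like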

[
    {
        "name": "John Doe",
        "departments": [
            {
                "name": "Accounts"
            },
            {
                "name": "Management"
            }
        ]
    },
    {
        "name": "Jane Smith",
        "departments": [
            {
                "name": "IT"
            },
            {
                "name": "Management"
            }
        ]
    }
]

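The drop-down list should contain the list of departments, i.e. IT, Accounts and Management.

Would some kind person please point me in the right direction for retrieving a distinct list of departments from Elasticsearch?

Thanks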

Answer:

This is a job for a terms aggregation (documentation).

You can get the distinct departments values with the following:

POST company/employee/_search
{
  "size":0,
  "aggs": {
    "by_departments": {
      "terms": {
        "field": "departments.name",
        "size": 0 //see note 1
      }
    }
  }
}

With your example, this outputs:

{
   ...
   "aggregations": {
      "by_departments": {
         "buckets": [
            {
               "key": "management", //see note 2
               "doc_count": 2
            },
            {
               "key": "accounts",
               "doc_count": 1
            },
            {
               "key": "it",
               "doc_count": 1
            }
         ]
      }
   }
}

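Two additional notes:

1. Setting size to 0 sets the maximum number of buckets to Integer.MAX_VALUE. Don't use it if there are too many distinct departments values.
2. As you can see, the keys are the terms produced by analyzing the departments values. Be sure to run your terms aggregation on a field mapped as not_analyzed.

For example, with the default mapping (departments.name is an analyzed string), adding this employee: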

{
  "name": "Bill Gates",
  "departments": [
    {
      "name": "IT"
    },
    {
      "name": "Human Resource"
    }
  ]
}

results in this:

{
   ...
   "aggregations": {
      "by_departments": {
         "buckets": [
            {
               "key": "it",
               "doc_count": 2
            },
            {
               "key": "management",
               "doc_count": 2
            },
            {
               "key": "accounts",
               "doc_count": 1
            },
            {
               "key": "human",
               "doc_count": 1
            },
            {
               "key": "resource",
               "doc_count": 1
            }
         ]
      }
   }
}
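To see why "Human Resource" ends up as two buckets, the effect can be imitated outside Elasticsearch. The following is only a rough sketch: `analyze` is a hypothetical stand-in that lowercases and splits on non-word characters, which approximates the default analysis for this sample data but is not the real analyzer, and `terms_doc_counts` is an illustrative helper, not an Elasticsearch API.

```python
import re
from collections import Counter

# Sample employees from the question plus the extra one added above.
EMPLOYEES = [
    {"name": "John Doe", "departments": [{"name": "Accounts"}, {"name": "Management"}]},
    {"name": "Jane Smith", "departments": [{"name": "IT"}, {"name": "Management"}]},
    {"name": "Bill Gates", "departments": [{"name": "IT"}, {"name": "Human Resource"}]},
]


def analyze(value):
    """Assumed approximation of an analyzed string: lowercase, split on non-word chars."""
    return [token for token in re.split(r"\W+", value.lower()) if token]


def terms_doc_counts(docs, analyzed):
    """Count, per indexed term, how many documents contain it (like doc_count)."""
    counts = Counter()
    for doc in docs:
        terms = set()  # a document counts once per term, however often it occurs
        for dept in doc["departments"]:
            if analyzed:
                terms.update(analyze(dept["name"]))
            else:
                terms.add(dept["name"])  # not_analyzed: the whole value is one term
        counts.update(terms)
    return dict(counts)


if __name__ == "__main__":
    print(terms_doc_counts(EMPLOYEES, analyzed=True))
    print(terms_doc_counts(EMPLOYEES, analyzed=False))
```

With `analyzed=True` the counts are keyed by lowercase tokens, so "human" and "resource" appear separately; with `analyzed=False` each department name stays intact.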

With the correct mapping:

POST company
{
  "mappings": {
    "employee": {
      "properties": {
        "name": {
          "type": "string"
        },
        "departments": {
          "type": "object",
          "properties": {
            "name": {
              "type": "string",
              "index": "not_analyzed"
            }
          }
        }
      }
    }
  }
}

the same request finally outputs:

{
   ...
   "aggregations": {
      "by_departments": {
         "buckets": [
            {
               "key": "IT",
               "doc_count": 2
            },
            {
               "key": "Management",
               "doc_count": 2
            },
            {
               "key": "Accounts",
               "doc_count": 1
            },
            {
               "key": "Human Resource",
               "doc_count": 1
            }
         ]
      }
   }
}
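Since the goal in the question is an HTML drop-down list, here is a minimal sketch of turning such a response into `<option>` elements. It assumes the response has already been parsed into a Python dict and that the aggregation is named `by_departments` as above; `department_options` is a hypothetical helper name, not part of any Elasticsearch client.

```python
from html import escape


def department_options(response, agg_name="by_departments"):
    """Build one <option> line per bucket key, sorted alphabetically."""
    buckets = response["aggregations"][agg_name]["buckets"]
    keys = sorted(bucket["key"] for bucket in buckets)
    return "\n".join(
        # Escape the key so unusual department names cannot break the markup.
        '<option value="{0}">{0}</option>'.format(escape(key, quote=True))
        for key in keys
    )


if __name__ == "__main__":
    sample = {
        "aggregations": {
            "by_departments": {
                "buckets": [
                    {"key": "IT", "doc_count": 2},
                    {"key": "Management", "doc_count": 2},
                    {"key": "Accounts", "doc_count": 1},
                    {"key": "Human Resource", "doc_count": 1},
                ]
            }
        }
    }
    print(department_options(sample))
```

The buckets come back ordered by doc_count, so the sketch re-sorts the keys alphabetically, which is usually what a drop-down wants.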

Hope this helps!
