Finding distinct inner objects in Elasticsearch

宋唯

2023-03-14

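Question:

We are trying to find distinct inner objects in Elasticsearch. The following is a minimal example of our case. We are stuck with the mapping below (changing types or index settings, or adding new fields, would not be a problem, but the structure should stay as it is):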

{
  "building": {
    "properties": {
      "street": {
        "type": "string",
        "store": "yes",
        "index": "not_analyzed"
      },
      "house number": {
        "type": "string",
        "store": "yes",
        "index": "not_analyzed"
      },
      "city": {
        "type": "string",
        "store": "yes",
        "index": "not_analyzed"
      },
      "people": {
        "type": "object",
        "store": "yes",
        "index": "not_analyzed",
        "properties": {
          "firstName": {
            "type": "string",
            "store": "yes",
            "index": "not_analyzed"
          },
          "lastName": {
            "type": "string",
            "store": "yes",
            "index": "not_analyzed"
          }
        }
      }
    }
  }
}

Assume we have the following sample data:

{
  "buildings": [
    {
      "street": "Baker Street",
      "house number": "221 B",
      "city": "London",
      "people": [
        {
          "firstName": "John",
          "lastName": "Doe"
        },
        {
          "firstName": "Jane",
          "lastName": "Doe"
        }
      ]
    },
    {
      "street": "Baker Street",
      "house number": "5",
      "city": "London",
      "people": [
        {
          "firstName": "John",
          "lastName": "Doe"
        }
      ]
    },
    {
      "street": "Garden Street",
      "house number": "1",
      "city": "London",
      "people": [
        {
          "firstName": "Jane",
          "lastName": "Smith"
        }
      ]
    }
  ]
}

When querying for the street "Baker Street" (plus any other options that may be needed), we would like to get the following list:

[
    {
      "firstName": "John",
      "lastName": "Doe"
    },
    {
      "firstName": "Jane",
      "lastName": "Doe"
    }
]

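The format does not matter much, but we must be able to parse out the first name and the last name. Since our real data set is much larger, the entries need to be distinct.

We are using Elasticsearch 1.7.

Answer:

We finally solved our problem.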

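Our solution is (as we expected) a precomputed people_all field. Instead of using copy_to or transform, however, we write this additional field ourselves when importing the data. The field looks like this: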

"people": {
  "type": "nested",
  ..
  "properties": {
    "firstName": {
      "type": "string",
      "store": "yes",
      "index": "not_analyzed"
    },
    "lastName": {
      "type": "string",
      "store": "yes",
      "index": "not_analyzed"
    },
    "people_all": {
      "type": "string",
      "index": "not_analyzed"
    }
  }
}
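A minimal sketch of how the import step might fill in people_all before a building document is indexed. The helper name add_people_all is hypothetical, and the "firstName lastName" format joined by a single space is an assumption inferred from the bucket keys ("John Doe") in the response shown further down; the real import code is not part of this answer.

```python
import copy


def add_people_all(building):
    """Return a copy of a building document with people_all set on every person."""
    enriched = copy.deepcopy(building)  # leave the caller's document untouched
    for person in enriched.get("people", []):
        # Assumption: people_all is "<firstName> <lastName>", joined by one space.
        person["people_all"] = "{} {}".format(person["firstName"], person["lastName"])
    return enriched


building = {
    "street": "Baker Street",
    "house number": "221 B",
    "city": "London",
    "people": [
        {"firstName": "John", "lastName": "Doe"},
        {"firstName": "Jane", "lastName": "Doe"},
    ],
}

enriched = add_people_all(building)
print(enriched["people"][0]["people_all"])  # John Doe
```

The enriched document is what would then be sent to Elasticsearch in place of the original one.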

Note the "index": "not_analyzed" setting on the people_all field. It is important for getting complete buckets. Without it, our example would return three buckets: "john", "jane" and "doe".
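To illustrate why the setting matters, here is a rough simulation of how a terms aggregation would bucket the three matching nested people documents in each case. It does not call Elasticsearch. The function terms_buckets is hypothetical, and lowercasing plus splitting on whitespace is only a simplified stand-in for what an analyzed string field does.

```python
from collections import Counter


def terms_buckets(values, analyzed):
    """Count, per term, how many documents contain it (like a terms aggregation)."""
    counts = Counter()
    for value in values:
        if analyzed:
            # Simplified stand-in for an analyzer: lowercase and split on whitespace.
            terms = set(value.lower().split())
        else:
            # not_analyzed: the whole value is indexed as one single term.
            terms = {value}
        counts.update(terms)  # each document counts once per distinct term
    return dict(counts)


# people_all values of the three nested documents under "Baker Street"
values = ["John Doe", "Jane Doe", "John Doe"]

print(terms_buckets(values, analyzed=True))
print(terms_buckets(values, analyzed=False))
```

With analyzed=True the names fall apart into the terms john, jane and doe; with analyzed=False each full name stays in one bucket.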

Once this new field is written, we can run the following aggregation:

{
  "size": 0,
  "query": {
    "term": {
      "street": "Baker Street"
    }
  },
  "aggs": {
    "people_distinct": {
      "nested": {
        "path": "people"
      },
      "aggs": {
        "people_all_distinct": {
          "terms": {
            "field": "people.people_all",
            "size": 0
          }
        }
      }
    }
  }
}

We get back the following response:

{
  "took": 2,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 2,
    "max_score": 0.0,
    "hits": []
  },
  "aggregations": {
    "people_distinct": {
      "doc_count": 3,
      "people_all_distinct": {
        "doc_count_error_upper_bound": 0,
        "sum_other_doc_count": 0,
        "buckets": [
          {
            "key": "John Doe",
            "doc_count": 2
          },
          {
            "key": "Jane Doe",
            "doc_count": 1
          }
        ]
      }
    }
  }
}

Now we can build the distinct person objects from the response.
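A minimal sketch of that parsing step, assuming the response has already been decoded into a Python dict and that the aggregation names are the ones used in the request (people_distinct and people_all_distinct). The function distinct_people is hypothetical. Splitting each key at the first space is an assumption that only holds when first names contain no spaces.

```python
def distinct_people(response):
    """Turn the terms buckets of the nested aggregation into person objects."""
    buckets = response["aggregations"]["people_distinct"]["people_all_distinct"]["buckets"]
    people = []
    for bucket in buckets:
        # Assumption: the key is "<firstName> <lastName>" and firstName has no spaces.
        first_name, _, last_name = bucket["key"].partition(" ")
        people.append({"firstName": first_name, "lastName": last_name})
    return people


# Trimmed-down version of the response, keeping only the fields the parser reads.
sample_response = {
    "aggregations": {
        "people_distinct": {
            "doc_count": 3,
            "people_all_distinct": {
                "buckets": [
                    {"key": "John Doe", "doc_count": 2},
                    {"key": "Jane Doe", "doc_count": 1},
                ]
            },
        }
    }
}

print(distinct_people(sample_response))
```

If names can contain spaces, writing people_all with a separator that never occurs in a name would make the split unambiguous.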

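Please let us know if there is a better way to achieve our goal.
Parsing the buckets is not an ideal solution; it would be nicer to have the firstName and lastName fields in each bucket.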
