Elasticsearch - Cardinality of Whole Field Values

高德水

2023-03-14

Question:

I have a document that looks like this:

{
   "_id":"some_id_value",
   "_source":{
      "client":{
         "name":"x"
      },
      "project":{
         "name":"x November 2016"
      }
   }
}

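I am trying to run a query that gets me the count of unique project names for each client. For this, I am using a cardinality aggregation on project.name. I know for certain that this particular client has only 4 unique project names. However, when I run the query I get a count of 5, which I know is wrong.

The project names all contain the client's name. For example, if the client is "X", the project names would be "X Testing November 2016" or "X Jan 2016", and so on. I don't know whether that is a factor.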

This is the mapping for the document type:

{
   "mappings":{
      "vma_docs":{
         "properties":{
            "client":{
               "properties":{
                  "contact":{
                     "type":"string"
                  },
                  "name":{
                     "type":"string"
                  }
               }
            },
            "project":{
               "properties":{
                  "end_date":{
                     "format":"yyyy-MM-dd",
                     "type":"date"
                  },
                  "project_type":{
                     "type":"string"
                  },
                  "name":{
                     "type":"string"
                  },
                  "project_manager":{
                     "index":"not_analyzed",
                     "type":"string"
                  },
                  "start_date":{
                     "format":"yyyy-MM-dd",
                     "type":"date"
                  }
               }
            }
         }
      }
   }
}

This is my search query:

{
   "fields":[
      "client.name",
      "project.name"
   ],
   "query":{
      "bool":{
         "must":{
            "match":{
               "client.name":{
                  "operator":"and",
                  "query":"ABC systems"
               }
            }
         }
      }
   },
   "aggs":{
      "num_projects":{
         "cardinality":{
            "field":"project.name"
         }
      }
   },
   "size":5
}

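These are the results I get (for brevity I've posted only 2 hits). Note that the num_projects aggregation returns 5, but it should return only 4, which is the total number of projects.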

{
   "hits":{
      "hits":[
         {
            "_score":5.8553367,
            "_type":"vma_docs",
            "_id":"AVTMIM9IBwwoAW3mzgKz",
            "fields":{
               "project.name":[
                  "ABC"
               ],
               "client.name":[
                  "ABC systems Pvt Ltd"
               ]
            },
            "_index":"vma"
         },
         {
            "_score":5.8553367,
            "_type":"vma_docs",
            "_id":"AVTMIM9YBwwoAW3mzgK2",
            "fields":{
               "project.name":[
                  "ABC"
               ],
               "client.name":[
                  "ABC systems Pvt Ltd"
               ]
            },
            "_index":"vma"
         }
      ],
      "total":18,
      "max_score":5.8553367
   },
   "_shards":{
      "successful":5,
      "failed":0,
      "total":5
   },
   "took":4,
   "aggregations":{
      "num_projects":{
         "value":5
      }
   },
   "timed_out":false
}

FYI, the project names are: ABC, ABC Nov 2016, ABC retest November, ABC Mobile App

Answer:

You need the following mapping for your project.name field:

{
  "mappings": {
    "vma_docs": {
      "properties": {
        "client": {
          "properties": {
            "contact": {
              "type": "string"
            },
            "name": {
              "type": "string"
            }
          }
        },
        "project": {
          "properties": {
            "end_date": {
              "format": "yyyy-MM-dd",
              "type": "date"
            },
            "project_type": {
              "type": "string"
            },
            "name": {
              "type": "string",
              "fields": {
                "raw": {
                  "type": "string",
                  "index": "not_analyzed"
                }
              }
            },
            "project_manager": {
              "index": "not_analyzed",
              "type": "string"
            },
            "start_date": {
              "format": "yyyy-MM-dd",
              "type": "date"
            }
          }
        }
      }
    }
  }
}

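Essentially, raw is a sub-field: whatever value you index into project.name is also indexed into project.name.raw, but left untouched (not tokenized or analyzed). This matters because project.name is an analyzed string, so each name is split into individual terms, and the cardinality aggregation counts those terms rather than the whole names. The query you then need is: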

{
  "fields": [
    "client.name",
    "project.name"
  ],
  "query": {
    "bool": {
      "must": {
        "match": {
          "client.name": {
            "operator": "and",
            "query": "ABC systems"
          }
        }
      }
    }
  },
  "aggs": {
    "num_projects": {
      "cardinality": {
        "field": "project.name.raw"
      }
    }
  },
  "size": 5
}
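To see why the analyzed field gives the wrong count, the small Python sketch below compares counting distinct terms after tokenizing each project name (roughly what happens with project.name) against counting whole values (what project.name.raw stores). The tokenizer here is an assumption for illustration: it only lowercases and splits on non-alphanumeric characters, which is a simplification of Elasticsearch's standard analyzer, and the helper names are hypothetical. It uses the four project names from the question; since the real index holds other data, it is not expected to reproduce the exact 5 from the question, only the mechanism that makes the analyzed count differ from the number of distinct names.

```python
import re

# Project names for client "ABC", as listed in the question.
PROJECT_NAMES = [
    "ABC",
    "ABC Nov 2016",
    "ABC retest November",
    "ABC Mobile App",
]


def simple_analyze(text):
    """Hypothetical stand-in for an analyzer: lowercase the text, then
    split it into alphanumeric tokens. This is a simplification, not
    Elasticsearch's actual standard analyzer."""
    return re.findall(r"[a-z0-9]+", text.lower())


def distinct_terms_analyzed(values):
    """Distinct terms when every value is tokenized (like project.name)."""
    terms = set()
    for value in values:
        terms.update(simple_analyze(value))
    return terms


def distinct_terms_raw(values):
    """Distinct terms when each value is kept whole (like project.name.raw)."""
    return set(values)


if __name__ == "__main__":
    analyzed = distinct_terms_analyzed(PROJECT_NAMES)
    raw = distinct_terms_raw(PROJECT_NAMES)
    print("analyzed:", len(analyzed), sorted(analyzed))
    print("raw:     ", len(raw), sorted(raw))
```

With these four names, the analyzed version yields 7 distinct terms (abc, nov, 2016, retest, november, mobile, app), while the raw version yields the expected 4, which is why the cardinality aggregation should target the not_analyzed sub-field.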
