Calculating differences in a date_histogram aggregation

寿鸣

2023-03-14

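Question:

I want to analyze a set of customers. I am interested in customer growth, for example:

43 new customers since last week (+32%)
+12650 new customers since last year (+1140%)

What needs to be done:

Get the customers created this week
Get the customers created last week
Count them
Calculate the difference (as a percentage)

So, first, I would create a histogram that buckets customers by week: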

{  
  "aggs":{  
    "customers_over_time":{  
      "date_histogram":{  
        "field":"created",
        "interval":"week"
      }
    }
  }
}

This results in, for example:

{  
  "buckets":[  
    ...,
    {  
      "key_as_string":"2018-10-01T00:00:00.000Z",
      "key":1538352000000,
      "doc_count":1
    },
    {  
      "key_as_string":"2018-10-08T00:00:00.000Z",
      "key":1538956800000,
      "doc_count":7
    },
    {  
      "key_as_string":"2018-10-15T00:00:00.000Z",
      "key":1539561600000,
      "doc_count":5
    }
  ]
}

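Then I would only need to take the last two entries, compute the difference, and assign it to a field outside the buckets collection. Is this possible in Elasticsearch, perhaps with a Bucket Script Aggregation?

Another idea is to optimize a little and build the histogram only for a limited set of customers. I tried: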

{  
  "query":{  
    "range":{  
      "created":{  
        "gte":"now-1w",
        "lte":"now"
      }
    }
  }
}

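However, this does not cover the whole of last week, only the last 7 days, which is not the same thing. Is there a way to get the customers created this week and last week?

Answer:

Well, I have tried something that I hope will be useful to you. I made use of Elasticsearch's Serial Differencing Aggregation; see its documentation for more details.

Suppose I have three documents for this week, i.e. the week starting 2018-10-15, and only one document for last week, i.e. the week starting 2018-10-08.

The difference in users created for the week of 2018-10-15 would then be 2.

Below is the sample query I came up with, which shows the difference in count from the previous week.

Query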

POST testdateindex/_search
{
  "size" : 0,
  "query" : {
    "bool" : {
      "must" : {
        "range" : {
          "created" : {
            "from":"now-2w",
            "to":"now",
            "include_lower" : true,
            "include_upper" : true
          }
        }
      }
    }
  },
  "aggs": {
    "customers_over_time": {
      "date_histogram": {
        "field": "created",
        "interval": "week"
      },
      "aggs": {
        "difference": {
          "serial_diff": {
            "buckets_path": "_count",
            "lag": 1
          }
        }
      }
    }
  }
}

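I used a lag of 1 because in this case you only need the difference between two consecutive weeks, that is, between each bucket and the one right before it.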
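To make the effect of lag concrete, here is a minimal Python sketch of the arithmetic a serial difference performs on a list of weekly counts. The function name serial_diff and the sample list are only illustrative (the counts are the doc_count values from the question); this is not Elasticsearch code, just a model of the calculation.

```python
from typing import List, Optional


def serial_diff(counts: List[int], lag: int = 1) -> List[Optional[int]]:
    """Subtract from each value the value `lag` positions earlier.

    The first `lag` entries have nothing to compare against, so they are None.
    """
    if lag < 1:
        raise ValueError("lag must be a positive integer")
    return [
        counts[i] - counts[i - lag] if i >= lag else None
        for i in range(len(counts))
    ]


# doc_count values of the three weekly buckets shown in the question
weekly_counts = [1, 7, 5]
print(serial_diff(weekly_counts, lag=1))  # [None, 6, -2]
print(serial_diff(weekly_counts, lag=2))  # [None, None, 4]
```

With lag set to 1 every bucket is compared with the previous week; a larger lag compares it with a bucket further back.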

Query result:

{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 4,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "customers_over_time": {
      "buckets": [
        {
          "key_as_string": "2018-10-08T00:00:00.000Z",
          "key": 1538956800000,
          "doc_count": 1
        },
        {
          "key_as_string": "2018-10-15T00:00:00.000Z",
          "key": 1539561600000,
          "doc_count": 3,
          "difference": {
            "value": 2
          }
        }
      ]
    }
  }
}

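The result shows the document count for each week, plus the difference field in the JSON above, which holds the change in count compared with the previous week.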
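The question also asks for the change as a percentage, which the result above does not contain. One option is to derive it on the client from the last two buckets. The sketch below assumes the response has already been parsed into a Python dict; the helper name weekly_growth is hypothetical and the sample data is a trimmed copy of the result above.

```python
from typing import Optional


def weekly_growth(response: dict) -> dict:
    """Return the absolute and percentage change between the last two buckets."""
    buckets = response["aggregations"]["customers_over_time"]["buckets"]
    if len(buckets) < 2:
        raise ValueError("need at least two weekly buckets to compare")
    previous = buckets[-2]["doc_count"]
    current = buckets[-1]["doc_count"]
    diff = current - previous
    # Percentage growth is undefined when the previous week had no customers.
    percent: Optional[float] = diff / previous * 100 if previous else None
    return {"difference": diff, "percent": percent}


# Trimmed copy of the query result shown above.
response = {
    "aggregations": {
        "customers_over_time": {
            "buckets": [
                {"key_as_string": "2018-10-08T00:00:00.000Z", "doc_count": 1},
                {
                    "key_as_string": "2018-10-15T00:00:00.000Z",
                    "doc_count": 3,
                    "difference": {"value": 2},
                },
            ]
        }
    }
}
print(weekly_growth(response))  # {'difference': 2, 'percent': 200.0}
```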

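Note that the first bucket has no difference, because there is no earlier bucket to compare it with (I had not created any documents before that week).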
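One caveat: the now-2w range in the query is a sliding 14-day window, which is the same concern the question raises about now-1w. Elasticsearch date math allows rounding, so a lower bound of now-1w/w should snap to the start of last week. The sketch below writes that request body as a Python dict and, for illustration only, computes the same boundary locally. It assumes weeks start on Monday in UTC (the bucket keys above fall on Mondays); the helper start_of_last_week is hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Request body with a rounded lower bound: "now-1w/w" means
# "one week ago, rounded down to the start of that week".
query = {
    "size": 0,
    "query": {"range": {"created": {"gte": "now-1w/w", "lte": "now"}}},
    "aggs": {
        "customers_over_time": {
            "date_histogram": {"field": "created", "interval": "week"},
            "aggs": {
                "difference": {
                    "serial_diff": {"buckets_path": "_count", "lag": 1}
                }
            },
        }
    },
}


def start_of_last_week(now: datetime) -> datetime:
    """Monday 00:00 of the week before `now`, in the timezone of `now`."""
    this_monday = (now - timedelta(days=now.weekday())).replace(
        hour=0, minute=0, second=0, microsecond=0
    )
    return this_monday - timedelta(weeks=1)


# A Wednesday in the week starting 2018-10-15
sample_now = datetime(2018, 10, 17, 9, 30, tzinfo=timezone.utc)
print(start_of_last_week(sample_now).isoformat())  # 2018-10-08T00:00:00+00:00
```

With this bound the histogram should contain exactly two buckets, last week and the current (partial) week, so the last bucket's difference is the week-over-week change.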

Hope this helps!
