Counting rows in a partition with ORDER BY

裴浩歌

2023-03-14

Question:

I am trying to understand PARTITION BY in Postgres by writing a few sample queries. I have a test table that I run my queries against.

id integer | num integer
___________|_____________
1          | 4 
2          | 4
3          | 5
4          | 6

When I run the following query, I get the output I expect.

SELECT id, COUNT(id) OVER(PARTITION BY num) from test;

id         | count
___________|_____________
1          | 2 
2          | 2
3          | 1
4          | 1

However, when I add ORDER BY to the partition, the output changes:

SELECT id, COUNT(id) OVER(PARTITION BY num ORDER BY id) from test;

id         | count
___________|_____________
1          | 1 
2          | 2
3          | 1
4          | 1

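My understanding is that COUNT is computed over all rows that belong to a partition. Here I have partitioned the rows by num. The number of rows in each partition is the same with or without the ORDER BY clause, so why do the outputs differ?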

Answer:

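When you add an order by to an aggregate used as a window function, that aggregate turns into a "running count" (or a running version of whatever aggregate you use).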

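count(*) then returns the number of rows up to and including the "current" row, based on the order specified. This is because, once an order by is present, the default window frame runs from the start of the partition to the current row instead of covering the whole partition.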
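To make the frame visible, here is a minimal sketch that reuses the question's data as an inline CTE. The column aliases running_count and partition_count are made up for illustration. It assumes you want to keep the order by and still count the whole partition, which an explicit frame clause allows:

```sql
-- Same rows as the question's test table, supplied inline.
with test (id, num) as (
  values (1, 4), (2, 4), (3, 5), (4, 6)
)
select id,
       -- default frame with ORDER BY: partition start up to the current row
       count(id) over (partition by num order by id) as running_count,
       -- explicit frame: every row of the partition, regardless of order
       count(id) over (partition by num order by id
                       rows between unbounded preceding
                                and unbounded following) as partition_count
from test
order by id;
```

For ids 1 to 4, running_count should come out as 1, 2, 1, 1 (the second output in the question) and partition_count as 2, 2, 1, 1 (the first output).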

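The following query shows the different results for aggregates used with and without an order by. It is a bit easier to see with sum() than with count() (in my opinion).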

with test (id, num, x) as (
  values 
    (1, 4, 1),
    (2, 4, 1),
    (3, 5, 2),
    (4, 6, 2)
)
select id, 
       num,
       x,
       count(*) over () as total_rows, 
       count(*) over (order by id) as rows_upto,
       count(*) over (partition by x order by id) as rows_per_x,
       sum(num) over (partition by x) as total_for_x,
       sum(num) over (order by id) as sum_upto,
       sum(num) over (partition by x order by id) as sum_for_x_upto
from test;

This results in:

id | num | x | total_rows | rows_upto | rows_per_x | total_for_x | sum_upto | sum_for_x_upto
---+-----+---+------------+-----------+------------+-------------+----------+---------------
 1 |   4 | 1 |          4 |         1 |          1 |           8 |        4 |              4
 2 |   4 | 1 |          4 |         2 |          2 |           8 |        8 |              8
 3 |   5 | 2 |          4 |         3 |          1 |          11 |       13 |              5
 4 |   6 | 2 |          4 |         4 |          2 |          11 |       19 |             11
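One detail the table above does not exercise is ties in the ordering column. The sketch below assumes the default frame (range mode), in which rows with equal order by values are peers and are all included in the frame together. The alias rows_upto_num is hypothetical.

```sql
-- Ordering by num instead of id: ids 1 and 2 tie on num = 4.
with test (id, num) as (
  values (1, 4), (2, 4), (3, 5), (4, 6)
)
select id,
       num,
       -- peers of the current row count as "up to the current row"
       count(*) over (order by num) as rows_upto_num
from test
order by id;
```

Here rows_upto_num should be 2, 2, 3, 4 rather than 1, 2, 3, 4, because both rows with num = 4 see each other in their frame.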

There are more examples in the Postgres manual.
