SQL question: calculating the longest sequence of consecutive days

公西财

2023-03-14

Question:

There is a table containing visit data:

uid (INT) | created_at (DATETIME)

I want to find how many days in a row a user has visited our application. So, for example:

SELECT DISTINCT DATE(created_at) AS d FROM visits WHERE uid = 123

will return:

     d      
------------
 2012-04-28
 2012-04-29
 2012-04-30
 2012-05-03
 2012-05-04

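There are 5 records and two runs: 3 days (April 28 to 30) and 2 days (May 3 to 4).

My question is how to find the maximum number of consecutive days on which a user visited the application (3 days in the example). I tried to find a suitable function in the SQL documentation, but without success. Am I missing something?

UPD:
Thanks for your answers! I am actually using the Vertica analytic database (http://vertica.com/), but it is a fairly rare product and only a few people have experience with it. It does support the SQL-99 standard, though.

Well, most of the solutions work with slight modifications. In the end, I created my own version of the query: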

-- returns the start of each visit series
SELECT t1.d as s FROM testing t1
LEFT JOIN testing t2 ON DATE(t2.d) = DATE(TIMESTAMPADD('day', -1, t1.d))
WHERE t2.d is null GROUP BY t1.d

          s          
---------------------
 2012-04-28 01:00:00
 2012-05-03 01:00:00

-- returns the end of each visit series
SELECT t1.d as f FROM testing t1
LEFT JOIN testing t2 ON DATE(t2.d) = DATE(TIMESTAMPADD('day', 1, t1.d))
WHERE t2.d is null GROUP BY t1.d

          f          
---------------------
 2012-04-30 01:00:00
 2012-05-04 01:00:00

So now the only thing left to do is to join them somehow, for example by row index.

SELECT s, f, DATEDIFF(day, s, f) + 1 as seq FROM (
    SELECT t1.d as s, ROW_NUMBER() OVER () as o1 FROM testing t1
    LEFT JOIN testing t2 ON DATE(t2.d) = DATE(TIMESTAMPADD('day', -1, t1.d))
    WHERE t2.d is null GROUP BY t1.d
) tbl1 LEFT JOIN (
    SELECT t1.d as f, ROW_NUMBER() OVER () as o2 FROM testing t1
    LEFT JOIN testing t2 ON DATE(t2.d) = DATE(TIMESTAMPADD('day', 1, t1.d))
    WHERE t2.d is null GROUP BY t1.d
) tbl2 ON o1 = o2

Sample output:

          s          |          f          | seq 
---------------------+---------------------+-----
 2012-04-28 01:00:00 | 2012-04-30 01:00:00 |   3
 2012-05-03 01:00:00 | 2012-05-04 01:00:00 |   2
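
Pairing starts and ends through ROW_NUMBER() OVER () depends on row order, which SQL does not guarantee without an ORDER BY. As a rough sketch of one alternative, each start can be matched with the nearest end on or after it. The example below runs on Python's built-in sqlite3 module rather than Vertica, so date(), julianday() and the in-memory testing table are assumptions made for illustration, not the setup described above.

```python
import sqlite3

# Hypothetical in-memory copy of the `testing` table (assumption: one TEXT column d).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE testing (d TEXT)")
conn.executemany(
    "INSERT INTO testing (d) VALUES (?)",
    [
        ("2012-04-28 01:00:00",),
        ("2012-04-29 01:00:00",),
        ("2012-04-30 01:00:00",),
        ("2012-05-03 01:00:00",),
        ("2012-05-04 01:00:00",),
    ],
)

QUERY = """
WITH days AS (
    SELECT DISTINCT date(d) AS d FROM testing
),
starts AS (   -- days with no visit on the previous day
    SELECT t1.d AS s FROM days t1
    LEFT JOIN days t2 ON t2.d = date(t1.d, '-1 day')
    WHERE t2.d IS NULL
),
ends AS (     -- days with no visit on the following day
    SELECT t1.d AS f FROM days t1
    LEFT JOIN days t2 ON t2.d = date(t1.d, '+1 day')
    WHERE t2.d IS NULL
)
SELECT s,
       MIN(f) AS f,   -- nearest series end on or after this start
       CAST(julianday(MIN(f)) - julianday(s) AS INTEGER) + 1 AS seq
FROM starts JOIN ends ON f >= s
GROUP BY s
ORDER BY s
"""

streaks = conn.execute(QUERY).fetchall()
longest = max(seq for _, _, seq in streaks)
print(streaks, longest)
```

Because every start is paired with an end by value rather than by position, the result does not change if the database returns the rows in a different order.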

Answer:

Another approach, and the shortest one, is to do a self-join:

with grouped_result as
(
    select 
       sr.d,
       sum((fr.d is null)::int) over(order by sr.d) as group_number
    from tbl sr
    left join tbl fr on sr.d = fr.d + interval '1 day'
)
select d, group_number, count(d) over m as consecutive_days
from grouped_result
window m as (partition by group_number)

Output:

          d          | group_number | consecutive_days 
---------------------+--------------+------------------
 2012-04-28 08:00:00 |            1 |                3
 2012-04-29 08:00:00 |            1 |                3
 2012-04-30 08:00:00 |            1 |                3
 2012-05-03 08:00:00 |            2 |                2
 2012-05-04 08:00:00 |            2 |                2
(5 rows)

Live test: http://www.sqlfiddle.com/#!1/93789/1

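sr = second row, fr = first row (or perhaps previous row?). Basically we are backtracking: the join simulates LAG for databases that do not support it (Postgres does support LAG, but that solution is longer, because a window function cannot be nested inside another window function). So this query uses a hybrid approach: simulate LAG with a join, then apply a windowed SUM over the result, which produces the group number.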
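
The group-number idea can also be traced outside the database. The following is a minimal Python sketch, assuming the five dates from the question: a day gets a header flag of 1 when the previous day is absent (the role of fr.d is null after the left join), and a running sum of those flags gives the group number. The variable names are illustrative only.

```python
from collections import Counter
from datetime import date, timedelta
from itertools import accumulate

# Distinct visit days from the question, already sorted.
days = [
    date(2012, 4, 28),
    date(2012, 4, 29),
    date(2012, 4, 30),
    date(2012, 5, 3),
    date(2012, 5, 4),
]
present = set(days)
one_day = timedelta(days=1)

# 1 when the previous day has no visit, i.e. this day starts a new series.
headers = [int((d - one_day) not in present) for d in days]

# Running sum of the flags: every day in the same series shares one number.
group_numbers = list(accumulate(headers))

# Size of each group is the length of that series of consecutive days.
consecutive_days = Counter(group_numbers)
longest = max(consecutive_days.values())
print(group_numbers, longest)
```

Here the flags come out as 1, 0, 0, 1, 0, so the running sum labels the first three days as group 1 and the last two as group 2.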

Update

I forgot the final query. The query above only illustrates how the group numbers are built; it needs to be reshaped into this:

with grouped_result as
(
    select 
       sr.d,
       sum((fr.d is null)::int) over(order by sr.d) as group_number
    from tbl sr
    left join tbl fr on sr.d = fr.d + interval '1 day'
)
select min(d) as starting_date, max(d) as end_date, count(d) as consecutive_days
from grouped_result
group by group_number
-- order by consecutive_days desc limit 1


STARTING_DATE                END_DATE                     CONSECUTIVE_DAYS
April, 28 2012 08:00:00-0700 April, 30 2012 08:00:00-0700 3
May, 03 2012 08:00:00-0700   May, 04 2012 08:00:00-0700   2

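Update

I know why my other solution that uses window functions became long: it grew in my attempt to illustrate the logic of group numbering and of counting over the groups. If I had cut to the chase as in my MySql approach, the window-function version could have been shorter. Having said that, here is my old window-function approach, albeit better now: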

with headers as
(
    select 
      d, (lag(d) over m is null or d - lag(d) over m <> interval '1 day') as header
    from tbl
    window m as (order by d)
)      
,sequence_group as
(
    select d, sum(header::int) over (order by d) as group_number
    from headers  
)
select min(d) as starting_date, max(d) as ending_date, count(d) as consecutive_days
from sequence_group
group by group_number
-- order by consecutive_days desc limit 1

Live test: http://www.sqlfiddle.com/#!1/93789/21
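
The queries above work on one row per calendar day. Starting from raw visit rows (uid, created_at), timestamps would first have to be reduced to distinct dates per user. Below is a hedged Python sketch of the same lag-style comparison; the longest_streaks helper and the sample rows are hypothetical and only meant to show the logic.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def longest_streaks(visits):
    """Map each uid to its longest run of consecutive calendar days.

    `visits` is an iterable of (uid, created_at) pairs, created_at a datetime.
    """
    days_by_uid = defaultdict(set)
    for uid, created_at in visits:
        # Several visits on one day collapse into a single date.
        days_by_uid[uid].add(created_at.date())

    result = {}
    for uid, days in days_by_uid.items():
        best = current = 0
        previous = None  # plays the role of lag(d)
        for day in sorted(days):
            # Header row: no previous day, or the gap is not exactly one day.
            if previous is None or day - previous != timedelta(days=1):
                current = 1
            else:
                current += 1
            best = max(best, current)
            previous = day
        result[uid] = best
    return result


# Made-up sample rows: uid 123 follows the question's dates, with a duplicate day.
sample = [
    (123, datetime(2012, 4, 28, 8, 0)),
    (123, datetime(2012, 4, 28, 19, 30)),
    (123, datetime(2012, 4, 29, 8, 0)),
    (123, datetime(2012, 4, 30, 8, 0)),
    (123, datetime(2012, 5, 3, 8, 0)),
    (123, datetime(2012, 5, 4, 8, 0)),
    (7, datetime(2012, 5, 1, 12, 0)),
]
streaks = longest_streaks(sample)
print(streaks)
```

Reducing to dates first matters because two visits on the same day, or visits at different times of day, would otherwise break a comparison that expects a gap of exactly one day.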
