使用int8range连接2个大型postgres表无法很好地扩展

楚奇逸

2023-03-14

问题内容：

我想将IP路由表信息加入IP Whois信息。我正在使用Amazon的RDS，这意味着我不能使用Postgres
ip4r扩展，因此我改用int8range类型来表示带有gist索引的IP地址范围。

我的表如下所示：

=> \d routing_details
     Table "public.routing_details"
  Column  |   Type    | Modifiers
----------+-----------+-----------
 asn      | text      |
 netblock | text      |
 range    | int8range |
Indexes:
    "idx_routing_details_netblock" btree (netblock)
    "idx_routing_details_range" gist (range)


=> \d netblock_details
    Table "public.netblock_details"
   Column   |   Type    | Modifiers
------------+-----------+-----------
 range      | int8range |
 name       | text      |
 country    | text      |
 source     | text      |
Indexes:
    "idx_netblock_details_range" gist (range)

完整的routing_details表包含不到60万行，而netblock_details表包含约825万行。两个表中都有重叠的范围，但是对于routing_details表中的每个范围，我想从netblock_details表中获得最佳（最小）匹配。

我提出了2种不同的查询，我认为它们会返回准确的数据，一种使用窗口函数，另一种使用DISTINCT ON：

EXPLAIN SELECT DISTINCT ON (r.netblock) *
FROM routing_details r JOIN netblock_details n ON r.range <@ n.range
ORDER BY r.netblock, upper(n.range) - lower(n.range);
                                              QUERY PLAN
                                                         QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------
 Unique  (cost=118452809778.47..118477166326.22 rows=581300 width=91)
   Output: r.asn, r.netblock, r.range, n.range, n.name, n.country, r.netblock, ((upper(n.range) - lower(n.range)))
   ->  Sort  (cost=118452809778.47..118464988052.34 rows=4871309551 width=91)
         Output: r.asn, r.netblock, r.range, n.range, n.name, n.country, r.netblock, ((upper(n.range) - lower(n.range)))
         Sort Key: r.netblock, ((upper(n.range) - lower(n.range)))
         ->  Nested Loop  (cost=0.00..115920727265.53 rows=4871309551 width=91)
               Output: r.asn, r.netblock, r.range, n.range, n.name, n.country, r.netblock, (upper(n.range) - lower(n.range))
               Join Filter: (r.range <@ n.range)
               ->  Seq Scan on public.routing_details r  (cost=0.00..11458.96 rows=592496 width=43)
                     Output: r.asn, r.netblock, r.range
               ->  Materialize  (cost=0.00..277082.12 rows=8221675 width=48)
                     Output: n.range, n.name, n.country
                     ->  Seq Scan on public.netblock_details n  (cost=0.00..163712.75 rows=8221675 width=48)
                           Output: n.range, n.name, n.country
(14 rows)               ->  Seq Scan on netblock_details n  (cost=0.00..163712.75 rows=8221675 width=48)


EXPLAIN VERBOSE SELECT * FROM (
SELECT *, ROW_NUMBER() OVER (PARTITION BY r.range ORDER BY UPPER(n.range) - LOWER(n.range)) AS rank
FROM routing_details r JOIN netblock_details n ON r.range <@ n.range
) a WHERE rank = 1 ORDER BY netblock;

                                                                    QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=118620775630.16..118620836521.53 rows=24356548 width=99)
   Output: a.asn, a.netblock, a.range, a.range_1, a.name, a.country, a.rank
   Sort Key: a.netblock
   ->  Subquery Scan on a  (cost=118416274956.83..118611127338.87 rows=24356548 width=99)
         Output: a.asn, a.netblock, a.range, a.range_1, a.name, a.country, a.rank
         Filter: (a.rank = 1)
         ->  WindowAgg  (cost=118416274956.83..118550235969.49 rows=4871309551 width=91)
               Output: r.asn, r.netblock, r.range, n.range, n.name, n.country, row_number() OVER (?), ((upper(n.range) - lower(n.range))), r.range
               ->  Sort  (cost=118416274956.83..118428453230.71 rows=4871309551 width=91)
                     Output: ((upper(n.range) - lower(n.range))), r.range, r.asn, r.netblock, n.range, n.name, n.country
                     Sort Key: r.range, ((upper(n.range) - lower(n.range)))
                     ->  Nested Loop  (cost=0.00..115884192443.90 rows=4871309551 width=91)
                           Output: (upper(n.range) - lower(n.range)), r.range, r.asn, r.netblock, n.range, n.name, n.country
                           Join Filter: (r.range <@ n.range)
                           ->  Seq Scan on public.routing_details r  (cost=0.00..11458.96 rows=592496 width=43)
                                 Output: r.asn, r.netblock, r.range
                           ->  Materialize  (cost=0.00..277082.12 rows=8221675 width=48)
                                 Output: n.range, n.name, n.country
                                 ->  Seq Scan on public.netblock_details n  (cost=0.00..163712.75 rows=8221675 width=48)
                                       Output: n.range, n.name, n.country
(20 rows)

DISTINCT
ON似乎更有效率，因此我继续进行该操作。当我对整个数据集运行查询时，在等待约24小时后，出现磁盘空间不足错误。我创建了一个routing_details_small表，其中包含完整routing_details表的N行的子集，以尝试了解发生了什么。

N = 1000

=> EXPLAIN ANALYZE SELECT DISTINCT ON (r.netblock) *
-> FROM routing_details_small r JOIN netblock_details n ON r.range <@ n.range
-> ORDER BY r.netblock, upper(n.range) - lower(n.range);
                                                                                 QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Unique  (cost=4411888.68..4453012.20 rows=999 width=90) (actual time=124.094..133.720 rows=999 loops=1)
   ->  Sort  (cost=4411888.68..4432450.44 rows=8224705 width=90) (actual time=124.091..128.560 rows=4172 loops=1)
         Sort Key: r.netblock, ((upper(n.range) - lower(n.range)))
         Sort Method: external sort  Disk: 608kB
         ->  Nested Loop  (cost=0.41..1780498.29 rows=8224705 width=90) (actual time=0.080..101.518 rows=4172 loops=1)
               ->  Seq Scan on routing_details_small r  (cost=0.00..20.00 rows=1000 width=42) (actual time=0.007..1.037 rows=1000 loops=1)
               ->  Index Scan using idx_netblock_details_range on netblock_details n  (cost=0.41..1307.55 rows=41124 width=48) (actual time=0.063..0.089 rows=4 loops=1000)
                     Index Cond: (r.range <@ range)
 Total runtime: 134.999 ms
(9 rows)

N = 100000

=> EXPLAIN ANALYZE SELECT DISTINCT ON (r.netblock) *
-> FROM routing_details_small r JOIN netblock_details n ON r.range <@ n.range
-> ORDER BY r.netblock, upper(n.range) - lower(n.range);
                                                                                 QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Unique  (cost=654922588.98..659034941.48 rows=200 width=144) (actual time=28252.677..29487.380 rows=98992 loops=1)
   ->  Sort  (cost=654922588.98..656978765.23 rows=822470500 width=144) (actual time=28252.673..28926.703 rows=454856 loops=1)
         Sort Key: r.netblock, ((upper(n.range) - lower(n.range)))
         Sort Method: external merge  Disk: 64488kB
         ->  Nested Loop  (cost=0.41..119890431.75 rows=822470500 width=144) (actual time=0.079..24951.038 rows=454856 loops=1)
               ->  Seq Scan on routing_details_small r  (cost=0.00..1935.00 rows=100000 width=96) (actual time=0.007..110.457 rows=100000 loops=1)
               ->  Index Scan using idx_netblock_details_range on netblock_details n  (cost=0.41..725.96 rows=41124 width=48) (actual time=0.067..0.235 rows=5 loops=100000)
                     Index Cond: (r.range <@ range)
 Total runtime: 29596.667 ms
(9 rows)

N = 250000

=> EXPLAIN ANALYZE SELECT DISTINCT ON (r.netblock) *
-> FROM routing_details_small r JOIN netblock_details n ON r.range <@ n.range
-> ORDER BY r.netblock, upper(n.range) - lower(n.range);
                                                                                      QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Unique  (cost=1651822953.55..1662103834.80 rows=200 width=144) (actual time=185835.443..190143.266 rows=247655 loops=1)
   ->  Sort  (cost=1651822953.55..1656963394.18 rows=2056176250 width=144) (actual time=185835.439..188779.279 rows=1103850 loops=1)
         Sort Key: r.netblock, ((upper(n.range) - lower(n.range)))
         Sort Method: external merge  Disk: 155288kB
         ->  Nested Loop  (cost=0.28..300651962.46 rows=2056176250 width=144) (actual time=19.325..177403.913 rows=1103850 loops=1)
               ->  Seq Scan on netblock_details n  (cost=0.00..163743.05 rows=8224705 width=48) (actual time=0.007..8160.346 rows=8224705 loops=1)
               ->  Index Scan using idx_routing_details_small_range on routing_details_small r  (cost=0.28..22.16 rows=1250 width=96) (actual time=0.018..0.018 rows=0 loops=8224705)
                     Index Cond: (range <@ n.range)
 Total runtime: 190413.912 ms
(9 rows)

针对具有60万行的完整表，查询在24小时后失败，并显示有关磁盘空间不足的错误，这大概是由外部合并步骤引起的。因此，使用一个较小的routing_details表，此查询可以很好且快速地工作，但伸缩性很差。

有关如何改善查询，甚至可能进行模式更改的建议，以使该查询在整个数据集上有效运行？

问题答案：

我原本是想像其他建议的方法那样使用横向联接（例如，Erwin
Brandstetter的最后一个查询，他使用简单的int8数据类型和简单的索引），但是不想在答案中写出来，因为我认为它不是真的有效。当您尝试查找netblock覆盖给定范围的所有范围时，索引并没有太大帮助。

我将在此处重复Erwin Brandstetter的查询：

SELECT *  -- only select columns you need to make it faster
FROM   routing_details r
     , LATERAL (
   SELECT *
   FROM   netblock_details n
   WHERE  n.ip_min <= r.ip_min
   AND    n.ip_max >= r.ip_max
   ORDER  BY n.ip_max - n.ip_min
   LIMIT  1
   ) n;

当您在netblock_details上具有索引时，如下所示：

CREATE INDEX netblock_details_ip_min_max_idx ON netblock_details 
(ip_min, ip_max DESC NULLS LAST);

您可以O(logN)在netblock_details表中快速找到扫描的起点-
最大值n.ip_min小于r.ip_min，或最小值n.ip_max大于r.ip_max。但是随后您必须扫描/读取索引/表的其余部分，并对每一行执行检查的第二部分并过滤掉大多数行。

换句话说，该索引有助于快速找到满足第一个搜索条件的起始行：n.ip_min <= r.ip_min，但随后您将继续读取满足该条件的所有行，并对每个这样的行执行第二个检查n.ip_max >= r.ip_max。平均而言（如果数据分布均匀），您将必须读取netblock_details表的一半行。Optimizer可能选择先使用索引来搜索`n.ip_max

= r.ip_max，然后再应用第二个过滤器n.ip_min <= r.ip_min`，但是您不能使用此索引将两个过滤器一起应用。

最终结果：对于中的每一行，routing_details我们将从中读取一半的行netblock_details。600K * 4M =
2,400,000,000,000行它比笛卡尔乘积好2倍。您可以EXPLAIN ANALYZE在问题的输出中看到此数字（笛卡尔乘积）。

592,496 * 8,221,675 = 4,871,309,550,800

难怪在尝试实现和排序时查询会用完磁盘空间。

获得最终结果的总体高层流程：

加入两个表（尚未找到最匹配的表）。在最坏的情况下，它是笛卡尔积，在最好的情况下，它的覆盖范围是从netblock_details到routing_details。您说过，其中netblock_details的每个条目都有多个条目routing_details，范围从3到大约10。因此，此连接的结果可能有〜6M行（不是太多）
按routing_details范围对连接的结果进行排序/分区，并为每个这样的范围找到的最佳（最小）覆盖范围netblock_details。

我的想法是反转查询。我建议不要netblock_details从较小的routing_details表的每一行中找到较大的覆盖范围，而建议从较小routing_details的每一行中找到较小的所有覆盖范围netblock_details。

两步过程

对于较大的每一行，请netblock_details查找routing_details该netblock范围内的所有范围。

我将在查询中使用以下架构。我已经将主键添加ID到两个表中。

CREATE TABLE routing_details (
ID        int
,ip_min   int8
,ip_max   int8
,asn      text
,netblock text
);

CREATE TABLE netblock_details (
ID        int
,ip_min   int8
,ip_max   int8
,name     text
,country  text
,source   text
);

SELECT
    netblock_details.ID AS n_ID
    ,netblock_details.ip_max - netblock_details.ip_min AS n_length
    ,r.ID AS r_ID
FROM
    netblock_details
    INNER JOIN LATERAL
    (
        SELECT routing_details.ID
        FROM routing_details
        WHERE
            routing_details.ip_min >= netblock_details.ip_min
            AND routing_details.ip_min <= netblock_details.ip_max
            -- note how routing_details.ip_min is limited from both sides
            -- this would make it possible to scan only (hopefully) small
            -- portion of the table instead of full or half table
            AND routing_details.ip_max <= netblock_details.ip_max
            -- this clause ensures that the whole routing range
            -- is inside the netblock range
    ) AS r ON true

我们需要onrouting_details上的索引(ip_min, ip_max)。这里最主要的是on的索引ip_min。在索引中有第二列可以消除对值的查找的需求ip_max。它对树搜索没有帮助。

请注意，横向子查询没有LIMIT。这不是最终结果。该查询应返回〜6M行。将此结果保存在临时表中。将索引添加到上的临时表(r_ID, n_length, n_ID)。n_ID再次只是为了删除多余的查询。我们需要一个索引，以避免对每个数据进行排序r_ID。

最后一步

对于中的每一行，请routing_details找到n_ID最小的n_length。现在，我们可以按“适当”顺序使用横向连接。该temp表是上一步
带有索引的结果 。

SELECT
    routing_details.*
    ,t.n_ID
    ,netblock_details.*
FROM
    routing_details
    INNER JOIN LATERAL
    (
        SELECT temp.n_ID
        FROM temp
        WHERE temp.r_ID = routing_details.ID
        ORDER BY temp.n_length
        LIMIT 1
    ) AS t ON true
    INNER JOIN netblock_details ON netblock_details.ID = t.n_ID

在这里，子查询应该是索引的搜寻，而不是扫描。Optimizer应该足够聪明以仅执行搜索并由于LIMIT 1会返回第一个发现的结果，因此您将在6M行临时表中拥有600K的索引搜索。

原始答案（我仅将其用于范围图）：

你可以“作弊”吗？

如果我正确理解了您的查询，则routing_details.range
希望为每个查询找到一个netblock_details.range覆盖范围大于/大于的最小值routing_details.range。

给定以下示例，其中r是路由范围，n1, ..., n8是netblock范围，正确的答案是n5。

   |---|
   n1

     |------------------|
     n2

                           |---------------|
                           n3

                                          |-----|
                                          n4

                  |------------------|
                  n5

                     |--------------------------------------|
                     n6

        |---------------------------|
        n7

                      |-----|
                      n8

                      |------------|
                      r
                     start       end

n.start <= r.start AND n.end >= r.end
order by n.length
limit 1

您的查询花费了14个小时，显示索引扫描花费了6ms，但按范围长度排序花费了80ms。

通过这种搜索，没有简单的数据一维排序。您n.start与n.end和一起使用n.length。但是，也许您的数据不是那么通用，或者可以返回一些不同的结果。

例如，如果可以返回n6结果，则可以更快地工作。n6是具有最大范围的覆盖范围start：

n.start <= r.start AND n.end >= r.end
order by n.start desc
limit 1

或者，您可以选择n7，它的大小最小end：

n.start <= r.start AND n.end >= r.end
order by n.end
limit 1

这种搜索将在n.start（或n.end）上使用简单的索引，而无需进行额外的排序。

第二种完全不同的方法。范围有多大/多长？如果它们相对较短（少量数字），那么您可以尝试将它们存储为整数的显式列表，而不是范围。例如，范围[5-8]将存储为4行：(5, 6, 7, 8)。使用此存储模型，可能更容易找到范围的交集。

使用int8range连接2个大型postgres表无法很好地扩展

相关阅读

相关文章

相关问答

相关工具

相关文档