Question:

docker-compose up airflow-init hangs: no network connectivity between containers

谈炳
2023-03-14

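I'm trying to set up an Airflow instance with docker-compose as described in the official documentation, and I'm stuck at the airflow-init step. It looks like there is no connectivity between the containers, but I don't know how to fix it.

I use the same docker-compose.yaml as described in the docs. It can be downloaded here: https://airflow.apache.org/docs/apache-airflow/stable/docker-compose.yaml

Currently I see this in my shell: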

~/dwn $ docker-compose up airflow-init
51ad8448b197_dwn_redis_1 is up-to-date
70409dec742c_dwn_postgres_1 is up-to-date
Starting dwn_airflow-init_1 ... done
Attaching to dwn_airflow-init_1
airflow-init_1       | BACKEND=postgresql+psycopg2
airflow-init_1       | DB_HOST=postgres
airflow-init_1       | DB_PORT=5432

The docs say I should see something like this:

airflow-init_1       | Upgrades done
airflow-init_1       | Admin user airflow created
airflow-init_1       | 2.1.2
start_airflow-init_1 exited with code 0

But the command just hangs and never exits. htop shows that netcat is running in this container, trying to connect to postgres:

nc -zvvn 172.19.0.3 5432

curl shows a timeout:

~/dwn $ docker exec -it dwn_airflow-init_1 curl postgres:5432
curl: (7) Failed to connect to postgres port 5432: Connection timed out
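
The kind of failure matters here: a timeout usually means packets are being dropped silently somewhere on the path, while a refused connection means the host was reached but nothing listens on the port. A minimal Python sketch of a probe that tells these cases apart follows. The `probe` helper and its return labels are hypothetical names chosen for illustration; they are not part of Airflow or Docker.

```python
import socket


def probe(host, port, timeout=3.0):
    """Classify one TCP connection attempt.

    Returns "open", "refused", "timeout", "dns-failure" or "error: ...".
    """
    try:
        # create_connection resolves the name and then connects.
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except socket.gaierror:
        # Name resolution failed (compare "Could not resolve host").
        return "dns-failure"
    except socket.timeout:
        # No answer at all: typical when a firewall drops packets.
        return "timeout"
    except ConnectionRefusedError:
        # Host reachable, but the port is closed.
        return "refused"
    except OSError as exc:
        return f"error: {exc}"


if __name__ == "__main__":
    print(probe("postgres", 5432))
```

Run inside the airflow-init container, a "timeout" result would match the curl output above, whereas "dns-failure" would point at name resolution instead.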

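Why does it hang?

I tried a few things to fix this:

  • I tried setting the ports option of the postgres service to 5432:5432, with no effect

  • I tried setting the links option, with no effect

  • Other questions suggested that the system entropy was too low; no, there is plenty of entropy

  • There is enough free memory, CPU, and disk space

  • I tried configuring networks as described in this answer; even worse, container names could no longer be resolved: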

    ~/dwn $ docker exec -it dwn_airflow-init_1 curl postgres:5432
    curl: (6) Could not resolve host: postgres
    

  • I tried resetting iptables, as suggested in this answer, with no effect

    Some system info:

    • OS: Arch Linux
    • docker version: 20.10.7, build f0df35096d
    • docker-compose version: 1.29.2

    Logs! (as requested by @larsks)

    ~/dwn $ docker-compose ps
           Name                     Command                  State                        Ports                  
    -------------------------------------------------------------------------------------------------------------
    dwn_airflow-init_1   /usr/bin/dumb-init -- /ent ...   Up             8080/tcp                                
    dwn_postgres_1       docker-entrypoint.sh postgres    Up (healthy)   5432/tcp                                
    dwn_redis_1          docker-entrypoint.sh redis ...   Up (healthy)   0.0.0.0:6379->6379/tcp,:::6379->6379/tcp
    ~/dwn $ docker-compose logs postgres
    Attaching to dwn_postgres_1
    postgres_1           | The files belonging to this database system will be owned by user "postgres".
    postgres_1           | This user must also own the server process.
    postgres_1           | 
    postgres_1           | The database cluster will be initialized with locale "en_US.utf8".
    postgres_1           | The default database encoding has accordingly been set to "UTF8".
    postgres_1           | The default text search configuration will be set to "english".
    postgres_1           | 
    postgres_1           | Data page checksums are disabled.
    postgres_1           | 
    postgres_1           | fixing permissions on existing directory /var/lib/postgresql/data ... ok
    postgres_1           | creating subdirectories ... ok
    postgres_1           | selecting dynamic shared memory implementation ... posix
    postgres_1           | selecting default max_connections ... 100
    postgres_1           | selecting default shared_buffers ... 128MB
    postgres_1           | selecting default time zone ... Etc/UTC
    postgres_1           | creating configuration files ... ok
    postgres_1           | running bootstrap script ... ok
    postgres_1           | performing post-bootstrap initialization ... ok
    postgres_1           | initdb: warning: enabling "trust" authentication for local connections
    postgres_1           | You can change this by editing pg_hba.conf or using the option -A, or
    postgres_1           | --auth-local and --auth-host, the next time you run initdb.
    postgres_1           | syncing data to disk ... ok
    postgres_1           | 
    postgres_1           | 
    postgres_1           | Success. You can now start the database server using:
    postgres_1           | 
    postgres_1           |     pg_ctl -D /var/lib/postgresql/data -l logfile start
    postgres_1           | 
    postgres_1           | waiting for server to start....2021-07-17 07:31:38.491 UTC [47] LOG:  starting PostgreSQL 13.3 (Debian 13.3-1.pgdg100+1) on x86_64-pc-linux-gnu, compiled by gcc (Debian 8.3.0-6) 8.3.0, 64-bit
    postgres_1           | 2021-07-17 07:31:38.493 UTC [47] LOG:  listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
    postgres_1           | 2021-07-17 07:31:38.499 UTC [48] LOG:  database system was shut down at 2021-07-17 07:31:35 UTC
    postgres_1           | 2021-07-17 07:31:38.521 UTC [47] LOG:  database system is ready to accept connections
    postgres_1           |  done
    postgres_1           | server started
    postgres_1           | CREATE DATABASE
    postgres_1           | 
    postgres_1           | 
    postgres_1           | /usr/local/bin/docker-entrypoint.sh: ignoring /docker-entrypoint-initdb.d/*
    postgres_1           | 
    postgres_1           | 2021-07-17 07:31:39.613 UTC [47] LOG:  received fast shutdown request
    postgres_1           | waiting for server to shut down....2021-07-17 07:31:39.615 UTC [47] LOG:  aborting any active transactions
    postgres_1           | 2021-07-17 07:31:39.616 UTC [47] LOG:  background worker "logical replication launcher" (PID 54) exited with exit code 1
    postgres_1           | 2021-07-17 07:31:39.616 UTC [49] LOG:  shutting down
    postgres_1           | 2021-07-17 07:31:39.644 UTC [47] LOG:  database system is shut down
    postgres_1           |  done
    postgres_1           | server stopped
    postgres_1           | 
    postgres_1           | PostgreSQL init process complete; ready for start up.
    postgres_1           | 
    postgres_1           | 2021-07-17 07:31:39.741 UTC [1] LOG:  starting PostgreSQL 13.3 (Debian 13.3-1.pgdg100+1) on x86_64-pc-linux-gnu, compiled by gcc (Debian 8.3.0-6) 8.3.0, 64-bit
    postgres_1           | 2021-07-17 07:31:39.741 UTC [1] LOG:  listening on IPv4 address "0.0.0.0", port 5432
    postgres_1           | 2021-07-17 07:31:39.741 UTC [1] LOG:  listening on IPv6 address "::", port 5432
    postgres_1           | 2021-07-17 07:31:39.748 UTC [1] LOG:  listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
    postgres_1           | 2021-07-17 07:31:39.756 UTC [75] LOG:  database system was shut down at 2021-07-17 07:31:39 UTC
    postgres_1           | 2021-07-17 07:31:39.781 UTC [1] LOG:  database system is ready to accept connections
    postgres_1           | 2021-07-17 07:33:49.955 UTC [79] LOG:  using stale statistics instead of current ones because stats collector is not responding
    postgres_1           | 2021-07-17 07:34:00.040 UTC [79] LOG:  using stale statistics instead of current ones because stats collector is not responding
    postgres_1           | 2021-07-17 07:34:00.049 UTC [235] LOG:  using stale statistics instead of current ones because stats collector is not responding
    postgres_1           | 2021-07-17 07:34:10.141 UTC [79] LOG:  using stale statistics instead of current ones because stats collector is not responding
    

    When I edit the postgres service to make it reachable from the host (the ports option), I can see that it really is up:

    ~/dwn $ pg_isready -h localhost -p 5432
    localhost:5432 - accepting connections
    

    Here is what the network created by docker-compose looks like:

    [
        {
            "Name": "dwn_default",
            "Id": "8c4e4ab1629cd7d2cb5d532e28b0837a11bc3516ba094248294e5d734a69dc11",
            "Created": "2021-07-17T10:15:50.694208715+02:00",
            "Scope": "local",
            "Driver": "bridge",
            "EnableIPv6": false,
            "IPAM": {
                "Driver": "default",
                "Options": null,
                "Config": [
                    {
                        "Subnet": "172.19.0.0/16",
                        "Gateway": "172.19.0.1"
                    }
                ]
            },
            "Internal": false,
            "Attachable": true,
            "Ingress": false,
            "ConfigFrom": {
                "Network": ""
            },
            "ConfigOnly": false,
            "Containers": {
                "2c6dd1bcd0d81740ab17ff7816acd983ff053be2a8f886ef281b3e5ec1ec642b": {
                    "Name": "dwn_airflow-init_1",
                    "EndpointID": "945c9bd23ffb52bdee7ae9fdf32f48be623ac73cd60a5b248f919fce6aede366",
                    "MacAddress": "02:42:ac:13:00:04",
                    "IPv4Address": "172.19.0.4/16",
                    "IPv6Address": ""
                },
                "3a79a194d97e491c75e573fa78492c9d4f73efd4d868e709c20eb23c9a0ff2a6": {
                    "Name": "dwn_postgres_1",
                    "EndpointID": "b3245b8ab82edc78b205485cd39c368881d7c7b2bc29f325fd3f6f6d8605d9c1",
                    "MacAddress": "02:42:ac:13:00:03",
                    "IPv4Address": "172.19.0.3/16",
                    "IPv6Address": ""
                },
                "dd023f1d42be72d967c5045b7be29deca88caf99377e7d144c51f2212059cefa": {
                    "Name": "dwn_redis_1",
                    "EndpointID": "f85a6cd841028efb7fab17e40f814b0d9de300e90f9506df373d973695a38d97",
                    "MacAddress": "02:42:ac:13:00:02",
                    "IPv4Address": "172.19.0.2/16",
                    "IPv6Address": ""
                }
            },
            "Options": {},
            "Labels": {
                "com.docker.compose.network": "default",
                "com.docker.compose.project": "dwn",
                "com.docker.compose.version": "1.29.2"
            }
        }
    ]
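
To cross-check which address each container received, the inspect output above can be parsed instead of read by eye. A rough sketch, assuming the JSON comes from `docker network inspect dwn_default`; the function name `container_addresses` is made up for this example, and the container IDs in the sample are shortened.

```python
import json


def container_addresses(inspect_json):
    """Map container name -> IPv4 address (prefix length stripped)."""
    addresses = {}
    for network in json.loads(inspect_json):
        # "Containers" maps container IDs to endpoint details.
        for endpoint in (network.get("Containers") or {}).values():
            addresses[endpoint["Name"]] = endpoint["IPv4Address"].split("/")[0]
    return addresses


# Trimmed copy of the output shown above (IDs abbreviated).
SAMPLE = """
[{"Name": "dwn_default",
  "Containers": {
    "2c6d": {"Name": "dwn_airflow-init_1", "IPv4Address": "172.19.0.4/16"},
    "3a79": {"Name": "dwn_postgres_1", "IPv4Address": "172.19.0.3/16"},
    "dd02": {"Name": "dwn_redis_1", "IPv4Address": "172.19.0.2/16"}}}]
"""

for name, ip in sorted(container_addresses(SAMPLE).items()):
    print(name, ip)
```

The postgres address, 172.19.0.3, is the same one the netcat process was trying to reach, so name resolution on this network appears to be correct and the problem lies in delivering packets.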
    

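    @jarek-potiuk suggested that I check the IPv6 configuration. It still doesn't work, but this time I got some errors. Here is what I did:

    I created /etc/docker/daemon.json with the following content: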

    {
      "ipv6": true,
      "fixed-cidr-v6": "2001:db8:1::/64"
    }
    

    This resulted in the following error (after a daemon restart):

    could not find an available, non-overlapping IPv6 address pool among the defaults to assign to the network
    

    This error can be fixed by setting network_mode: bridge for each service in the compose file, and now my services have IPv6 addresses:

    [
        {
            "Name": "bridge",
            "Id": "092767c3c4137429a7caaa85a1b87c7cb977c4f02055624fa84c4d586ed9758f",
            "Created": "2021-07-17T14:42:08.353393246+02:00",
            "Scope": "local",
            "Driver": "bridge",
            "EnableIPv6": true,
            "IPAM": {
                "Driver": "default",
                "Options": null,
                "Config": [
                    {
                        "Subnet": "172.17.0.0/16",
                        "Gateway": "172.17.0.1"
                    },
                    {
                        "Subnet": "2001:db8:1::/64",
                        "Gateway": "2001:db8:1::1"
                    }
                ]
            },
            "Internal": false,
            "Attachable": false,
            "Ingress": false,
            "ConfigFrom": {
                "Network": ""
            },
            "ConfigOnly": false,
            "Containers": {
                "964c9edadb8f7eb757cd7f1296c2af154ab407ef4d9872f8e613f61d64d6a443": {
                    "Name": "dwn_postgres_1",
                    "EndpointID": "a19bd83ff487611e78074eddafbca18e545edcf9ddc9d7851d3b6d68b7962419",
                    "MacAddress": "02:42:ac:11:00:02",
                    "IPv4Address": "172.17.0.2/16",
                    "IPv6Address": "2001:db8:1::242:ac11:2/64"
                },
                "b45ca546c1539f5f0f1d76423bd4f071efed2e3d6e118b8811e3fd28164fab5a": {
                    "Name": "dwn_airflow-init_1",
                    "EndpointID": "3a2fc42dfda6a534b6840971f4b11af9c78aac2253a036f46721ed6e5659f7b9",
                    "MacAddress": "02:42:ac:11:00:04",
                    "IPv4Address": "172.17.0.4/16",
                    "IPv6Address": "2001:db8:1::242:ac11:4/64"
                },
                "f140d9c90c24fca254e34aec549b559ec5f82bc8b14537e7249192e604110d53": {
                    "Name": "dwn_redis_1",
                    "EndpointID": "1c26f7afa8ada58626b67e7446347e1c4d540513df72784addcf334f99fd53d1",
                    "MacAddress": "02:42:ac:11:00:03",
                    "IPv4Address": "172.17.0.3/16",
                    "IPv6Address": "2001:db8:1::242:ac11:3/64"
                }
            },
            "Options": {
                "com.docker.network.bridge.default_bridge": "true",
                "com.docker.network.bridge.enable_icc": "true",
                "com.docker.network.bridge.enable_ip_masquerade": "true",
                "com.docker.network.bridge.host_binding_ipv4": "0.0.0.0",
                "com.docker.network.bridge.name": "docker0",
                "com.docker.network.driver.mtu": "1500"
            },
            "Labels": {}
        }
    ]
    

    But there is another problem: name resolution stopped working:

    ~/dwn $ docker-compose up airflow-init
    dwn_postgres_1 is up-to-date
    dwn_redis_1 is up-to-date
    Starting dwn_airflow-init_1 ... done
    Attaching to dwn_airflow-init_1
    airflow-init_1       | BACKEND=postgresql+psycopg2
    airflow-init_1       | DB_HOST=postgres
    airflow-init_1       | DB_PORT=5432
    airflow-init_1       | ....................
    airflow-init_1       | ERROR! Maximum number of retries (20) reached.
    airflow-init_1       | 
    airflow-init_1       | Last check result:
    airflow-init_1       | $ run_nc 'postgres' '5432'
    airflow-init_1       | Traceback (most recent call last):
    airflow-init_1       |   File "<string>", line 1, in <module>
    airflow-init_1       | socket.gaierror: [Errno -3] Temporary failure in name resolution
    airflow-init_1       | Can't parse  as an IP address
    airflow-init_1       | 
    dwn_airflow-init_1 exited with code 1
    

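    This is actually documented: containers on the default bridge network can only reach each other by IP address. But access by IP still doesn't work: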

    ~/dwn $ docker exec -i -t dwn_airflow-init_1 sh -c 'echo "PING" | nc -v 172.17.0.3 6379'
    172.17.0.3: inverse host lookup failed: Host name lookup failure
    ^C
    ~/dwn $ echo "PING" | ncat -v localhost 6379
    Ncat: Version 7.91 ( https://nmap.org/ncat )
    Ncat: Connected to ::1:6379.
    +PONG
    Ncat: 5 bytes sent, 7 bytes received in 0.01 seconds.
    

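    I also found that disabling IPv6 at the daemon level does not disable it inside containers, so I tried disabling it in the postgres container by setting sysctls. That worked as expected: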

    ~/dwn $ docker exec -i -t dwn_postgres_1 cat /proc/sys/net/ipv6/conf/all/disable_ipv6
    1
    

    But still no network access.

    I'm out of ideas now.

  • 2 answers

    曾光誉
    2023-03-14

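    Well, I figured it out myself.

    TL;DR: PEBKAC. The user misconfigured the firewall and forgot that he had told the kernel to drop forwarded packets.

    Let's start from the beginning: docker-compose up airflow-init prints only this and then waits for something: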

    ~/dwn $ docker-compose up airflow-init
    51ad8448b197_dwn_redis_1 is up-to-date
    70409dec742c_dwn_postgres_1 is up-to-date
    Starting dwn_airflow-init_1 ... done
    Attaching to dwn_airflow-init_1
    airflow-init_1       | BACKEND=postgresql+psycopg2
    airflow-init_1       | DB_HOST=postgres
    airflow-init_1       | DB_PORT=5432
    

    Maybe the host postgres points somewhere strange:

    ~/dwn $ docker exec -i -t dwn_airflow-init_1 host postgres
    postgres has address 172.20.0.2
    

    Not really, it looks like any other Docker IP, but this netcat call still hangs:

    nc -zvvn 172.20.0.2 5432
    

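    This means the postgres service does not respond to the airflow-init container at all. However, postgres does respond to requests from the host system. So there is no route between postgres and airflow-init, even though they are on the same network. Maybe the kernel drops forwarded packets?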

    ~ # sysctl net/ipv4/conf/all/forwarding
    net.ipv4.conf.all.forwarding = 1
    ~ # sysctl net/ipv6/conf/all/forwarding
    net.ipv6.conf.all.forwarding = 1
    

    Forwarding is enabled. Maybe the firewall drops them?

    ~ # iptables -S FORWARD
    -P FORWARD ACCEPT
    -A FORWARD -j DOCKER-USER
    -A FORWARD -j DOCKER-ISOLATION-STAGE-1
    -A FORWARD -o br-b4a6c0b51ae7 -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
    -A FORWARD -o br-b4a6c0b51ae7 -j DOCKER
    -A FORWARD -i br-b4a6c0b51ae7 ! -o br-b4a6c0b51ae7 -j ACCEPT
    -A FORWARD -i br-b4a6c0b51ae7 -o br-b4a6c0b51ae7 -j ACCEPT
    -A FORWARD -o docker0 -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
    -A FORWARD -o docker0 -j DOCKER
    -A FORWARD -i docker0 ! -o docker0 -j ACCEPT
    -A FORWARD -i docker0 -o docker0 -j ACCEPT
    

    It looks like they can pass. Maybe Docker is broken somehow? I reinstalled it and rebooted, and the same thing happens.

    Maybe iptables is broken somehow?

    ~ # pacman -S iptables
    resolving dependencies...
    looking for conflicting packages...
    :: iptables and iptables-nft are in conflict. Remove iptables-nft? [y/N]
    

    Oh. I have nftables installed. That's odd. How is my firewall actually managed?

    ~ # systemctl status iptables nftables
    ○ iptables.service - IPv4 Packet Filtering Framework
         Loaded: loaded (/usr/lib/systemd/system/iptables.service; disabled; vendor preset: disabled)
         Active: inactive (dead)
    
    ● nftables.service - Netfilter Tables
         Loaded: loaded (/usr/lib/systemd/system/nftables.service; enabled; vendor preset: disabled)
         Active: active (exited) since Mon 2021-07-19 10:30:41 CEST; 6h ago
           Docs: man:nft(8)
        Process: 824 ExecStart=/usr/bin/nft -f /etc/nftables.conf (code=exited, status=0/SUCCESS)
       Main PID: 824 (code=exited, status=0/SUCCESS)
            CPU: 9ms
    

    And... what does the forward chain look like?

    ~ # nft list chain inet filter forward                               
    table inet filter {
        chain forward {
            type filter hook forward priority filter; policy accept;
            drop
        }
    }
    

    Hmm...
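
The earlier iptables listing looked permissive because, as far as I understand, with iptables-nft it only shows the rules managed through the iptables compatibility layer. A native nftables table such as inet filter is evaluated separately, and a drop verdict there is final even if another chain accepted the packet. A small Python sketch that scans `nft` output for such bare verdicts follows; `find_blanket_drops` is a hypothetical helper name, and the parsing is deliberately naive (it only recognizes rules that consist of nothing but `drop` or `reject`).

```python
import re


def find_blanket_drops(ruleset):
    """Return (chain, verdict) pairs for rules that are a bare drop/reject."""
    hits = []
    chain = None
    for raw in ruleset.splitlines():
        line = raw.strip()
        match = re.match(r"chain\s+(\S+)\s*\{", line)
        if match:
            chain = match.group(1)      # entering a chain block
        elif line == "}":
            chain = None                # leaving a chain (or table) block
        elif chain and line in ("drop", "reject"):
            hits.append((chain, line))  # unconditional verdict
    return hits


# Output of `nft list chain inet filter forward` as shown above.
SAMPLE = """
table inet filter {
    chain forward {
        type filter hook forward priority filter; policy accept;
        drop
    }
}
"""

print(find_blanket_drops(SAMPLE))
```

The same function could be fed the output of `nft list ruleset` to look for blanket drops in every chain at once.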

    Now, if I delete that chain:

    ~ # nft delete chain inet filter forward
    

    Suddenly Airflow starts printing a lot of output in the second terminal. Goal achieved:

    airflow-init_1       | Admin user airflow created
    airflow-init_1       | 2.1.2
    dwn_airflow-init_1 exited with code 0
    
    丌官远
    2023-03-14

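    Very detailed analysis. It's good to see someone take so many steps to dig into a problem.

    Everything in your setup and logs looks fine. So I don't think this is a docker-compose problem; something must be wrong with your environment.

    However, I noticed one thing, and although I'm not 100% sure, it might be the cause.

    I noticed that your postgres server listens on both IPv4 and IPv6, but your docker-compose network only shows IPv4 addresses.

    My assumption is that IPv6 is enabled for your Docker engine but disabled (or misconfigured) for the network the containers use.

    What then happens is that when you try to resolve the postgres address over IPv6, it hangs while retrieving the address through the misconfigured DNS, hence the timeout.

    You can enable IPv6 by setting ipv6 to true in /etc/docker/daemon.json (https://docs.docker.com/config/daemon/ipv6/), together with a fixed-cidr-v6 subnet, and then restarting the daemon: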

    {
      "ipv6": true,
      "fixed-cidr-v6": "2001:db8:1::/64"
    }
    