Understanding modern Linux routing (and wg-quick)
Published on February 27, 2021; updated on March 1, 2021
Back in the old days, I could just type route (or, later, ip route) in my Linux terminal and get an accurate picture of all my routes. This is no longer the case.
For instance, the machine where I’m writing this is connected to the Mullvad VPN via the WireGuard protocol using the wg-quick script. I’m pretty sure all my traffic goes through Mullvad, yet you wouldn’t be able to tell this from my ip route output:
% ip route
default via 192.168.1.1 dev enp34s0 proto static metric 100
192.168.1.0/24 dev enp34s0 proto kernel scope link src 192.168.1.121 metric 100
192.168.122.0/24 dev virbr0 proto kernel scope link src 192.168.122.1 linkdown
Note that the default route seemingly directs all traffic through my physical network interface, not the virtual VPN interface.
So let’s figure out how this all works.
Routing tables (ip route)
In reality, there is no single routing table in Linux (and there hasn’t been for more than 20 years, since around Linux 2.2). Instead, there are multiple routing tables, plus a set of rules that tell the kernel how to choose the right table for each packet.
(By the way, do not confuse routing tables with iptables. To simplify a bit, routing tables specify how to deliver a packet, whereas iptables specify whether to deliver it at all. They are completely different and unrelated.)
What you see when you run ip route without specifying a table is the contents of one particular table, main. Tables are identified by integer numbers (from 1 to 2^32−1) but can also be given textual names, which are listed in the file /etc/iproute2/rt_tables. The default one will look something like this:
#
# reserved values
#
255 local
254 main
253 default
0 unspec
#
# local
#
#1 inr.ruhep
(Are you wondering what inr.ruhep is? This is just an example, likely added by Alexey Kuznetsov, who worked on these parts of the Linux kernel and iproute tools. It stands for “Institute for Nuclear Research / Russian High Energy Physics”, the place where Alexey worked at the time, and probably refers to their internal network. There was also an old-school Russian computer network/ISP called RUHEP/Radio-MSU.)
You can view the contents of any table like this:
% ip route list table local
% ip route list table 13
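The name-to-number mapping can be sketched with a few lines of shell. This is only an illustration: table_id is a hypothetical helper (not part of iproute2), and the sample data is copied from the listing above rather than read from a real system, where the file's location and contents may differ.

```shell
# Look up a table's numeric ID by name in rt_tables-formatted text read from
# stdin. Comment lines (starting with '#') and blank lines are skipped.
# table_id is a hypothetical helper written for this example.
table_id() {
    awk -v name="$1" '
        /^[[:space:]]*#/ || NF < 2 { next }
        $2 == name { print $1; found = 1; exit }
        END { if (!found) exit 1 }
    '
}

# Sample data copied from the listing above; on a real system you would
# redirect stdin from the rt_tables file instead.
sample_rt_tables='#
# reserved values
#
255 local
254 main
253 default
0 unspec
#
# local
#
#1 inr.ruhep'

printf '%s\n' "$sample_rt_tables" | table_id main     # prints 254
printf '%s\n' "$sample_rt_tables" | table_id local    # prints 255
```

Since inr.ruhep is commented out in the sample, looking it up prints nothing and returns a non-zero status.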
Routing rules (ip rule)
So how does the kernel know which routing table to apply? It uses the “routing policy database”, which is managed by the ip rule command. In particular, ip rule without any arguments will print all existing rules. These are mine:
% ip rule
0: from all lookup local
32764: from all lookup main suppress_prefixlength 0
32765: not from all fwmark 0xca6c lookup 51820
32766: from all lookup main
32767: from all lookup default
The numbers you see on the left (0, 32764, …) are rule priorities: the lower the number, the higher the priority. That is to say, rules with lower numbers are processed first.
Apart from the priority, each rule also has a selector and an action. The selector tells us whether the rule applies to the packet at hand. If it does, the action is executed. The most common action is to consult a particular routing table (see the previous section). If that routing table contains a route for our packet, then we’re done; otherwise, we proceed to the next rule.
The rules with priorities 0, 32766 and 32767 above are created automatically by the kernel. To quote the ip-rule(8) man page:
Priority: 0, Selector: match anything, Action: lookup routing table local (ID 255). The local table is a special routing table containing high priority control routes for local and broadcast addresses.
Priority: 32766, Selector: match anything, Action: lookup routing table main (ID 254). The main table is the normal routing table containing all non-policy routes. This rule may be deleted and/or overridden with other ones by the administrator.
Priority: 32767, Selector: match anything, Action: lookup routing table default (ID 253). The default table is empty. It is reserved for some post-processing if no previous default rules selected the packet. This rule may also be deleted.
The other two rules have been created by the wg-quick script. If you want to understand how they work, read on.
Let’s look at the two rules that are added by wg-quick:
32764: from all lookup main suppress_prefixlength 0
32765: not from all fwmark 0xca6c lookup 51820
At first sight, these are quite cryptic: what does suppress_prefixlength do, what is 0xca6c, and how can a packet be “not from all”?
Let’s start from the 32764 rule: as it has a lower number, it’s considered first.
32764: from all lookup main suppress_prefixlength 0
The rule’s selector, from all, matches everything, making the kernel consult the main table for every single packet.
If this were the whole rule, every packet would be routed by the main table, never reaching the VPN. This is why the action also contains a suppressor: suppress_prefixlength 0. From the ip-rule(8) man page:
suppress_prefixlength NUMBER
reject routing decisions that have a prefix length of NUMBER or less.
Here “prefix” refers to the address or range of addresses matched in the routing table. So if you have a route for 10.2.3.4, its prefix length is 32 (bits); if you change it to 10.0.0.0/8, the prefix length will be 8.
What is a prefix of length 0 or less? It’s the empty prefix, 0.0.0.0/0, corresponding to the default route. So if the packet was routed by the default route from main, that routing decision is ignored; otherwise, it’s respected.
To summarize, the effect of this rule is to respect all manual routes that the administrator might have added to the main table. However, if the packet didn’t match any of the specific routes, then instead of applying the default route, we proceed to the next rule.
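To make the suppressor concrete, here is a rough sketch that computes the prefix length of each route in ip route-style output and reports whether a decision based on it would be discarded. classify_routes is a hypothetical helper written for this example; it only understands plain IPv4 unicast routes and does not reproduce the kernel’s actual lookup.

```shell
# For each route line read from stdin, print its destination, its prefix
# length, and whether a rule with "suppress_prefixlength <limit>" would
# discard a routing decision based on it.
classify_routes() {
    awk -v limit="${1:-0}" '
        NF == 0 { next }
        {
            dest = $1
            if (dest == "default") {
                len = 0                      # shorthand for 0.0.0.0/0
            } else if (index(dest, "/")) {
                split(dest, parts, "/")
                len = parts[2] + 0           # explicit prefix length
            } else {
                len = 32                     # bare IPv4 address: host route
            }
            verdict = (len <= limit + 0) ? "suppressed" : "kept"
            printf "%s %d %s\n", dest, len, verdict
        }
    '
}

# The main table shown at the beginning of the article.
classify_routes 0 <<'EOF'
default via 192.168.1.1 dev enp34s0 proto static metric 100
192.168.1.0/24 dev enp34s0 proto kernel scope link src 192.168.1.121 metric 100
192.168.122.0/24 dev virbr0 proto kernel scope link src 192.168.122.1 linkdown
EOF
# prints:
# default 0 suppressed
# 192.168.1.0/24 24 kept
# 192.168.122.0/24 24 kept
```

Only the default route is suppressed, so traffic to the local network and to the virbr0 network is still routed by main, while everything else falls through to the next rule.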
32765: not from all fwmark 0xca6c lookup 51820
The “not from all” bit is just a quirk of how ip rule formats its rules. A better way to express it would be:
32765: from all not fwmark 0xca6c lookup 51820
It’s just that when no “from” prefix (address or range) is present in the rule’s selector, ip rule prints “from all”.
51820 is a routing table, also created by wg-quick, containing a single route:
% ip route list table 51820
default dev mullvad scope link
So the effect of the rule is to route everything that reached it through the VPN, with one exception: the mysterious not fwmark 0xca6c.
0xca6c is just a numerical label (“firewall mark”) that wg-quick asked wg to attach to all of the packets that it emits. These are packets that already encapsulate other packets and are targeted to your VPN peer/server. If these packets were routed back to WireGuard, that would create an infinite loop of wrappers on top of wrappers.
So the selector ensures that packets that have already been encapsulated can escape through your normal internet connection. Since these packets are ignored by this rule, they proceed to the rule
32766: from all lookup main
But now there is no suppressor, so these packets are free to use the default route.
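The two paths can be traced with a toy model of the rule list. The sketch below is a deliberate simplification: tables_consulted is a hypothetical helper that understands only the not, fwmark and lookup keywords, assumes the rules are already sorted by priority (as ip rule prints them), and lists every table whose rule matches, whereas the kernel stops at the first table that yields an accepted route.

```shell
# Read `ip rule`-style lines from stdin and print, one per line and in order,
# the tables whose rules match a packet carrying firewall mark $1
# (empty string = unmarked packet).
tables_consulted() {
    awk -v mark="$1" '
        {
            negate = 0; want = ""; table = ""
            for (i = 2; i <= NF; i++) {
                if ($i == "not")    negate = 1
                if ($i == "fwmark") want = $(i + 1)
                if ($i == "lookup") table = $(i + 1)
            }
            # A rule without fwmark matches any packet. Marks are compared
            # as strings on purpose, so write them exactly as ip rule does.
            hit = (want == "") || ((want "") == (mark ""))
            if (negate) hit = !hit
            if (hit) print table
        }
    '
}

# The rules from the `ip rule` output shown earlier in the article.
rules='0: from all lookup local
32764: from all lookup main suppress_prefixlength 0
32765: not from all fwmark 0xca6c lookup 51820
32766: from all lookup main
32767: from all lookup default'

printf '%s\n' "$rules" | tables_consulted ""        # local, main, 51820, main, default
printf '%s\n' "$rules" | tables_consulted 0xca6c    # local, main, main, default
```

For the marked packet, table 51820 never appears in the list, which is exactly why the encapsulated traffic ends up on the default route from main.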
Fun fact: wg-quick uses the same numbers for the table and the fwmark: 0xca6c is just 51820 in hexadecimal.
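The conversion is easy to check in the shell, since printf accepts C-style hexadecimal constants for its numeric conversions:

```shell
printf '%d\n' 0xca6c      # prints 51820
printf '0x%x\n' 51820     # prints 0xca6c
```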
Overall, this setup works quite well. Older VPN scripts used to override your default route in the main table when connecting to the VPN and restore it when disconnecting. Sometimes this wouldn’t work, and after disconnecting from the VPN you would be left without any default route at all. wg-quick doesn’t have this problem, as it never messes with your main routing table. All it has to do when disconnecting is delete its two rules, and your default route is active again. Or you could even do that yourself with ip rule del.
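Since plain ip route no longer tells the whole story, it can help to ask the kernel directly which route it would pick. The sketch below wraps ip route get, which takes the policy rules into account and accepts an optional mark argument. route_for is a hypothetical wrapper, 1.1.1.1 is just an arbitrary destination, and the output depends entirely on the machine, so none is shown here.

```shell
# Ask the kernel which route it would choose for a destination, optionally
# pretending that the packet carries a given firewall mark.
# route_for is a hypothetical wrapper written for this example.
route_for() {
    dest=$1
    mark=${2:-}
    if ! command -v ip >/dev/null 2>&1; then
        echo "ip (iproute2) not found" >&2
        return 127
    fi
    if [ -n "$mark" ]; then
        ip route get "$dest" mark "$mark"
    else
        ip route get "$dest"
    fi
}

# Ordinary, unmarked traffic.
route_for 1.1.1.1 || true
# Traffic carrying the mark that wg puts on its own packets.
route_for 1.1.1.1 0xca6c || true
```

On a setup like the one described in this article, one would expect the first answer to name the VPN interface (mullvad) and the second to name the physical one (enp34s0).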
Félix Baylac Jacqué says:
If you want to go further on advanced linux-based routing:
man ip-vrf <= routing tables attached to a particular netdev.
man ip-netns <= network namespaces. There’s a nice trick using them to prevent any packet leakage w/ wireguard: https://www.wireguard.com/netns/