IPVS debugging
March 26, 2023
Debugging IPVS for Kubernetes
Symptom
An ssh daemon runs in a container called gogs, listening on port 22. A service, also called gogs, forwards port 31443 on the node to the internal port 22. However, I cannot seem to log in using ssh -p 31443 localhost from any Kubernetes node.
$ k describe service/gogs | grep -A3 "^Port"
Port:        http  50001/TCP
TargetPort:  3000/TCP
NodePort:    http  31444/TCP
Endpoints:   10.244.2.239:3000
Port:        ssh  50002/TCP
TargetPort:  22/TCP
NodePort:    ssh  31443/TCP
Endpoints:   10.244.2.239:22
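For reference, a Service producing this mapping would look roughly like the sketch below; the selector label (and applying it from stdin) are assumptions on my part, not the actual manifest.
$ kubectl apply -f - <<'EOF'
apiVersion: v1
kind: Service
metadata:
  name: gogs
spec:
  type: NodePort
  selector:
    app: gogs          # assumed label; the real selector may differ
  ports:
  - name: http
    port: 50001
    targetPort: 3000
    nodePort: 31444
  - name: ssh
    port: 50002
    targetPort: 22
    nodePort: 31443
EOF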
Throughout this debugging session, I'll use the following command to check whether the problem is solved, run on the node where gogs is running (called del.dodges.it).
$ ssh -p 31443 localhost
ssh: connect to host localhost port 31443: Connection refused
In fact, to speed things up, I'll use watch to retry every 5 seconds:
$ watch -n 5 -- ssh -p 31443 localhost
Investigation
kube-proxy is the cornerstone of Kubernetes Services. It exists in two flavors (which one is active can be checked as shown below):
- iptables: creates a set of iptables rules to forward packets.
- ipvs: forwards packets through the kernel's IPVS module.
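On kubeadm-provisioned clusters, the active flavor is recorded in kube-proxy's ConfigMap in the kube-system namespace (that location is an assumption about this setup):
$ kubectl -n kube-system get configmap kube-proxy -o yaml | grep 'mode:'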
Track 1 — iptables leftovers
I had recently modified the configuration of kube-proxy to enable IPVS in order to install MetalLB, and I suspected that old iptables rules from before the switch were still present.
To check my hypothesis, I flushed all iptables rules on the machine:
# iptables -F && iptables -t nat -F && iptables -t mangle -F && iptables -X
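As a quick sanity check that nothing referencing the port survived the flush (iptables-save prints every remaining rule, so an empty result means the old rules are really gone):
# iptables-save -t nat | grep 31443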
Unfortunately, this didn't solve the issue.
Track 2 — IPVS misconfiguration
I installed ipvsadm to inspect the behavior of IPVS in more detail.
IPVS works by defining virtual services and real servers. The way I understand it, when a request reaches a virtual service, IPVS redirects it to one of the associated real servers. It can pick the server in different fashions, round-robin among others.
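To make the model concrete, this is how such a pair could be defined by hand (hypothetical addresses and ports; kube-proxy manages these entries itself, so this is purely illustrative). -A adds a virtual TCP service with a scheduler (-s rr for round-robin), and -a attaches a real server to it in masquerade mode (-m):
# ipvsadm -A -t 192.168.1.31:8080 -s rr
# ipvsadm -a -t 192.168.1.31:8080 -r 10.244.2.239:80 -m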
# ipvsadm -L -n
IP Virtual Server version 1.2.1 (size=32768)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
...
TCP  192.168.1.31:31443 rr
  -> 10.244.2.239:22              Masq    1      0          0
TCP  192.168.2.31:31443 rr
  -> 10.244.2.239:22              Masq    1      1          1
In my setup:
- 192.168.1.31 is the ethernet interface connected to the internet
- 192.168.2.31 is the wireguard network sitting on top of it
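As an aside, the ActiveConn/InActConn counters in the listing above come from IPVS's connection table, which can also be dumped entry by entry:
# ipvsadm -L -n -c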
So it looks like Kubernetes has (correctly) mapped port 31443 of the node to the service, but only on those two addresses. Checking SSH on localhost was therefore not a good test; let's try the ethernet IP instead.
$ ssh -p 31443 192.168.1.7
The authenticity of host '[192.168.1.7]:31443 ([192.168.1.7]:31443)' can't be established.
ED25519 key fingerprint is SHA256:RuxbIroDCfqMilTGuaGdqxqWqdqcyqOgpcCE1KT3lrA.
This host key is known by the following other names/addresses:
~/.ssh/known_hosts:5: [gogs.dodges.it]:31443
Are you sure you want to continue connecting (yes/no/[fingerprint])?
Repeating the test from another node shows the same behavior, so the problem I'm experiencing seems to be related to packets entering the node from my router.
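To see whether those packets were reaching the node at all, tcpdump on the external interface is handy (eth0 is an assumption here; substitute the actual interface name):
# tcpdump -ni eth0 'tcp port 31443'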
Could it be lingering iptables rules after all?
# iptables -F && iptables -t nat -F && iptables -t mangle -F && iptables -X
And the ultimate test, since my git remote for gogs uses ssh on port 31443:
$ git push
Everything up-to-date
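For context, that push goes through the same NodePort: a remote URL for a repository hosted on this gogs instance would look something like the following (the git user and the USER/REPO path are placeholders, not my actual remote):
$ git remote set-url origin ssh://git@gogs.dodges.it:31443/USER/REPO.git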