The woes of networkd

When I based the initial implementation of tinyvmm’s networking on networkd, I thought it was a genuinely good idea. Networkd knows how to manage networks, and it’s much easier to write INI files than to make raw kernel syscalls. It also has a neat D-Bus API, meaning I could control it programmatically. And I didn’t even need that much: tinyvmm only needs to manage bridges and tap devices for now.
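For a sense of what “writing INI files” looks like here, below is a minimal Rust sketch that renders networkd configuration for one bridge and one tap device attached to it. The `[NetDev]` keys `Name=` and `Kind=` and the `[Network]` key `Bridge=` are standard systemd.netdev/systemd.network options. The `VmNet` struct, the `render_networkd` function and the `50-tinyvmm-` file-name prefix are hypothetical, not tinyvmm’s actual code, and the sketch leaves out the bridge’s own `.network` file (addresses and so on).

```rust
/// Hypothetical in-memory description of one VM's networking.
struct VmNet {
    bridge: String,
    tap: String,
}

/// Render the networkd files for one VM as (file name, contents) pairs.
/// The "50-tinyvmm-" prefix is a made-up naming scheme for this sketch.
fn render_networkd(net: &VmNet) -> Vec<(String, String)> {
    vec![
        // .netdev that creates the bridge itself.
        (
            format!("50-tinyvmm-{}.netdev", net.bridge),
            format!("[NetDev]\nName={}\nKind=bridge\n", net.bridge),
        ),
        // .netdev that creates the tap device.
        (
            format!("50-tinyvmm-{}.netdev", net.tap),
            format!("[NetDev]\nName={}\nKind=tap\n", net.tap),
        ),
        // .network that matches the tap and attaches it to the bridge.
        (
            format!("50-tinyvmm-{}.network", net.tap),
            format!(
                "[Match]\nName={}\n\n[Network]\nBridge={}\n",
                net.tap, net.bridge
            ),
        ),
    ]
}

fn main() {
    let net = VmNet {
        bridge: "br0".into(),
        tap: "tap0".into(),
    };
    // Print each file; a real setup would write them to disk instead.
    for (name, contents) in render_networkd(&net) {
        println!("# {name}\n{contents}");
    }
}
```

Dropped into /etc/systemd/network/ (or /run/systemd/network/ for runtime-only configuration), files like these are what networkd picks up when it is told to reload.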

It even worked. The code was messy, sure, but I got my networking up and running with no major incidents. That is, until I noticed I couldn’t reliably curl from the VM to another machine. Most of the time it went through, but roughly one in every five or six curl runs would get stuck and hang forever. A quick check with wrk showed 5% loss.

I spent the next few hours trying to reason about the traffic flows, burning my eyes on tcpdump output, and commenting out parts of the nftables rules. All for nothing: the packet loss persisted. It took me a while to notice that my ssh session to the VM would hang too, for 1-3 seconds at a time, about every 10 seconds. Which happens to be tinyvmm’s reconcile interval… during which it calls networkd’s reload().

And, apparently, reloading means re-reading the configuration files and trying to reapply them to the network interfaces, which results in a brief window of lost connectivity!
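The reconcile-and-reload pattern can be sketched roughly as follows. This is a minimal sketch, not tinyvmm’s code: `Reconciler` and `tick` are hypothetical names, and the `reload` closure stands in for whatever actually asks networkd to reload (the D-Bus call, or `networkctl reload` from a shell). Unlike a loop that reloads on every tick, it only triggers a reload when the rendered configuration differs from what it last applied.

```rust
/// Hypothetical reconciler that remembers the last config handed to networkd.
struct Reconciler {
    last_applied: Option<String>,
}

impl Reconciler {
    fn new() -> Self {
        Self { last_applied: None }
    }

    /// Called once per reconcile tick (every ~10 s in the setup described above).
    /// `reload` stands in for the networkd reload call and only runs when
    /// `rendered` differs from the previous tick. Returns whether it reloaded.
    fn tick(&mut self, rendered: &str, reload: impl FnOnce()) -> bool {
        if self.last_applied.as_deref() == Some(rendered) {
            // Nothing changed: skip the disruptive reload.
            return false;
        }
        reload();
        self.last_applied = Some(rendered.to_owned());
        true
    }
}

fn main() {
    let mut r = Reconciler::new();
    let mut reloads = 0;
    // Three ticks: initial config, unchanged config, config with a new tap.
    for config in ["br0+tap0", "br0+tap0", "br0+tap0+tap1"] {
        r.tick(config, || reloads += 1);
    }
    println!("reloads: {reloads}"); // prints "reloads: 2"
}
```

This wouldn’t remove the connectivity blip entirely: any tick where the config really changes (say, when VMs are restarted) still triggers a reload, and so does the first tick after tinyvmm starts. It only stops the disruption from recurring every 10 seconds.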

I’ve removed the reload as a band-aid, but of course that means I now need to run it manually when I restart the VMs. I guess I’ll be moving away from networkd. After all, Rust has a dozen crates for creating tap devices, and I’m sure there’s something that can make bridges, too.