Kata Containers is a container runtime that runs your workloads inside lightweight VMs for stronger isolation. Setting it up on NixOS with k3s is slightly more intricate than it should be, so here’s a quick guide.

First, allow k3s to access the /dev/* nodes that are used for VM bootstrapping. K3s runs containerd as part of its server; if you use an external containerd, adjust the service name accordingly.

systemd.services.k3s.serviceConfig.DeviceAllow = [
  "/dev/kvm rwm"
  "/dev/mshv rwm"
  "/dev/kmsg rwm"
  "/dev/vhost-vsock rwm"
  "/dev/vhost-net rwm"
  "/dev/net/tun rwm"
];
systemd.services.k3s.serviceConfig.Delegate = "yes";

Second, make sure the KVM module for your CPU is loaded (kvm-amd here; use kvm-intel on Intel machines):

boot.kernelModules = [ "kvm-amd" ];

I had a fun time here, learning that virtualisation support was turned off in my BIOS for whatever reason. At least I had the “other” KVM (the KVM switch).
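A quick way to catch the same problem before touching k3s is to check whether the CPU advertises hardware virtualisation and whether /dev/kvm exists. The following is a minimal sketch, not part of the original setup; count_virt_cpus is a hypothetical helper name, and its optional file argument exists only so the function can be exercised against a sample file.

```shell
# Hypothetical helper: count logical CPUs whose "flags" line lists
# vmx (Intel VT-x) or svm (AMD-V). Anchoring on ^flags skips the
# separate "vmx flags" line that newer kernels print on Intel CPUs.
count_virt_cpus() {
  grep -Ec '^flags[[:space:]]*:.*[[:space:]](vmx|svm)([[:space:]]|$)' \
    "${1:-/proc/cpuinfo}" 2>/dev/null || true
}

if [ "$(count_virt_cpus)" = "0" ]; then
  echo "no vmx/svm flag found: virtualisation is probably disabled in firmware"
fi

# The device node only shows up once a kvm module has loaded successfully.
if [ -e /dev/kvm ]; then
  echo "/dev/kvm is present"
else
  echo "/dev/kvm is missing"
fi
```

If the flag count is zero, no amount of NixOS configuration will help until the firmware setting is changed.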

Finally, add the kata runtime to the list of k3s-visible packages, along with a containerd config template (if you use the containerd managed by k3s):

systemd.services.k3s.path = [ pkgs.kata-runtime ];
systemd.tmpfiles.settings."09-k3s"."/var/lib/rancher/k3s/agent/etc/containerd/config-v3.toml.tmpl"."L+".argument = let
  template = ''
    {{ template "base" . }}

    [plugins.'io.containerd.cri.v1.runtime'.containerd.runtimes.'kata']
        runtime_type = "io.containerd.kata.v2"
        privileged_without_host_devices = true
        pod_annotations = ["io.katacontainers.*"]
        container_annotations = ["io.katacontainers.*"]
    '';
  in "${pkgs.writeText "config-v3.toml.tmpl" template}";

Note that there’s a services.k3s.containerdConfigTemplate option, but it only supports version 2 of the containerd config, while the snippet above is version 3. That’s why I have to do it by hand.

Restarting k3s should update your /var/lib/rancher/k3s/agent/etc/containerd/config.toml. If it doesn’t, delete the existing file and restart k3s again.
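To confirm that the template was actually rendered, you can look for the kata runtime type in the generated file. This is a rough sketch: has_kata_runtime is a hypothetical helper name, the path is the one mentioned above, and reading it will likely require root.

```shell
# Hypothetical helper: succeed if the rendered containerd config
# mentions the kata shim. -F treats the dots literally.
has_kata_runtime() {
  grep -qF 'io.containerd.kata.v2' \
    "${1:-/var/lib/rancher/k3s/agent/etc/containerd/config.toml}" 2>/dev/null
}

if has_kata_runtime; then
  echo "kata runtime present in config.toml"
else
  echo "kata runtime missing (or file unreadable)"
fi
```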

Now, on the Kubernetes side, register the runtime class:

apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: kata
handler: kata

And try spinning up a test pod:

apiVersion: v1
kind: Pod
metadata:
  name: kata-test
spec:
  runtimeClassName: kata
  containers:
  - name: busybox
    image: busybox
    command: ["sh", "-c", "sleep 3600"]
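One common way to check that the pod really runs inside a VM is to compare kernel releases: a Kata guest normally boots its own kernel, so uname -r inside the pod usually differs from the host. The sketch below assumes kubectl is configured and that you run it on the node hosting the pod; kernels_differ and check_kata_pod are hypothetical helper names, and a matching kernel release is a hint rather than proof that isolation is missing.

```shell
# Hypothetical helper: true when both kernel releases are non-empty and differ.
kernels_differ() {
  [ -n "$1" ] && [ -n "$2" ] && [ "$1" != "$2" ]
}

# Hypothetical helper: compare the pod's kernel with the local host's kernel.
check_kata_pod() {
  pod="${1:-kata-test}"
  # Wait for the pod to become Ready before exec'ing into it.
  kubectl wait --for=condition=Ready "pod/$pod" --timeout=120s || return 1
  pod_kernel=$(kubectl exec "$pod" -- uname -r)
  host_kernel=$(uname -r)
  echo "pod: $pod_kernel, host: $host_kernel"
  if kernels_differ "$pod_kernel" "$host_kernel"; then
    echo "kernels differ: the pod appears to run in its own VM"
  else
    echo "kernels match: the pod may be sharing the host kernel"
  fi
}
```

After applying the manifest, call check_kata_pod kata-test on the node.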

Things to watch out for

Sometimes the pod will be stuck scheduling. Check the k3s logs in journald (journalctl -elfu k3s) to see if there are any permissions issues.

Your CNI might be unhappy if the container spins up too slowly. This is an eventually consistent process that should sort itself out.

If you use Cilium, make sure the socket load-balancer optimization is not applied inside pods, that is, restrict it to the host namespace (socketLB.hostNamespaceOnly = true). Kata can only work with a tap device.