VictoriaMetrics is a fantastic alternative to Prometheus, especially in a home lab where resources are constrained. It’s several times more efficient with its RAM usage while being pretty much fully PromQL-compatible (with a few nice extras, too).
One of the nice features of VM is retention filters, which let you set up different retention periods for different metrics (the feature is only available in the enterprise version, though). That allows keeping important historical data for a long time, like the average HDD temperature over the last 5 years, while dropping the very exciting and (most of the time) very useless kubelet metrics within days.
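Under the hood, a retention filter is just a vmstorage flag that maps a series filter to its own retention period. A minimal sketch of what that could look like via the operator (the label filter, the duration, and passing it through extraArgs are made-up illustrations, not my actual config):

```yaml
# Hypothetical VMCluster fragment (enterprise): series matching the filter
# are dropped after 7 days, everything else follows the global retentionPeriod.
vmstorage:
  extraArgs:
    eula: "1"
    retentionFilter: '{job="kubelet"}:7d' # made-up filter and duration
```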
Unfortunately, the retention filters don’t clean up the indexdb until the primary retention period kicks in, and that has to be the largest retention period in your VM instance (i.e. the filters can only decrease it). That means that with a retentionPeriod of 5 years, your indexdb will never be compacted (and it tends to grow pretty much unconstrained).
A relatively easy solution is to configure vmagent to send metrics to two VM storage clusters (the architecture scales horizontally really well) based on their labels, and then have a single vmselect query both clusters so that the consumers (Grafana) have no idea what’s happening.
Let’s set up the clusters first. It’s pretty trivial with the operator: I just had to copy and paste the relevant CRDs and move the vmselect component into its own dedicated cluster:
```yaml
### THIS IS THE SHORT-TERM CLUSTER
---
apiVersion: operator.victoriametrics.com/v1beta1
kind: VMCluster
metadata:
  name: short
spec:
  retentionPeriod: "1" # month
  ### IT HAS ONE STORAGE
  vmstorage:
    image:
      tag: v1.101.0-enterprise-cluster
    replicaCount: 1
    storage:
      volumeClaimTemplate:
        spec:
          accessModes: [ReadWriteOnce]
          resources:
            requests:
              storage: 50Gi
          storageClassName: manual
          volumeName: victoriametrics-short-data
    extraArgs:
      eula: "1"
      enableTCP6: "true"
      licenseFile: "/license/license.txt"
    volumes:
      - name: license
        secret:
          secretName: license
    volumeMounts:
      - name: license
        readOnly: true
        mountPath: "/license"
  ### AND ONE INSERT
  vminsert:
    image:
      tag: v1.101.0-enterprise-cluster
    replicaCount: 1
    extraArgs:
      maxLabelsPerTimeseries: "35"
      eula: "1"
      enableTCP6: "true"
      licenseFile: "/license/license.txt"
    volumes:
      - name: license
        secret:
          secretName: license
    volumeMounts:
      - name: license
        readOnly: true
        mountPath: "/license"
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: victoriametrics-short-data
spec:
  storageClassName: manual
  capacity:
    storage: 50Gi
  accessModes: [ReadWriteOnce]
  persistentVolumeReclaimPolicy: Retain
  hostPath:
    path: "/kube/victoriametrics-short-data"
### THIS IS THE LONG TERM CLUSTER
---
apiVersion: operator.victoriametrics.com/v1beta1
kind: VMCluster
metadata:
  name: long
spec:
  ### MUCH HIGHER RETENTION
  retentionPeriod: "5y"
  vmstorage:
    image:
      tag: v1.101.0-enterprise-cluster
    replicaCount: 1
    storage:
      volumeClaimTemplate:
        spec:
          accessModes: [ReadWriteOnce]
          resources:
            requests:
              storage: 50Gi
          storageClassName: manual
          volumeName: victoriametrics-long-data
    extraArgs:
      eula: "1"
      enableTCP6: "true"
      ### AND DOWNSAMPLING APPLIED
      downsampling.period: 30d:5m,180d:1h,1y:6h,2y:1d
      licenseFile: "/license/license.txt"
    volumes:
      - name: license
        secret:
          secretName: license
    volumeMounts:
      - name: license
        readOnly: true
        mountPath: "/license"
  vminsert:
    image:
      tag: v1.101.0-enterprise-cluster
    replicaCount: 1
    # TODO: Authenticate the API
    extraArgs:
      maxLabelsPerTimeseries: "35"
      eula: "1"
      enableTCP6: "true"
      licenseFile: "/license/license.txt"
    volumes:
      - name: license
        secret:
          secretName: license
    volumeMounts:
      - name: license
        readOnly: true
        mountPath: "/license"
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: victoriametrics-long-data
spec:
  storageClassName: manual
  capacity:
    storage: 50Gi
  accessModes: [ReadWriteOnce]
  persistentVolumeReclaimPolicy: Retain
  hostPath:
    path: "/kube/victoriametrics-long-data"
### THIS IS A SELECT-ONLY CLUSTER
---
apiVersion: operator.victoriametrics.com/v1beta1
kind: VMCluster
metadata:
  name: select
spec:
  retentionPeriod: "1" # unused
  vmselect:
    image:
      tag: v1.101.0-enterprise-cluster
    replicaCount: 1
    extraArgs:
      eula: "1"
      enableTCP6: "true"
      ### THAT IS POINTED TO BOTH STORAGES
      storageNode: "vmstorage-short:8401,vmstorage-long:8401"
      downsampling.period: 30d:5m,180d:1h,1y:6h,2y:1d
      licenseFile: "/license/license.txt"
    volumes:
      - name: search-results
        emptyDir:
          sizeLimit: 500Mi
      - name: license
        secret:
          secretName: license
    volumeMounts:
      - name: search-results
        mountPath: /tmp
      - name: license
        readOnly: true
        mountPath: "/license"
```
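With that in place, the consumers only ever talk to the vmselect of the select cluster. As a rough sketch, a provisioned Grafana datasource could point at it like this (the service name, namespace and port are assumptions based on the operator’s vmselect-&lt;cluster&gt; naming, so adjust for your setup):

```yaml
# Grafana datasource provisioning sketch; the URL is an assumption
# following the operator's naming convention (vmselect-<cluster>, port 8481).
apiVersion: 1
datasources:
  - name: VictoriaMetrics
    type: prometheus
    access: proxy
    url: http://vmselect-select.monitoring.svc:8481/select/0/prometheus
```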
Now, let’s tell vmagent to route metrics to different clusters based on their names. To do that, you need an extra relabel config per remote write URL. In my case, I have this as my long.yaml:
```yaml
- action: keep
  regex:
  - node_hwmon_temp_celsius
  - node_hwmon_sensor_label
  - smartctl_device_temperature
  source_labels:
  - __name__
```
and its inversion for short.yaml:
```yaml
- action: drop
  regex:
  - node_hwmon_temp_celsius
  - node_hwmon_sensor_label
  - smartctl_device_temperature
  source_labels:
  - __name__
```
By default, vmagent will happily write every metric to all the remote endpoints, but in my case there’s no need to store them twice. Also, note the nice regex: [...] list syntax that’s not available in Prometheus!
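For comparison, a vanilla Prometheus relabel rule only takes a single regex, so the same filter would have to be collapsed into one alternation, roughly:

```yaml
# Equivalent rule in plain Prometheus relabel syntax: one combined regex
- action: keep
  source_labels: [__name__]
  regex: node_hwmon_temp_celsius|node_hwmon_sensor_label|smartctl_device_temperature
```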
To apply the configs, you need to match every -remoteWrite.url flag with a -remoteWrite.urlRelabelConfig flag; each config is applied to its corresponding url. If there are fewer configs than urls, the trailing urls won’t have any extra relabelling done. My nix config looks like this in the end:
```nix
services.vmagent =
  let
    format = pkgs.formats.yaml { };
    longMetrics = [
      "node_hwmon_temp_celsius"
      "node_hwmon_sensor_label"
      "smartctl_device_temperature"
    ];
    # Applied to the short cluster: drop the long-term metrics.
    relabelShort = [
      {
        action = "drop";
        source_labels = [ "__name__" ];
        regex = longMetrics;
      }
    ];
    # Applied to the long cluster: keep only the long-term metrics.
    relabelLong = [
      {
        action = "keep";
        source_labels = [ "__name__" ];
        regex = longMetrics;
      }
    ];
    relabelShortConfig = format.generate "short.yml" relabelShort;
    relabelLongConfig = format.generate "long.yml" relabelLong;
  in
  {
    enable = true;
    extraArgs = [
      "-remoteWrite.url=http://${shortURL}/insert/0/prometheus/api/v1/write"
      "-remoteWrite.urlRelabelConfig=${relabelShortConfig}"
      "-remoteWrite.url=http://${longURL}/insert/0/prometheus/api/v1/write"
      "-remoteWrite.urlRelabelConfig=${relabelLongConfig}"
      "-remoteWrite.tmpDataPath=%C/vmagent/remote_write_tmp"
    ];
  };
```
To verify that it’s working as intended, navigate to the vmselect instance and run a query for a metric that should be routed to the other cluster. vmselect has built-in tracing capabilities that demonstrate all the blocks are coming from vmstorage-long:
Huge thanks to the VM team for creating a superb product and for allowing me to use the enterprise edition in my homelab setup.