In the New Year episode of “farcaller breaks things and then fixes them” I will tell you about envoy and its lua scripting capabilities. I have a bit of a tricky website setup in here—the old website is wordpress, and the new one is static files. What I want is to serve the static files (off their S3 bucket web endpoint provided by garage), unless the response is 404. In that case I want to use a fallback backend (wordpress), but only if that doesn’t return 404 either (if both are a miss I want the 404 page coming from the static website). I also want to always hit wordpress if there’s a magic query param or a cookie, as a fallback mechanism.

Previously, I did all that with a bit of nginx magic, but it wasn’t the most pleasant config to maintain, with a bunch of error_page 404 = @wp chaining. Besides, it seemed odd to keep nginx around just to solve this one problem. I was pretty sure there was some way to do it with pure envoy, either via its lua scripting or its WASM support. Lua seemed like the easier of the two, so that’s what I went with.

Let’s start. My control plane is istio, so I need a mechanism to deliver a chunk of envoy config to the right place. In istio, you do that with the EnvoyFilter CRD:

apiVersion: networking.istio.io/v1alpha3
kind: EnvoyFilter
metadata:
  name: blog-retry-404
  namespace: istio-ingress
spec:
  workloadSelector:
    labels:
      istio: ingress
  configPatches:
    - applyTo: HTTP_FILTER
      match:
        context: GATEWAY
        listener:
          filterChain:
            sni: "example.com"
            filter:
              name: "envoy.filters.network.http_connection_manager"
      patch:
        operation: INSERT_BEFORE
        value:
          name: envoy.lua
          typed_config:
            "@type": "type.googleapis.com/envoy.extensions.filters.http.lua.v3.Lua"
            inlineCode: |
              -- lua goes here              

We want to apply a patch to the http_filters chain within the http_connection_manager. As this modifies the gateway configuration (and not the envoy sidecar on the workload), we must also narrow it down by SNI, or it will be applied to every single domain this ingress serves. You also want this filter to be in the same namespace as your ingress envoy for it to match properly.

It seems that one of the SNIs is good enough for istio, so if your gateway is configured with hosts: [example.com, *.example.com] it will still match correctly.

Now, for the actual contents of a lua script. Following the lua filter docs, we want to implement both envoy_on_request and envoy_on_response: the former will save the request headers in case we need to do a second request and the latter will do all the processing logic.

function envoy_on_request(request_handle)
    local headers = request_handle:headers()
    local request_headers_table = {}

    for key, value in pairs(headers) do
        request_headers_table[key] = value
    end

    request_handle:streamInfo():dynamicMetadata():set(
        "envoy.filters.http.lua",
        "request_headers",
        request_headers_table)
end

Envoy is notorious for not letting values from the C++ world be passed around, so we construct a new lua table, copy the headers into it, and store it in the dynamic metadata of the request. Requests that repeat the same header might need extra care, but I blatantly disregard this case, as I don’t expect normal browser sessions to do that.
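If repeated headers ever do start to matter, one option is to fold them while copying instead of letting the last value win. This is a minimal sketch, not something I run in production: `add_header` is a hypothetical helper name (not part of envoy’s API), and it assumes that comma-joining is acceptable for most request headers, with `; ` used for `cookie`.

```lua
-- Hypothetical helper (not part of envoy's API): store a header in a plain
-- lua table, folding repeated names into a single value instead of letting
-- the last occurrence overwrite the earlier ones.
local function add_header(tbl, key, value)
    local existing = tbl[key]
    if existing == nil then
        tbl[key] = value
    elseif key == "cookie" then
        -- Assumption: cookie pairs are re-joined with "; ".
        tbl[key] = existing .. "; " .. value
    else
        -- Assumption: other repeated headers tolerate comma-joining.
        tbl[key] = existing .. ", " .. value
    end
    return tbl
end
```

With that in place, the loop body in envoy_on_request would become `add_header(request_headers_table, key, value)`.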

The response handler is where we do the magic:

function envoy_on_response(response_handle)
    local response_headers = response_handle:headers()
    local resp_status_int = tonumber(response_headers:get(":status"))
    
    if resp_status_int == 404 then
        ...
    end
end

The basic premise is simple. We look into the response header :status, and if the status is 404 we do extra processing. Otherwise the filter ends and envoy continues to output the response as is.

The first thing we do when we handle the response is to reconstruct the request headers:

...
if resp_status_int == 404 then
    local saved_headers = response_handle:streamInfo():dynamicMetadata():get("envoy.filters.http.lua")["request_headers"]
  
    if not saved_headers then
        response_handle:logErr("No saved headers found in dynamicMetadata")
        return
    end
    ...

Then, we perform a request to our fallback backend:

...
response_handle:body(true)
local fallback_response_headers, fallback_response_body = response_handle:httpCall(
    "outbound|443||example.com",
    saved_headers,
    "",
    5000
)
...

A crucial piece of code here is response_handle:body(true). If you don’t have it, once the httpCall yields, envoy will happily continue streaming the original body and you won’t have a chance to override it later. This call tells envoy to buffer the reply from the original backend.

The httpCall itself is also slightly tricky. First, you use a cluster name from your configuration (and not a hostname). In my case, I provide a specific backend via istio’s ServiceEntry:

apiVersion: networking.istio.io/v1beta1
kind: ServiceEntry
metadata:
  name: blog-forwarder
spec:
  hosts:
    - example.com
  ports:
    - number: 443
      name: https
      protocol: HTTPS
  resolution: STATIC
  location: MESH_EXTERNAL
  endpoints:
    - address: ...

We pass in saved_headers as is, as httpCall expects a lua table. Supposedly, you want to spell out the :authority, :method, and :path explicitly, but I am extremely sure they are present in the request headers (or the http router would have bailed). We also pass an empty string for a body, and 5000 ms as a timeout.
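If you would rather not take that on faith, the check is cheap. Below is a minimal sketch of a guard that copies the saved headers and confirms the three pseudo-headers are there before calling out. `build_fallback_headers` and `REQUIRED` are hypothetical names of mine, and returning `nil` plus a message on failure is just one possible convention.

```lua
-- Pseudo-headers that must be set on the table passed to httpCall.
local REQUIRED = { ":method", ":path", ":authority" }

-- Hypothetical helper: copy the saved request headers into a fresh table and
-- verify the required pseudo-headers are present and non-empty.
-- Returns the copy on success, or nil plus an error message.
local function build_fallback_headers(saved_headers)
    local out = {}
    for key, value in pairs(saved_headers) do
        out[key] = value
    end
    for _, name in ipairs(REQUIRED) do
        if out[name] == nil or out[name] == "" then
            return nil, "missing required header " .. name
        end
    end
    return out
end
```

In the response handler this could be used as `local headers, err = build_fallback_headers(saved_headers)`, logging `err` with response_handle:logErr and returning early when `headers` is nil.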

The http call will continue in a separate coroutine, meaning any local variables holding C++-world objects that you created before it are now stale, e.g. if you have local response_headers = response_handle:headers() before the httpCall(...) you won’t be able to use it after the call. Be careful with that and always re-retrieve any C++-world values by reaching into response_handle again.

httpCall returns a lua table (not a header object unlike response_handle:headers()) and the body as a string. No streaming in here, duh.
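Since the headers come back as a plain table, the status is just a string stored under the `:status` key. Here is a minimal sketch of reading it defensively. Both helper names are hypothetical, and treating a missing or unparsable status as a miss is my assumption rather than anything envoy prescribes.

```lua
-- Hypothetical helper: read the numeric status out of a plain header table.
-- Returns nil when the table or its ":status" entry is missing or malformed.
local function fallback_status(headers)
    if type(headers) ~= "table" then
        return nil
    end
    local raw = headers[":status"]
    if raw == nil then
        return nil
    end
    return tonumber(raw)
end

-- Assumption: anything other than a parsed, non-404 status counts as a miss.
local function fallback_missed(headers)
    local status = fallback_status(headers)
    return status == nil or status == 404
end
```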

Now, if our fallback backend returned a 404 as well, we bail out:

...
if tonumber(fallback_response_headers[":status"]) == 404 then
    return
end
...

Meaning, envoy will continue with the original response and original headers. Otherwise, we replace the original headers and body with what we got from the httpCall:

...
local response_headers = response_handle:headers()
local original_headers = {}
for key, _ in pairs(response_headers) do
    table.insert(original_headers, key)
end
  
for _, key in ipairs(original_headers) do
    if not fallback_response_headers[key] then
        response_headers:remove(key)
    end
end
for key, value in pairs(fallback_response_headers) do
    response_headers:replace(key, value)
end
  
if fallback_response_body == nil then
    fallback_response_body = ""
end
response_handle:body(true):setBytes(fallback_response_body)
...

There’s a bit of trickery going on in here. First, we collect the names of the original response headers into a lua table, because we can’t mutate the headers while iterating over them. Then, we go over those names and remove any header that’s not in the fallback response. Finally, we replace all the headers from the fallback response (replace will create a new header if it’s missing from the original response). This is a place where everything can go sideways, too. My original code called response_headers:remove(key) for all the keys inside it so I could start from a clean state, but, apparently, not having system level headers (:status) will just SIGSEGV envoy. Isn’t it fun?
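Given that crash, it might be worth guarding the pseudo-headers explicitly rather than relying on the fallback response always carrying `:status`. A minimal sketch of that idea follows. `headers_to_remove` is a hypothetical helper, and the rule of skipping every name that starts with `:` is my assumption about what is safe to leave alone.

```lua
-- Hypothetical helper: given the names of the original response headers and
-- the fallback header table, return the names that can be removed.
-- Pseudo-headers (names starting with ":") are never returned, so ":status"
-- always stays in place and is only ever replaced.
local function headers_to_remove(original_names, fallback_headers)
    local doomed = {}
    for _, name in ipairs(original_names) do
        local is_pseudo = name:sub(1, 1) == ":"
        if not is_pseudo and fallback_headers[name] == nil then
            table.insert(doomed, name)
        end
    end
    return doomed
end
```

The removal loop would then iterate over `headers_to_remove(original_headers, fallback_response_headers)` and call response_headers:remove(key) for each entry.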

Finally, we set the response body. Note how we call into response_handle:body(true) again, because the one from before the httpCall is no longer valid. We also make sure that there is some body, because httpCall can return nil.

This leaves us with the following script:

function envoy_on_request(request_handle)
    local headers = request_handle:headers()
    local request_headers_table = {}

    for key, value in pairs(headers) do
        request_headers_table[key] = value
    end

    request_handle:streamInfo():dynamicMetadata():set(
        "envoy.filters.http.lua",
        "request_headers",
        request_headers_table)
end

function envoy_on_response(response_handle)
    local response_headers = response_handle:headers()
    local resp_status_int = tonumber(response_headers:get(":status"))
    
    if resp_status_int == 404 then
        local saved_headers = response_handle:streamInfo():dynamicMetadata():get("envoy.filters.http.lua")["request_headers"]

        if not saved_headers then
            response_handle:logErr("No saved headers found in dynamicMetadata")
            return
        end

        response_handle:body(true)
        local fallback_response_headers, fallback_response_body = response_handle:httpCall(
            "outbound|443||example.com",
            saved_headers,
            "",
            5000
        )

        if tonumber(fallback_response_headers[":status"]) == 404 then
            return
        end
        
        local response_headers = response_handle:headers()
        local original_headers = {}
        for key, _ in pairs(response_headers) do
            table.insert(original_headers, key)
        end

        for _, key in ipairs(original_headers) do
            if not fallback_response_headers[key] then
                response_headers:remove(key)
            end
        end
        for key, value in pairs(fallback_response_headers) do
            response_headers:replace(key, value)
        end

        if fallback_response_body == nil then
            fallback_response_body = ""
        end
        response_handle:body(true):setBytes(fallback_response_body)
    end
end

The rest of the requirements (talking to a fallback backend based on a query arg or a cookie) are much easier to handle in the VirtualService match configuration, so I’m not including them here.