(advisory link)

This story starts like many similar ones: I wanted to figure the best way of integration my automation with ArgoCD and ended up discovering that my code works despite the logic saying it wouldn’t. OAuth is fun like that.

I have ArgoCD configured with Auth0 as the OIDC provider. It works just fine for human interactions, but I wanted to add some machinery code that’d do automation on top of what ArgoCD does, and what better option is there than Auth0’s machine-to-machine project? Surely, that requires a new audience, and ArgoCD didn’t seem to support several of those at a quick skimming of the docs, so I went out to study the code.

Why’d my clearly broken client work?

Looking through the auth headers access, eventually I ended up at the VerifyToken method. ArgoCD, apparently, supported an internal issuer (for the cli use), so the token validator distinguished between the two possible issuers—the internal and the external one.

At this point, I knew that the actual verification is done by gooidc library in the Provider’s Verify method that accepted the clientID as the audience. Where did VerifyToken get that one?

1
2
3
4
5
6
for _, aud := range claims.Audience {
	idToken, err = prov.Verify(aud, tokenString)
	if err == nil {
		break
	}
}

The audience was coming from claims, and claims were coming from…

1
2
var claims jwt.RegisteredClaims
_, _, err := parser.ParseUnverified(tokenString, &claims)

Uh oh. ParseUnverified, has a warning note in its docs:

ParseUnverified parses the token but doesn’t validate the signature. WARNING: Don’t use this method unless you know what you’re doing. It’s only ever useful in cases where you know the signature is valid The only problem is that the tokenString is a raw token coming from the user!

Now, why was it even there in the original code? The answer is simple: ArgoCD has different code paths for internal and external tokens:

1
2
3
4
5
6
7
8
switch claims.Issuer {
case SessionManagerClaimsIssuer:
	// Argo CD signed token
	return mgr.Parse(tokenString)
default:
	// IDP signed token
	...
}

This was clearly done for the sake of better user experience; knowing the token type allowed ArgoCD to provide better error messages in the frontend and even handle some refreshes in an easier way.

This was clearly done for the sake of better user experience; knowing the token type allowed ArgoCD to provide better error messages in the frontend and even handle some refreshes in an easier way.

To sum up the steps that lead to a vulnerability:

  1. ArgoCD parses the token without any verification

  2. And then uses the Audience claim from that parsed result to verify the same token

Obviously, that always works because the audience is taken from whatever is in the token, so when the token is actually verified against it the audience is guaranteed to be correct. Effectively, ArgoCD only checks the signature, in this case.

Which means a resourceful attacker can use a token issued for any service, not ArgoCD specifically, and authenticate to ArgoCD with it. In the age of OIDC providers with multiple applications, it’s not something unheard of.

Escalating

ArgoCD has a clear security policy. I emailed the maintainers, and they soon confirmed the vulnerability and gave me access to the internal fork.

We started an active discussion for the best mitigation path. I tried to sell an all-in “secure everything”, they offered me numerous alternatives to make the changes less breaking for end users. Remember: even if you have a massive vulnerability in your system, you still have to think about operators for that system and how easy it will be for them to migrate to a patched release. Make sure that process is as smooth as possible; no one wants to spend their holidays applying security patches.

Careful patching

Some easy fixes would have been stopping using parser.ParseUnverifiedaltogether, but often you can’t go that far. If your code must use something as powerful as an insecure parser, be sure to isolate it to a dedicated function and whatever information you leak outside, always consider how it can be misused. In this case, it was generally safe to leak the Issuer claim as it didn’t severely change the further verification, but it inadvertently leaked the Audience claim with it.

ArgoCD takes both approaches: it doesn’t use the Issuer claim directly anymore, and it isolates the Audience claim verification on an extremely well-documented method.

Not having the same level of detail creates a new issue, though: it makes the overall UX worse. Security, frequently, comes along with a worse experience for end users; after all those “something went wrong” messages aren’t there because the website developers hate you, it’s because telling you what exactly went wrong might disclose crucial details to the attacker.

You never know how a security detail can be misused. One’d think that returning the error of “token expired” vs. “token signature isn’t valid” doesn’t leak much, but often claims like these are hard to verify, and they could result in a side chain attack. A better option would be to return the aforementioned “something went wrong” error, but pair it with a completely random request ID. The application can report what exactly went wrong in the logs, having the request ID stored there too.

Now, if the user wants to know what exactly went wrong, they can pass the token to the system operator and the operator—after verifying the legitimacy of the request—can look up the details from the logs.

Wrapping up

The ArgoCD story went well. The maintainers were quick to respond, the patch was fixing all the major issues, and the followup notes allow for an even better security handling (and I can have my two different audiences for people and bots). The important lesson boils down to always verifying your inputs. If that was the case, there would have been so much fewer CVEs.