Web Environment Integrity vs. Private Access Tokens - They're the same thing!

Posted on 2023-07-25 in General

I've seen a lot of discussion in the last week about the Web Environment Integrity proposal. Quite predictably, from the moment it got labeled things like "DRM for the web", people have been arguing passionately against it on HN, GitHub issues, etc. The basic claims seem to be that it's going to turn the web into a walled garden, kill ad blockers, kill all small browsers, kill all small operating systems, kill accessibility tools like screen readers, and so on.

The Web Environment Integrity proposal is basically:

  • A website can request an attestation from the browser.
  • The browser forwards the attestation request to an attester.
  • The attester checks properties like hardware and software integrity.
  • If they check out, the attester creates a token and signs it with its private key.
  • The attester hands the signed token off to the browser, which in turn sends it to the website.
  • The website checks that the token was signed by a trusted attester.
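
To make the shape of that flow concrete, here's a minimal sketch of the last couple of steps in Python. Everything here is invented for illustration — the proposal doesn't define a wire format or a signature scheme — with Ed25519 from the `cryptography` package standing in for whatever a real attester would actually use:

```python
# Toy model of WEI-style attestation: the attester signs a token, and
# the website accepts it only if a trusted attester signed it.
# All names and the token format are invented for illustration.
from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

# The attester's long-term key pair. In reality this would live inside
# the platform vendor's attestation service.
attester_key = Ed25519PrivateKey.generate()

# The website keeps an allowlist of attester public keys it trusts.
TRUSTED_ATTESTERS = {"example-attester": attester_key.public_key()}

def attest(token: bytes) -> bytes:
    """Attester: check device/browser integrity (elided), then sign."""
    return attester_key.sign(token)

def website_accepts(token: bytes, signature: bytes) -> bool:
    """Website: accept the token iff some trusted attester signed it."""
    for pubkey in TRUSTED_ATTESTERS.values():
        try:
            pubkey.verify(signature, token)
            return True
        except InvalidSignature:
            continue
    return False

token = b"low-entropy attestation token"
assert website_accepts(token, attest(token))
```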

Here's a funny thing I suspect few of those commenters know: A very similar mechanism already exists on the web, and is already deployed in production browsers (Safari), operating systems (iOS, OS X), and hosting infrastructure (Cloudflare, Fastly). That mechanism is Private Access Tokens / Privacy Pass.

Here's what PATs (as deployed by Apple, and on by default) do to the best of my understanding:

  • A website can request an attestation from the browser.
  • The browser forwards the attestation request to an attester.
  • The attester checks properties like hardware and software integrity.
  • If they check out, the attester calls the website's trusted token issuer.
  • The issuer checks whether to trust the attester and whether the information passed by the attester is sufficient, and then issues a token signed with its private key.
  • The attester hands the signed token off to the browser, which passes it to the website.
  • The website checks that the token was signed by a trusted token issuer.
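
The only structural novelty compared to the WEI sketch above is the extra hop: the website holds exactly one trusted key (its chosen issuer's), and deciding which attesters to believe is the issuer's job. Again a toy model with invented names; real PATs use RSA blind signatures so the issuer can't link issuance to redemption, which this sketch ignores entirely:

```python
# Toy model of the PAT attester/issuer split. The website verifies
# against a single key; trust in attesters is the issuer's problem.
from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

issuer_key = Ed25519PrivateKey.generate()
ISSUER_TRUSTED_ATTESTERS = {"platform-attester"}  # the issuer's allowlist

def issue_token(attester_name: str, token: bytes) -> bytes | None:
    """Issuer: mint a token only if a trusted attester vouched for it."""
    if attester_name not in ISSUER_TRUSTED_ATTESTERS:
        return None
    return issuer_key.sign(token)

def website_accepts(token: bytes, signature: bytes) -> bool:
    """Website: verify against exactly one key, the issuer's."""
    try:
        issuer_key.public_key().verify(signature, token)
        return True
    except InvalidSignature:
        return False

sig = issue_token("platform-attester", b"token")
assert sig is not None and website_accepts(b"token", sig)
```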

This launch was hailed in the tech press as a win for privacy and security, not as an attempt to kill accessibility tools or build a walled garden. [1]

You might notice that the basic operating model of the two protocols is almost exactly the same. So is their intended use. From the "DRM for websites" perspective, I don't think there is a difference.

With both WEI and PATs, a website would be able to ask Apple to verify that a request is coming from a genuine non-jailbroken iPhone running Safari, and block the requests coming from Firefox on Linux. And in both cases, the intent is not for the API to be used for that kind of outright blocking.

Neither lists, say, checking whether the browser is running an ad blocker extension as a use case. Both would have exactly the same technical capability to make that kind of thing happen, by just having the attester check for it, and I bet that in both cases the attester would be equally unmotivated to actually provide that kind of attestation.

It's also not that PATs would somehow make it easier for people to spin up new attesters for small or new platforms. Want to run your own attester for PATs? You could, but the issuers you care about will not trust it. [2]

Now, the technologies aren't quite identical, but the distinctions are subtle and would only matter for exactly the kind of anti-abuse work that both proposals were ostensibly meant for. The big one is that the WEI proposal includes the ability to content-bind the attestation to a specific operation. It's something anyone trying to use an API like this for abuse prevention would consider necessary, but it adds no power to the theorized "DRM for the web" use case. There is also a more obvious difference between the two: whether the attester and the issuer are the same entity or split. But that too is irrelevant to the discussion of how the technology could be misused. [3]
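
To illustrate what content binding adds: the signed payload covers a digest of the specific operation, so a token obtained for one request can't be harvested and replayed against a different one. A sketch, with an invented request format:

```python
# Sketch of content binding: the signed payload covers a digest of the
# specific operation. The request format here is made up for illustration.
import hashlib

def content_bound_payload(token: bytes, request: bytes) -> bytes:
    return token + hashlib.sha256(request).digest()

# The attester signs this payload; the website recomputes the digest from
# the request it actually received and verifies the signature over that,
# so the same token is useless for any other operation.
a = content_bound_payload(b"tok", b"POST /transfer?amount=10")
b = content_bound_payload(b"tok", b"POST /transfer?amount=9999")
assert a != b
```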

In principle there could also be differences in the exact things that the APIs allow attesting for. But neither standard really defines the exact set of attestations, just the mechanisms.

Given that the DRM narrative would have worked exactly the same for the two projects, why such a different reception? I can only think of two differences, both social rather than technical.

One is that the PAT (and the related Privacy Pass) draft standards were written in the IETF and are dense standardese. There was no plain-language explainer. Effectively nobody outside of internet standardization circles read those drafts, and if they had, they wouldn't have known whether they needed to be outraged or not. The first time the idea actually broke through to the public was when Apple implemented it.

The other is the framing. PATs were sold to the public exclusively as a way of seeing fewer captchas. Who wouldn't want fewer captchas? WEI was pitched with a bunch of fairly abstract use cases, mostly from the perspective of the service provider, not in terms of how it'd improve the user experience by reducing the need for invasive challenges and data collection.

This isn't the first time I've seen two attempts at a really similar project, with one getting lauded while the other gets trashed for something that's common to both. But it is the one where the two things are the most similar, and it feels like it should be instructive somehow.

If the takeaway is that standards proposals should be opaque and kept away from the public for as long as possible before being launched straight to prod based on a draft spec, that'd be bad. If it's that standards proposals should be carefully written to highlight the benefits for the end user, starting from the very first draft, that's probably pretty good? And if it's that only Apple can launch new browser features without a massive backlash, that seems pretty damn bad.


[1] Just to be clear, the one significant HN discussion on PATs had similar arguments about it being DRM, so my claim is not that absolutely everyone loved PATs. But it didn't actually gain traction as a hacker cause célèbre, and as far as I can see the general media coverage was broadly positive.

[2] What's the process for getting Cloudflare or Fastly to trust a non-Apple attester anyway? I can't find any documentation.

[3] The split version seems kind of superior for deployment, since it means each site needs to only care about a single key (their chosen issuer). This makes e.g. the creation of a new attester a lot more tractable. You only need to convince half a dozen issuers to trust your new attester and ingest the keys, not try to sign up every single website in the world one by one.

A monorepo misconception - atomic cross-project commits

Posted on 2021-07-21 in General

In articles and discussions about monorepos, there's one frequently alleged key benefit: atomic commits across the whole tree let you change both a library's implementation and its clients in a single commit. Many authors even go as far as to claim that this is the only benefit of monorepos.

I like monorepos, but that particular claim makes no sense! It's not how you'd actually make backwards-incompatible changes, such as interface refactorings, in a large monorepo. Instead the process would be highly incremental, more like the following:

  1. Push one commit to change the library, such that it supports both the old and new behavior with different interfaces.
  2. Once you're sure the commit from stage 1 won't be reverted, push N commits to switch each of the N clients to use the new interface.
  3. Once you're sure the commits from stage 2 won't be reverted, push one commit to remove the old implementation and interface from the library.
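
As a concrete (and entirely invented) Python example, stage 1 might look like this: the library grows the new interface while keeping the old one alive as a thin shim, which is what lets the N client commits of stage 2 land independently and in any order:

```python
# Hypothetical stage-1 library change: old and new interfaces coexist.
import warnings

def fetch_user(user_id: int, *, timeout_s: float = 5.0) -> dict:
    """New interface: keyword-only timeout, in seconds."""
    return {"id": user_id, "timeout_s": timeout_s}  # stand-in implementation

def fetch_user_ms(user_id: int, timeout_ms: int = 5000) -> dict:
    """Old interface, kept as a shim for the duration of the migration.

    Stage 2 moves callers over to fetch_user(); stage 3 deletes this
    function once no clients reference it anymore.
    """
    warnings.warn("fetch_user_ms is deprecated, use fetch_user",
                  DeprecationWarning, stacklevel=2)
    return fetch_user(user_id, timeout_s=timeout_ms / 1000)
```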

... Continue reading ...

Computing multiple hash values in parallel with AVX2

Posted on 2017-03-19 in General

I wanted to compute some hash values in a very particular way, and couldn't find any existing implementations. The special circumstances were:

  • The keys are short (not sure exactly what size they'll end up, but almost certainly in the 12-40 byte range).
  • The keys are all of the same length.
  • I know the length at compile time.
  • I have a batch of keys to process at once.

Given the above constraints, it seems obvious that doing multiple keys in a batch with SIMD could speed things up over computing each one individually. Now, typically small data sizes aren't a good sign for SIMD. But that's not a problem here, since the core problem parallelizes so neatly.

After a couple of false starts, I ended up with a version of xxHash32 that computes hash values for 8 keys at the same time using AVX2. The code is at parallel-xxhash.
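
The actual implementation is C++ with AVX2 intrinsics, but the core batching idea — one key per 32-bit lane, with each scalar xxHash32 step applied to all lanes at once — can be sketched in a few lines of NumPy. This is illustrative only, not the code from the post, and assumes fixed 16-byte keys:

```python
import numpy as np

# xxHash32 prime constants.
P1, P2, P3 = 2654435761, 2246822519, 3266489917

def rotl(x: np.ndarray, r: int) -> np.ndarray:
    return (x << np.uint32(r)) | (x >> np.uint32(32 - r))

def xxhash32_batch16(keys: np.ndarray, seed: int = 0) -> np.ndarray:
    """xxHash32 of a batch of 16-byte keys, one key per vector lane.

    keys: uint8 array of shape (n, 16). Returns n uint32 hash values.
    """
    words = keys.view("<u4")           # (n, 4) little-endian 32-bit words
    n = len(keys)
    p1, p2 = np.uint32(P1), np.uint32(P2)
    # Accumulator initialization, as in scalar xxHash32 for len >= 16.
    inits = [seed + P1 + P2, seed + P2, seed, seed - P1]
    v = [np.full(n, x & 0xFFFFFFFF, np.uint32) for x in inits]
    # One 16-byte stripe: one round per accumulator, across all keys.
    for i in range(4):
        v[i] = rotl(v[i] + words[:, i] * p2, 13) * p1
    h = rotl(v[0], 1) + rotl(v[1], 7) + rotl(v[2], 12) + rotl(v[3], 18)
    h += np.uint32(16)                 # total input length in bytes
    # Final avalanche.
    h ^= h >> np.uint32(15); h *= p2
    h ^= h >> np.uint32(13); h *= np.uint32(P3)
    h ^= h >> np.uint32(16)
    return h

keys = np.frombuffer(np.random.default_rng(0).bytes(8 * 16),
                     dtype=np.uint8).reshape(8, 16)
print(xxhash32_batch16(keys))
```

Each whole-array operation above roughly corresponds to a single 8-lane AVX2 instruction in the real implementation; the win comes from amortizing every step of the hash across all eight keys.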

... Continue reading ...

I've been writing ring buffers wrong all these years

Posted on 2016-12-13 in General

So there I was, implementing a one element ring buffer. Which, I'm sure you'll agree, is a perfectly reasonable data structure.

It was just surprisingly annoying to write, due to reasons we'll get to in a bit. After giving it a bit of thought, I realized I'd always been writing ring buffers "wrong", and there was a better way.

... Continue reading ...

Ratas - A hierarchical timer wheel

Posted on 2016-07-27 in General

Last week I needed a timer wheel for a hobby project. That's a data structure that's been reimplemented over and over in the last three decades, but for various reasons I couldn't get excited by any of the freely available ones. Obviously this means that one more implementation was needed, hence Ratas - a hierarchical timer wheel. Unfortunately my vacation ran out before I could get back to the original project, but that's the nature of yak shaving.

In this post I'll first explain briefly what timer wheels are - you might want to read one of the references instead if you've got the time - and then go into more detail on why I wrote a new one.

... Continue reading ...

json-to-multicsv - Convert hierarchical JSON to multiple CSV files

Posted on 2016-01-12 in General, Perl

Introduction

json-to-multicsv is a little program to convert a JSON file to one or more CSV files in a way that preserves the hierarchical structure of nested objects and lists. It's the kind of dime-a-dozen data munging tool that's too trivial to talk about, but I'll write a bit anyway, for a couple of reasons.

The first one is that I spent an hour looking for an existing tool that did this and didn't find one. Lots of converters to other formats, all of which seem to assume the JSON is effectively going to be a list of records, but none that support arbitrary nesting. Did I just somehow manage to miss all the good ones? Or is this truly something that nobody has ever needed to do?

Second, this is as good an excuse as any to start talking a bit about some patterns in how command line programs get told what to do (I'd use the word "configured", except that's not quite right).

... Continue reading ...

The most obsolete infrastructure money could buy - my worst job ever

Posted on 2015-09-01 in General, History

Today marks the 10th anniversary of the most bizarre, and possibly the saddest, job I ever took.

The year was 2005. My interest in writing a content management system in Java for the company that bought our startup had been steadily draining away, while my real passion was working on compilers and other programming language infrastructure (mostly SBCL). One day I spotted a job advert looking for compiler people, which was a rare occurrence in that time and place. I breezed through the job interview, but did not ask the right questions and ignored a couple of warning signs. Oops.

It turned out to be a bit of an adventure in retrocomputing.

... Continue reading ...

Updated zlib benchmarks

Posted on 2015-06-05 in General

Last year I wrote a small suite to benchmark the various zlib optimization forks that were floating around. There are a couple of reasons to update those results. First, major optimizations were added to the Cloudflare fork. And second, there's now a new entrant, zlib-ng, which merges in the changes from both the Intel and Cloudflare versions, but also drops support for old architectures and cleans up the code in general.

I'll write a bit less commentary this time, so that the results will be easier to update in the future without a new post. The big change compared to the 2014-08 results is that the Cloudflare version is now significantly faster, particularly on high compression levels, with smaller improvements across all compression levels. Except for compression level 1, it now seems like the preferable version for pure speed.

... Continue reading ...

"It's like an OkCupid for voting" - the Finnish election engines

Posted on 2015-05-11 in General

Have I ever told you about the time I built an "OkCupid for elections" for the communists?

No? That's strange, I tend to get good mileage out of that story during election season. Unfortunately for the story to make any sense, you'll need a bit of absolutely fascinating background information on how elections work in Finland, and especially how websites that tell people whom to vote for became an integral part of it.

... Continue reading ...

Can't even throw code across the wall - on open sourcing existing code

Posted on 2015-03-19 in General

Starting a new project as open source feels like the simplest thing in the world. You just take the minimally working thing you wrote, slap on a license file, and push the repo to Github. The difficult bit is creating and maintaining a community that ensures long-term continuity of the project, especially as some contributors leave and new ones enter. But getting the code out in a way that could be useful to others is easy.

Things are different for existing codebases, in ways that are hard to appreciate if you haven't tried doing it. Code releases that are made with no attempt to create a community around them, and which aren't kept constantly in sync with the proprietary version, are derided as "throwing the code across the wall". But getting even to that point can be non-trivial.

... Continue reading ...

Archives

For earlier posts, head over to the archives.