Triangle of Secure Code Delivery
Secure code delivery is the problem of getting software from its author to its users safely, with a healthy dose of mistrust towards the author and everything else in between.
We want to make sure that no attacker can modify the software as users download it in order to backdoor it or take control of their systems. More than that, we want to make it hard for the software's actual author to insert backdoors and selectively target users. This is important, because even if the author is absolutely trustworthy, they still might have been compromised, and with potentially millions of systems pulling code from them, we ought to have some sort of protection.
It's a difficult problem, one we haven't come close to solving. Especially so on new platforms like the Web, where entire applications are re-downloaded every time they are run.
The Triangle
Here's a Triangle, otherwise known as a list of three things, that I conjecture are necessary and sufficient for code delivery to be secure. The three points of the triangle are:
- Reproducible Builds:
Given the application's source code, it should be possible to reproduce the distributed package exactly, except for contents that are known to vary benignly, such as build timestamps.
This property is important for auditing. A developer can sign both the source code and distributed binary package, but how does the user (or, more likely, a security auditor) know the source code actually represents the binary? To be sure, it has to be possible to re-create the binary package from the source exactly, or at least without unexplained differences. This provides some defense if the software's developers turn malicious or are successfully attacked.
- Userbase Consistency Verification:
Users of the software should be able to check, in a peer-to-peer (or otherwise decentralized) fashion, that the package they received is identical to the one every other user received. These packages should be available permanently in a public record.
This is the most important of the three properties. Simply put: Everyone gets the same thing. If you can guarantee that everyone gets an identical copy of the software, then it becomes impossible to hide a targeted attack. If an attacker wants to backdoor one user's software, they have to backdoor every user's software. This greatly increases the attacker's risk of being detected.
- Cryptographic Signatures:
The software package, source code, and patches (changes) should be cryptographically signed by the upstream software source (i.e. the developers).
This serves to establish an anchor of trust to a person or organization responsible for maintaining the software. Without this property, a window of vulnerability exists before the software gets distributed widely enough for the Userbase Consistency Verification to be effective.
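To make this last property concrete, here's a minimal sketch of signature verification using the pyca/cryptography library's Ed25519 primitives. The file names and the source of the developer's public key are assumptions for illustration; a real system also has to solve key distribution.

```python
# Minimal sketch: verifying a developer's signature on a release.
# Uses the pyca/cryptography library; file names are hypothetical.
from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PublicKey

def verify_release(package_path: str, sig_path: str, pubkey_bytes: bytes) -> bool:
    """Return True only if the package was signed by the holder of the key."""
    with open(package_path, "rb") as f:
        package = f.read()
    with open(sig_path, "rb") as f:
        signature = f.read()
    public_key = Ed25519PublicKey.from_public_bytes(pubkey_bytes)
    try:
        public_key.verify(signature, package)
        return True
    except InvalidSignature:
        return False

# Usage, assuming the developer's public key was obtained out of band:
#   ok = verify_release("app-1.2.tar.gz", "app-1.2.tar.gz.sig", developer_pubkey)
```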
I conjecture that these three properties, if implemented correctly, are sufficient to disincentivize both large-scale attacks (e.g. the NSA wants to put a vulnerability in everyone's copy of Tor) and localized targeted attacks (e.g. the NSA wants to compromise a single user's software download to take control of their system).
Having just two of these properties is not enough:
- Without Reproducible Builds:
Without Reproducible Builds, the software developer can be compromised and backdoors can be inserted into binaries prior to signing. With the binary distributed widely and in the public record, detection is still a risk for the attacker, but only if lots of people are looking very closely. With reproducible builds, anyone can rebuild the package from the published source and compare it against the distributed binary, so a backdoored binary is detected as soon as someone checks (see the rebuild-and-compare sketch after this list).
- Without Userbase Consistency Verification:
Without Userbase Consistency Verification, localized targeted attacks are much easier. This is especially true when the software developers themselves are malicious (or controlled by the NSA), and they want to serve backdoored copies to some users, but clean copies to most users.
- Without Cryptographic Signatures:
Without this property, there's a window of opportunity for an attack to happen between the time when a new version of the software is released and when it becomes widely publicized and the Userbase Consistency Verification becomes effective. Any Userbase Consistency Verification system would probably depend on signatures simply in order to know who is authorized to release the next version of the software.
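Here's a minimal sketch of the rebuild-and-compare audit that Reproducible Builds enables: rebuild the package from source and compare it, byte for byte, against what was actually distributed. The build command and paths are placeholders; a real audit runs the build inside a pinned environment (the kind of thing Gitian provides) so the output is deterministic.

```python
# Minimal sketch of the rebuild-and-compare audit that Reproducible
# Builds makes possible. The build command and paths are placeholders;
# a real audit pins the entire toolchain so output is bit-for-bit stable.
import hashlib
import subprocess

def sha256_file(path: str) -> str:
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()

def audit_build(source_dir: str, distributed_package: str) -> bool:
    # Hypothetical deterministic build step.
    subprocess.run(["make", "-C", source_dir, "package"], check=True)
    rebuilt = sha256_file(f"{source_dir}/dist/package.tar.gz")
    official = sha256_file(distributed_package)
    return rebuilt == official  # any mismatch demands an explanation
```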
The Future
With these three properties in mind, can we build a secure code delivery system?
Cryptographic signatures are already available for most popular software. The Gitian project is making progress on Reproducible Builds and supports a limited kind of Userbase Consistency Verification. The Bitcoin cryptocurrency, whose blockchain is a decentralized append-only record, is evidence that full-scale Userbase Consistency Verification is possible, but can we make something reliable and easy to use for software? Perhaps it could work the way Perspectives or Convergence do for the SSL Certificate Authority system.
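To picture how such a system might work, here's a toy sketch: each release is appended to a hash-chained public log, and users gossip the log head with their peers. The log format and the notion of "peers" are illustrative assumptions, not a real protocol.

```python
# Toy sketch of Userbase Consistency Verification: releases live in a
# hash-chained append-only log, and users cross-check the log head with
# their peers. The log format and "peers" are illustrative assumptions.
import hashlib

def log_append(head: str, package_hash: str) -> str:
    """Append a release; the new head commits to the entire history."""
    return hashlib.sha256((head + package_hash).encode()).hexdigest()

def consistent_with_peers(my_head: str, peer_heads: list[str]) -> bool:
    """Everyone gets the same thing: all peers must report the same head."""
    return all(peer_head == my_head for peer_head in peer_heads)

# Usage:
head = "0" * 64  # genesis
head = log_append(head, hashlib.sha256(b"release 1.0 bytes").hexdigest())
assert consistent_with_peers(head, [head, head])
```

If any peer reports a different head, someone received a different package, and the discrepancy is publicly visible rather than quietly confined to the targeted user.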
Can we build a secure code delivery system for the web, too? If we had one built into our browsers, security would be a whole lot better. There would be no more compromised websites serving malware, and we could finally bring usable crypto, like LastPass, Cryptocat, miniLock, and GlobaLeaks, to the masses.
I'm convinced that code delivery is the biggest challenge, with the most practical consequences, that we're facing today. Let's give it the attention it deserves. With these three principles, we can see a way forward.