External dependency risk has long been evaluated based on the presence of known CVEs—but a lack of them could just mean the software is unknown or unscrutinized. It’s time to think differently about how we vet the safety of open source software.
The world of organic chemistry is orderly. Subatomic particles form atoms; atoms form compounds; compounds form proteins; and so on. In the software world, sadly, things don’t build neatly on one another. It is a hugely complicated, tangled world, where the software you use has dependencies (sometimes hundreds or thousands), which in turn depend on other things, and so on. Something like 90% of shipped software is open source technology, integrated through a package manager (or language equivalents).
I often think about this xkcd cartoon (shown above) that depicts technical infrastructure as a layer of blocks supported by a few projects that are absolutely crucial, but are maintained through the goodwill of a very few under-appreciated souls. It’s like a technical Jenga that brings down the internet when the wrong block is “pulled” (i.e., a critical piece fails).
At the human level, we need to understand where individuals are doing this unlikely work so that we can support and celebrate them. At the system level, we need to understand what those projects are so that we can work to avoid single-point-of-failure risk over time.
What truly fascinates me is that we have come to use the CVE—or more importantly, the absence of the CVE—as a primary indicator of whether software is risky. We have also allowed the CVE to become a principal source of irritation for that poor maintainer in Nebraska. The absence of a CVE as a mechanism to drive trust is entirely nonsensical for a number of reasons:
The absence of a CVE doesn’t mean the code is good. It may mean it is good, but it could also mean that it is simply unscrutinized. It could also mean that a malicious actor forked something authentically great from the very latest source code and added some malicious extras, which makes it the most terrible kind of bad. The absence of a CVE won’t tell you whether something is trustworthy, unscrutinized, or truly vicious and just very fresh.
The presence of a CVE increasingly doesn’t mean much either. Tons of packages are being flagged because they pull in a transitive dependency that has a high CVSS score associated with it, even though it might be entirely safe in the context in which it is being used. The problem is that there is no easy way to explain this to the scanning systems, which are just going to keep going off, and the incentives simply don’t exist to get maintainers to make a change that has absolutely no real-world impact. All it does is set off false alarms in abstract places downstream.
Not all CVEs or vulnerability databases are equal. The relevance and quality of CVEs can vary quite considerably. Some are well framed and clearly actionable. Others, not so much. In talking to businesses, we often hear that it is becoming more than a little confusing, since different scanning tools return different results. The signal-to-noise ratio is just not where it needs to be.
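To make the transitive-dependency point concrete, here is a minimal sketch of how a naive scanner behaves. The package names, the graph, and the advisory set are all hypothetical, and real scanners are more sophisticated; the point is only that the check asks whether a vulnerable package is present somewhere in the graph, not whether the vulnerable code is ever reached.

```python
from collections import deque

# Hypothetical dependency graph: package -> direct dependencies.
DEPENDENCIES = {
    "my-app": ["web-framework", "image-tools"],
    "web-framework": ["http-parser"],
    "image-tools": ["compression-lib"],
    "http-parser": [],
    "compression-lib": [],
}

# Hypothetical advisory data: packages with a known high-severity CVE.
VULNERABLE = {"compression-lib"}


def flagged_paths(root, dependencies, vulnerable):
    """Return every dependency path from root that ends at a vulnerable package."""
    paths = []
    queue = deque([[root]])
    while queue:
        path = queue.popleft()
        current = path[-1]
        if current in vulnerable and len(path) > 1:
            paths.append(path)
        for dep in dependencies.get(current, []):
            if dep not in path:  # guard against dependency cycles
                queue.append(path + [dep])
    return paths


if __name__ == "__main__":
    for path in flagged_paths("my-app", DEPENDENCIES, VULNERABLE):
        print(" -> ".join(path))
```

Both `my-app` and `image-tools` get flagged here, even if neither ever calls the affected code in `compression-lib`. Nothing in the graph lets a maintainer express that.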
With this in mind, I believe it is past time to look beyond the CVE as the primary (and sometimes only) signal around the relative viability of a dependency, and start to look at what really makes open source communities: people. Or more specifically, the activities of those community-minded developers who over time have built the foundations of the software we all rely on. We need to find ways to start weighting the activity of actual human beings (not just in the now, but in its historical context also) and to make that information more readily accessible to developers as they decide what to use and what not to use. Please don’t get me wrong: I am not suggesting everyone turn off their scanners, but we owe it to ourselves to think more clearly and just do better.
To start, we could focus on exploring the relationship between the people building open source software and the way they go about building it. The context that people create through months and years of working in the open source community gives us a much stronger sense of whether they are acting in good faith, including when they show up in a different context.
Ironically, this is more or less what a lot of us do today. We just all do it in slightly different ways and generally quite informally. When assessing a dependency we will go off and take a look at not just the package, but all kinds of activity around the package. Who produced it? How often is it updated? How well are the maintainers supporting the community? What other things are the maintainers of the package known for? These are critical questions.
The most critical question for evaluating risk, though, is this one: How do you deterministically establish how and where a package was produced?
You could look at metadata in the package repository. But that data is typically self-reported, and in some cases absent. There is literally nothing stopping someone from claiming that a package they published was sourced from the TensorFlow repository, for example (this type of exploit has been labeled ‘starjacking’ fwiw). Without reproducible build technologies, there is no simple way to prove that it wasn’t.
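The gap between a claimed source and a proven one can be sketched in a few lines. This is an illustration under assumptions, not any registry's or tool's actual API: `claimed_repo` stands for the self-reported URL in package metadata, and `attested_repo` stands for a source repository taken from a provenance statement that has already been cryptographically verified elsewhere (that verification step is not shown). Both function names are hypothetical.

```python
from urllib.parse import urlparse


def normalize_repo(url):
    """Reduce a repository URL to 'host/owner/name' for comparison."""
    parsed = urlparse(url.strip().lower())
    path = parsed.path.strip("/")
    if path.endswith(".git"):
        path = path[: -len(".git")]
    return f"{parsed.netloc}/{path}"


def repo_claim_is_backed(claimed_repo, attested_repo):
    """True only if verified provenance names the same repo the metadata claims."""
    if attested_repo is None:
        # No provenance at all: the claim is unverified, not disproven.
        return False
    return normalize_repo(claimed_repo) == normalize_repo(attested_repo)


if __name__ == "__main__":
    claim = "https://github.com/tensorflow/tensorflow"
    print(repo_claim_is_backed(claim, "https://github.com/tensorflow/tensorflow.git"))
    print(repo_claim_is_backed(claim, "https://github.com/someone-else/lookalike"))
    print(repo_claim_is_backed(claim, None))
```

The design choice worth noting is the `None` case: when there is no provenance, the self-reported claim is treated as unbacked rather than trusted by default.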
We need useful information about software to be discoverable in a public format, in a tamper-resistant form factor, with strong proof of ownership. That's the value of Sigstore.
Sigstore, an open source project that handles signing, verification, and provenance checks, represents the most promising way to address this problem (though obviously I am a little biased here, since my co-founder, Luke Hinds, started the project). It is also worth recognizing the Golang community, which is navigating this challenge quite well through their sophisticated dependency management system, and other organizations like Tidelift, who are doing good work in the community to support the open source ecosystem.
At the end of the day, knowing more about how software is produced and who is producing it is going to serve you far better than the ostensible absence of a CVE, which may mean nothing at all. The only way to really, truly know who produced something and how it was produced is to have a publicly accessible record in a tamper-resistant ledger that demonstrates proof of origin in a cryptographically provable way.
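For readers who want a feel for what "tamper-resistant" means in practice, here is a toy hash-chained log. It is a deliberately simplified stand-in and not how Sigstore implements its log; the record fields and function names are hypothetical. Each entry's hash covers the previous entry's hash, so quietly editing an old record breaks every hash after it.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder hash for the entry before the first one


def entry_hash(prev_hash, record):
    """Hash a record together with the hash of the entry before it."""
    payload = json.dumps(record, sort_keys=True).encode("utf-8")
    return hashlib.sha256(prev_hash.encode("utf-8") + payload).hexdigest()


def append(log, record):
    """Append a record, chaining it to the previous entry."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    log.append({"record": record, "hash": entry_hash(prev_hash, record)})


def verify(log):
    """Recompute the chain; any edited or reordered entry breaks it."""
    prev_hash = GENESIS
    for entry in log:
        if entry["hash"] != entry_hash(prev_hash, entry["record"]):
            return False
        prev_hash = entry["hash"]
    return True


if __name__ == "__main__":
    log = []
    append(log, {"package": "example-pkg", "version": "1.0.0", "digest": "abc123"})
    append(log, {"package": "example-pkg", "version": "1.0.1", "digest": "def456"})
    print(verify(log))  # chain is intact
    log[0]["record"]["digest"] = "tampered"
    print(verify(log))  # the edit is detected
```

A chain like this only makes tampering evident to someone who rechecks it. The "publicly accessible" part of the argument matters just as much, because it lets anyone do that rechecking independently.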
At Stacklok, it is our belief that the open source software supply chain represents one of the greatest technical treasures and sources of human innovation. We also see it as a most tantalizing target for sophisticated hostile actors, in a world that is getting darker.
We believe that malware injection into the open source software supply chain is the most significant cyberthreat facing the software industry.
For Stacklok to serve the community and our eventual customers, we need to make two things happen:
Help developers make better assessments about the dependencies they are using
Help developers make clear assertions about the software they are building
The transition from passive hostile actors (who look for a way in), to active hostile actors (who are actively trying to create a way in) requires a sea change in how we think about not just security, but safety at every level. I use the word “safety” quite deliberately, because being hacked isn’t the only bad thing that can happen in your dependency chain.
We are thrilled to be on this journey, and looking forward to sharing some exciting news very soon about what we have been up to. We'll be at GitHub Universe on November 7-9; if you'll be there, come say hi and see what we're up to!
Craig McLuckie, Stacklok's CEO, is a seasoned startup founder with substantial experience in the open source and cloud computing realms. Prior to Stacklok, he co-founded Heptio to support enterprise adoption of Kubernetes, the open source container orchestration system he co-founded while at Google. In November 2018, Heptio was acquired by VMware, where Craig transitioned to VP of R&D, establishing VMware's Tanzu cloud native computing business and managing a large engineering team. In addition to co-founding Kubernetes while at Google, Craig also established the Cloud Native Computing Foundation and spearheaded the development of Google Compute Engine, alongside Heptio and Kubernetes co-founder and CTO Joe Beda.