The rise in cyberattacks targeting software development processes, and the subsequent exploitation of software supply chain vulnerabilities, has brought questions around supply chain security to the forefront. I would like to offer some perspective on the issues at hand, along with some food for thought on how we, as a broader community, might look to address them.
Software supply chain vulnerabilities are not a new phenomenon. Whether it is open source or proprietary software, some of the most significant exploitations in the history of software can be traced back to the software supply chain.
Three come to mind:
One of the more shocking cybersecurity incidents that isn’t talked about much these days is the 2017 Equifax data breach, where attackers exploited vulnerabilities in Apache Struts that existed in an externally managed system Equifax depended on and accessed roughly 200,000 credit card details. The impact was dramatic (erasing 30% of Equifax’s market cap overnight). It certainly raised awareness of the criticality of open source software maintenance and galvanized organizations to start really thinking about open source supply chain problems, but it took a few more years for things to really come to a head.
In the SolarWinds incident of 2020, malicious actors compromised the SolarWinds build system. The impact on 18,000 organizations and several U.S. government agencies is difficult to quantify directly. The opacity of commercial software motivated the Biden administration to issue Executive Order 14028, which is driving the adoption of SBOMs (software bills of materials). The broader community has rallied around presenting an inventory of software, and this certainly catapulted projects like sigstore to prominence (sigstore was actually started a little before the SolarWinds incident was discovered).
More recently, the Log4j vulnerability (Log4Shell) had a massive and sustained impact on the industry. One public sector group reported investing 33,000 human hours (for just one US cabinet agency!) in mitigation. The fact that this vulnerability is well understood, yet enterprise organizations continue to deploy vulnerable instances to production, highlights how much work needs to be done on hardening the controls in production pipelines.
Software supply chain vulnerabilities typically arise due to weaknesses in the development, distribution, or maintenance of software components. These weaknesses are often talked about, but it will take significant effort at the community level to address some of the more complex and sophisticated attacks:
Compromised Third-Party Components: Organizations often rely on third-party libraries, frameworks, and dependencies to accelerate development, but obviously, these components may themselves contain vulnerabilities that can be exploited. These could be simply unpatched libraries that contain CVEs, in which case it is a question of paying down operating debt. It could however also be a case of commonly used libraries that are deep in the dependency chain for software development and have relatively little oversight or scrutiny, and may be targeted by APT actors.
Insecure Development Practices: Beyond the obvious best practices that engineers are trained to focus on (e.g. validate your inputs; don’t leave yourself open to XSS attacks), how people operate is increasingly important. A lack of rigor around developer credential management (such as not having two-factor authentication enabled), a lack of mature merge controls, and similar gaps will often leave projects and teams open to exploitation. Increasingly, teams need to think carefully about how they work, because a weakness in process can lead to substantial impact in the case of exploitation.
Insider Threats: Internal actors with privileged access can intentionally or inadvertently introduce vulnerabilities into the software supply chain. Disgruntled employees or those susceptible to social engineering attacks pose a significant risk. Proper access controls, segregation of duties, and employee awareness programs can help mitigate this threat. It is worth noting that insider threats may arise within an organization or, indeed, within an open source community.
Weak Authentication and Authorization: Weak or misconfigured authentication mechanisms can enable unauthorized access to software components, build systems, and continuous integration/continuous delivery services, compromising the entire supply chain. A slew of recent attacks in which GitHub OAuth tokens were acquired and used for malicious purposes showcases the criticality of a well-thought-through access and authorization model.
Prioritization and Understanding ‘Reachability’ of Packages: One of the critical challenges in dealing with software is the overwhelming volume of potential ‘false positives’ across a large software deployment. An organization may ship a responsibly sourced and well put together software package, but within a few months a number of new CVEs will be flagged, and it is easy for an enterprise that has deployed that software to ‘fall behind’ on updates, even if the publisher of the software is pushing updates periodically. Effective and realistic prioritization is therefore critical, and there is interesting work being done in the industry on the ‘reachability’ of exploits, so that organizations can focus first on mitigating vulnerabilities in code that their software actually exercises.
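To make the reachability idea concrete, here is a minimal sketch in Python. It assumes we already have a call graph for the deployed application and a list of findings that name the affected function; the call graph, package names, and CVE identifiers below are all hypothetical placeholders, and real reachability tools are considerably more sophisticated than a plain graph walk.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    cve_id: str             # placeholder identifier, not a real CVE
    package: str            # hypothetical package name
    severity: float         # CVSS-style base score, 0.0 to 10.0
    vulnerable_symbol: str  # function the advisory says is affected


def reachable_symbols(call_graph, entry_points):
    """Return every symbol reachable from the entry points (iterative DFS)."""
    seen = set()
    stack = list(entry_points)
    while stack:
        symbol = stack.pop()
        if symbol in seen:
            continue
        seen.add(symbol)
        stack.extend(call_graph.get(symbol, ()))
    return seen


def prioritize(findings, call_graph, entry_points):
    """Order findings: reachable ones first, then by descending severity."""
    reachable = reachable_symbols(call_graph, entry_points)
    return sorted(
        findings,
        key=lambda f: (f.vulnerable_symbol not in reachable, -f.severity),
    )


# Hypothetical call graph: caller -> list of callees.
call_graph = {
    "app.main": ["app.handle_request"],
    "app.handle_request": ["libjson.parse"],
    "libjson.parse": [],
    "libimage.decode": ["libimage.read_header"],
}
findings = [
    Finding("CVE-EXAMPLE-0001", "libimage", 9.8, "libimage.decode"),
    Finding("CVE-EXAMPLE-0002", "libjson", 7.5, "libjson.parse"),
    Finding("CVE-EXAMPLE-0003", "libjson", 5.3, "libjson.dump"),
]
ordered = prioritize(findings, call_graph, ["app.main"])
print([f.cve_id for f in ordered])
```

In this sketch the lower-severity finding in libjson.parse is ranked ahead of the critical finding in libimage.decode, because only the former sits on a path the application can actually execute. Unreachable findings are deferred rather than dropped, since call graphs can miss dynamic dispatch and reflection.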
Everything I wrote in the previous section is likely pretty well understood. Some issues are truly daunting undertakings (e.g. how can you get to a point where you can look at a binary and be confident it hasn’t been ‘solarwinds-ed’?). But if we squint a little bit and look at the evolution of things like SBOM technologies, the emergence of sigstore as a tamper-resistant system of record for provenance information, the use of cloud-based services (where some of the headache is handled by the cloud providers), more deterministic packaging and deployment approaches (per the cloud native ecosystem), and increasingly powerful SCA and static analysis tools in the hands of our developers, we can imagine a path through with a bit of work.
Unfortunately the world just changed. Generative AI is a wonderful technology for reducing the barrier to entry for pretty much anything - authoring, design and most coding. This is going to create some truly unique challenges that we, as an industry, really need to pay attention to:
Phishing just got easier; a lot easier. When we look at the published exploitation of Heroku, CircleCI and Travis CI, we see phishing as a common attack vector. It is plausible that the tools to phish have just got radically better. When an attacker can easily emulate a complete website, your boss’s voice on a voicemail, or the tone of a text message, one has to imagine that the volume of phishing attacks will go up. The world is an imperfect place, so we are going to have to invest in driving determinism of development practices, and technologies like Rekor (the immutable, tamper-resistant ledger) are going to play an important role in forensics down the road. We are also going to have to really focus on hygiene around token management (short-lived, minimally scoped tokens), branch controls and operational partitioning (minimal production access for non-essential personnel).
Citizen coders suddenly developed super-powers, as did everyday developers. People can suddenly produce code without understanding and get that code into ‘shadow-ops production’ more easily than ever before. Even trained developers will be lulled into ‘tab-complete hypnosis’ and get ahead of their understanding or at least ability to fully scrutinize what they are producing and deploying. This would be okay if AI were precise, but it is far from that. Enterprises will need to invest in the education of employees immediately to avoid deep challenges down the road, and add controls in the path to production to increase the level of scrutiny code is getting.
The generative AI models themselves have been proven vulnerable to attacks. This is too big a topic to dig into right now, but is a good topic for future posts. Understanding the provenance of training data, and model retraining will be key. Whatever has been said about software supply chain management will apply just as much to generative AI models and generative AI produced code.
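The token hygiene point above (short-lived, minimally scoped tokens) can be illustrated with a small policy check. This is only a sketch under stated assumptions: the one-hour maximum lifetime, the purposes, and the scope names are all hypothetical and do not correspond to any particular provider’s token model.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional

# Assumed policy values, chosen for illustration only.
MAX_LIFETIME = timedelta(hours=1)
ALLOWED_SCOPES = {
    "ci-build": frozenset({"source:read"}),
    "release": frozenset({"source:read", "packages:write"}),
}


@dataclass(frozen=True)
class Token:
    purpose: str                    # what the token is used for
    scopes: frozenset               # hypothetical scope names
    issued_at: datetime
    expires_at: Optional[datetime]  # None means the token never expires


def policy_violations(token, now):
    """Return a list of human-readable policy problems (empty if none)."""
    problems = []
    if token.expires_at is None:
        problems.append("token never expires")
    else:
        if token.expires_at - token.issued_at > MAX_LIFETIME:
            problems.append("lifetime exceeds policy maximum")
        if token.expires_at <= now:
            problems.append("token has expired")
    # Unknown purposes get no scopes at all, so every scope counts as excess.
    allowed = ALLOWED_SCOPES.get(token.purpose, frozenset())
    extra = token.scopes - allowed
    if extra:
        problems.append("excess scopes: " + ", ".join(sorted(extra)))
    return problems


now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
good = Token(
    "ci-build",
    frozenset({"source:read"}),
    issued_at=now - timedelta(minutes=10),
    expires_at=now + timedelta(minutes=20),
)
bad = Token(
    "ci-build",
    frozenset({"source:read", "org:admin"}),
    issued_at=now - timedelta(days=30),
    expires_at=None,
)
print(policy_violations(good, now))
print(policy_violations(bad, now))
```

A check like this could run wherever tokens are issued or inventoried, so that a long-lived, broadly scoped credential is flagged before a phishing attack turns it into a supply chain incident.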
The supply chain security problem is broad, meaningful and complex. I do not believe that any one company will have the resources, the capabilities and the acumen to address every issue that I have surfaced. I do, however, believe that improving the security posture of developers has never been more important, and that enterprise organizations would do well to pay attention to the changing definition of a developer as generative AI makes imperative code accessible to new groups of individuals. I also believe that it is going to take a village, and that open source will be a critical element of any solution. I couldn’t be more excited to be on this journey with my fellow Stacklok employees. Please subscribe to notifications below, or follow us at @stacklokhq on Twitter.
Craig McLuckie
CEO
Craig McLuckie is Stacklok's CEO and the co-creator of the open source project Kubernetes.