It's been 10 years since Brendan Burns, Eric Brewer and I spoke at DockerCon and introduced Kubernetes to the world. Here are some of the lessons I learned from building this large-scale, highly complex OSS project.
Happy birthday, Kubernetes! As I get ready to celebrate “KuberTENes” with the broader community in Silicon Valley, it is hard to believe that it has been 10 years since Brendan Burns, Eric Brewer and I spoke at DockerCon and introduced Kubernetes to the world. As I was looking through old material, I came across some pre-work that Joe Beda, a co-founder of Kubernetes, did at Gluecon 2014. He dropped the statistic that Google launched 2 billion containers a week, and announced the “container-optimized VM” as a precursor to Kubernetes. It is amazing how far we have come since then. For those interested in the journey, Honeypot put together a cool documentary series (part 1 and part 2).
I thought that instead of relating trivia and offering a history lesson, it would be more interesting to reflect on the principal lessons learned from the Kubernetes journey, in an attempt to help others who are embarking on similar journeys. We got lucky a lot—I am not claiming credit for unusual wisdom or perspicuity here—but there were a few things that made a real difference to the trajectory of the project. Here is a short distillation of some key lessons from the journey.
This is superficially obvious, but I still find it interesting how hard a time many founders and executives have in saying “no, we don’t/won’t/can’t do that.” When you are building a project, a product, a team, or a company, your identity is often defined more strongly by the things that you say you are not doing, rather than the things that you say you are doing. Focus can be your most powerful ally, and FOMO (fear of missing out) your most hostile adversary.
During the K8s journey, there were several “tipping point” decisions that I think fundamentally shaped the direction of the technology. Perhaps one of the toughest and most significant decisions I participated in driving was the early decision to focus on no more than 100 nodes. We were at a juncture in the project, where we were being presented as “a toy” by more established orchestration technologies that were able to support many thousands of nodes. For me, there were three key factors that drove the need to focus on no more than 100 nodes:
The relative maturity of the platform. Claiming support for more than 100 nodes would suggest we were more mature than we actually were.
Little real-world demand for larger clusters. For example, we knew that something like 98% of OpenStack deployments had fewer than 100 nodes.
The consumption model. I was convinced that the consumption model should be “personal Borg cell” on Google Compute Engine, and figured that it was in our (Google’s) commercial interest to focus on many smaller clusters, rather than a few large clusters. In the end, this was a pretty important decision for the project, and it took a little grit to make it stick with an engineering team that was raring to go with at-scale management.
I think we would have lost our way had we given in to the appetite to chase scale too early. I am glad we stuck to this one in the early days.
I think it is safe to say that no single person was ultimately responsible for Kubernetes happening (except perhaps for Urs Hölzle, who ultimately approved and funded the project). I like to think of building a team as being like an exercise in metallurgy. Do it right, and you have an alloy that is superior to any of its core elements. Do it wrong, and you end up with something brittle and less valuable. In the case of Kubernetes, thanks to a little focused management and a lot of luck, the team was very special:
The design ethos and good taste of Joe Beda, as evidenced by the original domain model, set us apart in a sea of efforts in the early days.
The creative flair of Brendan Burns, as exemplified by the introduction of the CRD, which fundamentally changed the course of the project and unlocked new use cases that were previously unimaginable.
The attention to detail of Brian Grant, who personally reviewed thousands of PRs and ensured that the API was just right. Brian had most of the significant issue numbers memorized. Now that he is moving on from Google, I am hoping he can reclaim all that storage space. :)
Tim Hockin’s ability to engage and solve tough challenges. Tim’s networking design (IP per pod) was a game changer. He was also the creative talent behind the K8s logo.
Kelsey Hightower’s ability to tell a story and meet the community where it was and help it process the potential of the platform in a way that was timely and relevant. I am so glad he decided to join Google when he did.
Sarah Novotny’s ability to bring a community together; she really emerged as the conscience and heart of the community.
Ville Aikas’ diesel engine-like productivity in getting the early prototypes landed.
Dawn Chen’s ability to tap into past experiences with Borg to drive key elements of the node design forward.
Clayton Coleman, who showed up, chopped wood, carried water, and ultimately brought the richness of Red Hat’s collective experience to the community over time.
The deep diversity of experience, along with the willingness of a wide range of individuals to engage with a project that was larger than any single person’s ambition, truly set us apart in a contested world and was critical to our success. I strongly encourage folks to avoid the trap of self-replication in building teams. Sure, replicating strengths can be good, but you will be better off if you have a team with relatively non-overlapping superpowers.
On the k8s journey, there were definitely moments that were a little overwhelming, due to the complexity and scale of the project. I remember feeling like a bit of an imposter as folks who were working on more established projects like Mesos or Cloud Foundry would ask questions like “What does state management look like for Kubernetes?” or “How are you going to deal with network QoS?”
The pattern we developed of focusing on incremental capabilities (stateless before stateful workload management, low scale before high scale, Linux only before Linux and Windows, etc.) was tremendously helpful. We learned that finding the confidence to ship something you are authentically embarrassed about (because it is so early) and working with the community in a closed-loop, iterative manner is a game changer for large-scale, complex system delivery. It’s important to have confidence in the team’s (and one’s own) ability to learn, grow, and overcome obstacles, and to bring users along at each step of the journey.
There has been a lot of trauma in the open source world that happens when commercial ambitions collide with community sensibilities. I am not going to rehash many painful and very public recent examples, but I think we were all quite fortunate in that the key players in the Kubernetes ecosystem were all very upfront about their needs and ambitions. In a few critical cases, those ambitions were relatively non-overlapping (at least on the near-horizon).
On the Google team, we were highly motivated to disrupt an ecosystem that we saw converging around Amazon, with Microsoft emerging as a strong contender. Without an effective go-to-market function for Google Cloud (which has obviously changed in recent years, under the leadership of TK), Kubernetes represented a holding motion. The plan was to intermediate the competition with a technology that we thought would run better on Google Cloud, and Kubernetes by its nature was an effective way to decouple the workload from the public cloud. I thought of it as a way to “buy futures” on workload hosting, and we were pretty upfront about that. We didn’t need to own it; we just needed it to exist and to run extremely well on Google Cloud. Red Hat at the time had its own ambitions with OpenShift, but it felt like the intent was to replatform OpenShift on a more community-aligned technology, rather than to directly pursue K8s in isolation. These goals were relatively supportive of each other, and I think both major players wore their hearts on their sleeves.
I strongly encourage companies and founders in the OSS ecosystem to be open from the beginning about their commercial needs and ambitions, and, most critically, not to surprise their communities down the line with changes. I think communities can be sympathetic to commercial needs as long as you are direct and transparent about those needs.
The parallels between Kubernetes and Linux are interesting. Both are important OSS platform technologies. Linux abstracts apps from the hardware specifics on a single machine. Kubernetes abstracts applications from the environment specifics on many machines. There is another parallel that I haven’t seen talked about much, but that is pretty fun to observe: in both projects, the type of governance that emerged mirrors the type of system.
Just as Linux is built around a single entity (the machine), the early governance of the project was built around a single person: Linus Torvalds. The resilience of the community and the overwhelming success of Linux are a testament both to Linus as a leader and to the Linux Foundation as a supporting organization.
For Kubernetes we didn’t have an obvious “Linus.” Well, that isn’t technically true: we had quite a few people who might have fulfilled that role, but promoting any of them into that singleton position of authority would have alienated others, and other companies. Just as Kubernetes relies on distributed systems patterns to operate, we created a distributed governance practice that relied on consensus building around a core set of individuals. At some point, it was equally clear that we would need to “shard” the governance model to meet the needs of the growing project and community. Perhaps one of the most inspiring and impressive things I have seen in my career was watching the Kubernetes “old guard” technical leaders deliberately work themselves out of their positions of authority to empower new leaders and shepherd in a more scalable governance model.
I think there is a lesson there for founders and leaders of all types. To build something truly durable and sustainable, design yourself out of the system and prepare the way for new leaders to succeed.
It has been an amazing journey to watch Kubernetes grow and prosper to the point where around half of all the public cloud workloads run on it. So many great moments, so many inspiring people to work with and learn from.
Looking ahead, I aspire to bring many of these learnings into Stacklok, and our new community-oriented technologies (like Minder) that are being built to support software supply chain security. Feel free to follow me on LinkedIn and hit me up if you want to talk supply chain security.
Craig McLuckie
CEO
Craig McLuckie is Stacklok's CEO and the co-creator of the open source project Kubernetes.