I often see the same questions come up about Rekor, Sigstore's transparency log, such as what its primary purpose is or what function it serves within the Sigstore ecosystem. I thought it would be useful to delve into the topic, explore Rekor's role, and address some common misconceptions.
If you don’t know Sigstore itself, it is a group of open source projects designed to make the signing and attestation of software easily accessible. Stacklok is a key contributor to and supporter of the Sigstore project.
This article is a deep dive into Rekor, Sigstore’s transparency log.
Merkle trees have proven to be a highly valuable data structure in various technologies. They were initially conceptualized by Ralph Merkle in the late 1970s. If we were to summarize their primary purpose, it would be to ensure the integrity and consistency of data in distributed systems.
A Merkle tree follows a binary tree structure and relies on a simple operation, sometimes described as a hash extend function: two hashes are concatenated and the result is hashed again. Essentially, a Merkle tree is constructed by hashing each data element to form the leaf nodes, then recursively concatenating pairs of hashes and hashing the result until a single root hash is obtained.
Each leaf node in the tree usually represents a data element, while the intermediate nodes store the hash values of their child nodes.
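To make the construction concrete, here is a minimal Python sketch of building a root hash from a list of data blocks. It is an illustration only: the function names are made up for this example, SHA-256 is an assumed choice of hash, and duplicating the last node on odd-sized levels is just one of several conventions in use.

```python
import hashlib
from typing import List


def sha256(data: bytes) -> bytes:
    """Hash helper used for both leaves and intermediate nodes."""
    return hashlib.sha256(data).digest()


def merkle_root(blocks: List[bytes]) -> bytes:
    """Return the root hash of a simple binary Merkle tree over `blocks`."""
    if not blocks:
        raise ValueError("need at least one data block")
    # Leaf nodes: one hash per data element.
    level = [sha256(block) for block in blocks]
    # Intermediate nodes: hash each concatenated pair until one hash remains.
    while len(level) > 1:
        if len(level) % 2 == 1:
            # Assumption for this sketch: pad odd levels by repeating the last node.
            level.append(level[-1])
        level = [sha256(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


if __name__ == "__main__":
    print(merkle_root([b"part-0", b"part-1", b"part-2"]).hex())
```

Changing a single byte in any block produces a different root, which is the property the rest of this article relies on.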
Merkle trees are leveraged in several popular existing technologies.
One well-known application that makes use of Merkle trees is BitTorrent. In the case of BitTorrent, every file is divided into 2MB parts as the data blocks. Each file part is used to compute a hash (using SHA-1) and then used as a node to construct a tree. The tree construction continues until a final root hash is obtained. Each leech then downloads these file parts and performs an "inclusion proof" to ensure the file parts are free from tampering.
In blockchain technology, Merkle trees play an essential role in verifying transactions and ensuring the integrity of the blockchain itself. Each block within a blockchain contains its own Merkle tree, where the leaf nodes represent individual transactions and the root node, or Merkle root, signifies a summary of all transactions.
If a transaction is (maliciously or via system fault) altered, the change propagates up to the Merkle root, causing a mismatch with the original root, thereby flagging an invalid transaction. This structure also makes it easier to validate a particular transaction's existence without scanning all transactions; a path from the transaction to the root, known as a Merkle proof, is sufficient.
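The Merkle proof idea can be sketched in a few lines of Python. This is a simplified illustration rather than how any particular blockchain implements it: the helper names are hypothetical, SHA-256 is assumed, and odd-sized levels are padded by repeating the last node.

```python
import hashlib
from typing import List, Tuple


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def build_levels(blocks: List[bytes]) -> List[List[bytes]]:
    """Return every level of the tree, from leaf hashes up to the root."""
    if not blocks:
        raise ValueError("need at least one data block")
    levels = [[h(block) for block in blocks]]
    while len(levels[-1]) > 1:
        current = list(levels[-1])
        if len(current) % 2 == 1:
            current.append(current[-1])  # sketch convention: repeat the last node
        levels[-1] = current
        levels.append([h(current[i] + current[i + 1]) for i in range(0, len(current), 2)])
    return levels


def inclusion_proof(blocks: List[bytes], index: int) -> List[Tuple[bytes, bool]]:
    """Collect the sibling hashes on the path from leaf `index` to the root."""
    if not 0 <= index < len(blocks):
        raise IndexError("leaf index out of range")
    proof = []
    for level in build_levels(blocks)[:-1]:
        sibling = index ^ 1  # the other node in the same pair
        proof.append((level[sibling], sibling < index))  # (hash, sibling_is_left)
        index //= 2
    return proof


def verify_inclusion(block: bytes, proof: List[Tuple[bytes, bool]], root: bytes) -> bool:
    """Recompute the root from one block plus its proof and compare."""
    node = h(block)
    for sibling, sibling_is_left in proof:
        node = h(sibling + node) if sibling_is_left else h(node + sibling)
    return node == root


if __name__ == "__main__":
    txs = [b"tx0", b"tx1", b"tx2"]
    root = build_levels(txs)[-1][0]
    proof = inclusion_proof(txs, 2)
    print(verify_inclusion(b"tx2", proof, root))           # genuine transaction
    print(verify_inclusion(b"tx2-tampered", proof, root))  # altered transaction
```

The verifier only needs the block, the root, and one sibling hash per level, so the amount of data checked grows with the height of the tree rather than with the number of transactions.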
Here is an interesting aside: when I first had the idea of developing Rekor, I originally attempted to build it on a blockchain, but some key aspects of blockchain design made it the less suitable technology. Much of this is covered in a blog piece I wrote with Professor Santiago Torres-Arias (a fellow Sigstore technical steering committee member).
You may have heard of ‘Certificate Transparency Logs’ before, or ‘CT Logs’ to use their abbreviated term. CT Logs are designed to provide public auditing and monitoring of digital certificates issued by Certificate Authorities (CAs). The CT logs are a public record of these certificates, designed to improve security by allowing the detection of incorrectly or maliciously issued certificates. Certificate Authorities submit certificates to CT logs and receive Signed Certificate Timestamps (SCTs) in return, which provide evidence that a certificate has been accepted and logged.
A Transparency Log has the properties of being ‘immutable’ and ‘read-only’, which is achieved by use of a Merkle tree. In their paper "Efficient Data Structures for Tamper-Evident Logging" (2009), Scott Crosby and Dan Wallach introduced the idea of using a Merkle tree to store a verifiably append-only log. In the design of Certificate Transparency (RFC 6962), Ben Laurie, Adam Langley, and Emilia Kasper adopted the verifiable log design that Crosby and Wallach had introduced.
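RFC 6962 specifies how the Merkle tree of such a log is hashed: leaves and interior nodes get different one-byte prefixes (0x00 and 0x01), and a tree of n entries is split at the largest power of two smaller than n. The Python below is a small sketch of that Merkle Tree Hash definition, written for illustration; the function names are my own and this is not code taken from Trillian or Rekor.

```python
import hashlib
from typing import List


def leaf_hash(entry: bytes) -> bytes:
    """RFC 6962 leaf hash: SHA-256 over a 0x00 prefix plus the entry."""
    return hashlib.sha256(b"\x00" + entry).digest()


def node_hash(left: bytes, right: bytes) -> bytes:
    """RFC 6962 interior node hash: SHA-256 over a 0x01 prefix plus both children."""
    return hashlib.sha256(b"\x01" + left + right).digest()


def merkle_tree_hash(entries: List[bytes]) -> bytes:
    """Merkle Tree Hash over an ordered list of log entries."""
    n = len(entries)
    if n == 0:
        return hashlib.sha256(b"").digest()  # hash of the empty string
    if n == 1:
        return leaf_hash(entries[0])
    k = 1
    while k * 2 < n:  # largest power of two strictly smaller than n
        k *= 2
    return node_hash(merkle_tree_hash(entries[:k]), merkle_tree_hash(entries[k:]))


if __name__ == "__main__":
    log = [b"entry-%d" % i for i in range(6)]
    print(merkle_tree_hash(log[:4]).hex())  # root when the log held 4 entries
    print(merkle_tree_hash(log).hex())      # root after 2 more were appended
```

The distinct prefixes stop a leaf from being passed off as an interior node. In this example the root over the first four entries reappears unchanged as the left subtree of the six-entry tree, which hints at how a log can show that it only ever appended; full consistency proofs are outside the scope of this sketch.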
Rekor leverages an existing open source project called Trillian. In a nutshell, Trillian implements a ‘Transparency Log’, and it is the same back end used by both Certificate Transparency Logs and Rekor.
Rekor stores a little more than an X509 cert chain, though: it constructs a tree using manifests or, to use the term we have in Rekor, types.
A common type is the HashedRekord; we can see an example here:
The main elements of interest are as follows:
* Data -> Hash: The SHA256 digest of the artifact that has been signed.
* Signature -> Content: A base64-encoded signature using the ECDSA algorithm.
* Signature -> publicKey: The X509 signing certificate used as part of a keypair to sign the artifact's digest.
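The three elements above can be pictured as a small nested structure. The Python sketch below assembles one from placeholder inputs; the function name is hypothetical, the signature and certificate bytes are dummies rather than real cryptographic material, and the surrounding field names (`apiVersion`, `kind`, `spec`) reflect my understanding of the entry layout and should be checked against Rekor's published schema.

```python
import base64
import hashlib
import json


def hashedrekord_sketch(artifact: bytes, signature: bytes, cert_pem: bytes) -> dict:
    """Assemble a HashedRekord-shaped manifest from an artifact, signature and cert."""
    return {
        "apiVersion": "0.0.1",
        "kind": "hashedrekord",
        "spec": {
            # Data -> Hash: only the digest of the artifact, never the artifact itself.
            "data": {
                "hash": {
                    "algorithm": "sha256",
                    "value": hashlib.sha256(artifact).hexdigest(),
                }
            },
            "signature": {
                # Signature -> Content: base64-encoded signature bytes.
                "content": base64.b64encode(signature).decode("ascii"),
                # Signature -> publicKey: base64-encoded signing certificate.
                "publicKey": {"content": base64.b64encode(cert_pem).decode("ascii")},
            },
        },
    }


if __name__ == "__main__":
    entry = hashedrekord_sketch(
        artifact=b"example artifact bytes",
        signature=b"placeholder-signature",      # dummy value for illustration
        cert_pem=b"placeholder-certificate",     # dummy value for illustration
    )
    print(json.dumps(entry, indent=2))
```

Note that the artifact bytes never appear in the resulting structure, only their digest.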
Let’s take a look at the signing certificate itself:
Here we can see that the cert was issued by sigstore.dev (the public service of Sigstore).
We can also see, looking at the SAN, that an OIDC scope is included, which is my email (luke at stacklok.com), along with details of the IdP (accounts.google.com).
When this certificate was originally sent to Rekor for ‘inclusion’, it will have undergone a few checks:
* Was the Root CA issuer sigstore.dev? (verify the certificate chain)
* Does the signature verify correctly? (public key / signature -> digest)
If these checks pass, the certificate will be used as a leaf node within the Merkle tree.
Key aspects of Rekor that pertain to a secure software supply chain are transparency and observability. By having an immutable read-only store of signed artifact metadata, we have an observable record of the software supply chain. This is especially useful for several reasons.
We can understand the blast radius of an attack.
Earlier on, we could see that I signed a record using my OIDC credentials (luke @ stacklok.com). Let’s say my account is compromised: we can then consult the log between specific timestamps to establish whether any other artifacts may have been signed using my credentials.
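As a rough sketch of that kind of investigation, the Python below filters a set of log records by signer identity and time window. Everything here is hypothetical: the `LogRecord` shape, the field names and the sample data are invented for illustration, and the code works on records already held in memory rather than calling any real Rekor API.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Iterable, List


@dataclass(frozen=True)
class LogRecord:
    """Hypothetical, simplified view of one transparency log entry."""
    log_index: int
    identity: str              # e.g. the email address from the certificate SAN
    integrated_time: datetime  # when the entry was added to the log
    artifact_digest: str


def signed_by_between(
    records: Iterable[LogRecord], identity: str, start: datetime, end: datetime
) -> List[LogRecord]:
    """Return entries signed by `identity` inside the suspected compromise window."""
    return [
        record
        for record in records
        if record.identity == identity and start <= record.integrated_time <= end
    ]


if __name__ == "__main__":
    utc = timezone.utc
    records = [
        LogRecord(1, "dev@example.com", datetime(2023, 5, 1, tzinfo=utc), "sha256:aaa"),
        LogRecord(2, "dev@example.com", datetime(2023, 5, 9, tzinfo=utc), "sha256:bbb"),
        LogRecord(3, "other@example.com", datetime(2023, 5, 9, tzinfo=utc), "sha256:ccc"),
    ]
    window_start = datetime(2023, 5, 8, tzinfo=utc)
    window_end = datetime(2023, 5, 10, tzinfo=utc)
    for hit in signed_by_between(records, "dev@example.com", window_start, window_end):
        print(hit.log_index, hit.artifact_digest)
```

Every digest returned is an artifact that needs a closer look; anything signed outside the window, or by a different identity, falls outside the blast radius.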
We can also apply the same premise to other scopes handled by Sigstore and Rekor. For example, using in-toto style attestations, we can see the source of origin for an artifact. Let’s take an NPM package as an example; in fact, let’s use the sigstore-js node package, which is used to provide provenance for the NPM registry.
https://search.sigstore.dev/?logIndex=20766328
Here we can get many useful details on the environment in which the package was built and released:
This way we can be sure the package we received was built within the expected repository / source of origin.
I think one of the key misconceptions about Rekor is the premise that if something resides within Rekor, it is ‘good’ or can be ‘trusted’. This is a false premise. We actually want bad things in the log and expect them to be there, as they are then observable and in the open, no longer dark matter that cannot be scrutinized and assessed in a transparent manner. Having this observability makes Rekor a very useful tool for understanding the blast radius of a software supply chain attack.
The other misconception is that we store artifacts within Rekor, making it like a cryptographic S3 bucket. As you can now see, this is not true: it is in fact a manifest that we store, which maps to an artifact by means of a digest. You do not even have to send the object you signed to Rekor, since we have a type called a HashedRekord which allows you to send just the digest as a representation of the artifact.
A further misconception is that Sigstore and Rekor are owned or controlled by a single company. This is also false: no company has overall influence over or ownership of Sigstore and the transparency log Rekor. The project is under the governance of the OpenSSF, and many companies (Stacklok included) are part of the project. The project originated at Red Hat during my time there, with Red Hat donating ownership to the Linux Foundation, a move that was made to ensure the project would be governed in the community's interest and not that of any individual company. It has seen huge contributions from experts at companies such as Google, GitHub, Chainguard and of course Stacklok, but its success rests upon it being a community collaboration to build open source software.
Luke Hinds
CTO
Luke Hinds is the CTO of Stacklok. He is the creator of the open source project sigstore, which makes it easier for developers to sign and verify software artifacts. Prior to Stacklok, Luke was a distinguished engineer at Red Hat.