
New features in Trusty: Historical provenance, new scoring dimensions, and more

Author: Stacklok Editorial Team


Jan 17, 2024


We’re continuing to add new features to Trusty, our free-to-use service to help developers vet the safety of open source packages. So far this month, we've added four new features:

Historical provenance, to provide proof of origin for open source packages not yet signed with sigstore
New scoring dimensions to rank the overall safety of an open source package, including provenance and typosquatting likelihood
Improved security checks for possible "starjacking" and "typosquatting" attacks on a package
Malicious and deprecated package warnings, to help you avoid installing unsafe packages

Read more about each of these features below.

Historical provenance information

TL;DR: Trusty can now verify the origin of packages that haven’t yet been signed and built using sigstore.

While some package managers like npm now support publishing packages with provenance using sigstore, the majority of packages that you see listed on repos like npm, PyPI, crates, and Maven don’t have verified links back to their source repo. This leaves developers vulnerable to attacks like “starjacking,” in which a bad actor copies metadata from a reputable package, including its source repo, to get developers to install malware (JFrog has uncovered an example of this attack).

To help solve this, the Stacklok data science team built a new method called “historical provenance” to verify proof of origin for open source packages. The method maps historical Git tags in a package’s source repo to the versions listed in the package repository. Our goal is to use historical provenance to catch and help shut down “starjacking” attacks on open source packages, and to help developers use Trusty to verify that the package they’re about to install is what it says it is.
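To make the tag-to-version mapping concrete, here is a minimal sketch in Python. It is not Trusty's actual implementation: the function names, the tag normalization rule, and the idea of summarizing the result as a match ratio are all assumptions made for illustration.

```python
import re


def normalize(tag: str) -> str:
    """Strip a common prefix so that 'v1.2.0' compares equal to '1.2.0'.

    Assumption: tags use a leading 'v' or 'release-' style prefix.
    """
    return re.sub(r"^(?:v(?=\d)|release[-_/])", "", tag.strip(), flags=re.IGNORECASE)


def tag_match_ratio(git_tags: list[str], published_versions: list[str]) -> float:
    """Return the fraction of published versions that have a matching Git tag."""
    if not published_versions:
        return 0.0
    tags = {normalize(t) for t in git_tags}
    matched = sum(1 for v in published_versions if normalize(v) in tags)
    return matched / len(published_versions)


# Hypothetical data: three of the four published versions have a matching tag.
ratio = tag_match_ratio(
    ["v1.0.0", "v1.1.0", "v2.0.0"],
    ["1.0.0", "1.1.0", "2.0.0", "2.0.1"],
)
print(ratio)  # 0.75
```

A high ratio suggests that the listed source repo really is where the published versions came from. A repo whose tags match none of the published versions is a signal worth a closer look.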

Screenshot: Historical provenance details in Trusty

Check out this demo to learn more about historical provenance, or read our blog post.

New Trusty Score dimensions

TL;DR: In addition to repo and author activity, a package’s overall Trusty Score now takes into account provenance and typosquatting likelihood.

Each open source package listed in Trusty has an overall Trusty Score. This score ranks each package relative to other open source packages, and gives developers a signal about whether a package is safe to use. When we launched Trusty, this score was calculated based on repo and author activity, but we’ve always planned to add more dimensions to make the scoring more robust.

As of this week, the Trusty Score includes new dimensions for provenance and typosquatting. Individual provenance and typosquatting scores are factored into the overall score, and you can click on “View Score Details” to see how the scoring breaks down:

Screenshot: Trusty Score details

Here’s a quick overview of how these individual scoring dimensions are calculated (view our docs to see full details).

Repo and Author Activity scores: A relative ranking of the level of activity within the package’s primary repository, and a relative aggregate rank of the top contributors to the repo. These scores are calculated using Principal Component Analysis of a number of features from public GitHub repos, including forks, watchers, subscriber count, public repos and gists, and followers.
Typosquatting: Indicates whether a package is likely to be a “typosquat.” Typosquatting is the practice of giving a malicious package a name very similar to that of a reputable package, with the intention of tricking developers into installing it.
- For this score, we measure the Levenshtein distance between the package’s name and the name of every other package we have indexed. This gives us a list of packages with very similar names to the one we are searching on. We then subtract 1 from the score for every similar package with a higher overall score than the current package (this is recursive).
Provenance: Indicates the strength of the link between a published package and its source repository. For provenance scoring, we assign the following scores:

A score of 10 indicates that the package was signed and built with sigstore and GitHub Actions.
A score of 8 indicates strong historical provenance mapping from the package to its source repo.
A score of 5 indicates that the source repo does not have any Git tags, so we are unable to determine any link from the source repo to the published package.
A score of 2 indicates that the Git tags in the package’s listed source repo do not match the published versions of that package on the package manager registry.
- This could indicate that the package is malicious, or it could indicate that Git tags are being used for purposes other than denoting new version releases.

Malicious packages: In addition to these scoring dimensions, if a package is known to be malicious, it is automatically assigned a score of 1.
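The provenance scores listed above amount to a simple lookup. The sketch below expresses them in Python. The enum and function names are hypothetical, and applying the malicious-package rule to whatever score was otherwise computed is one reading of the text, not a confirmed detail of Trusty.

```python
from enum import Enum


class ProvenanceStatus(Enum):
    """Hypothetical labels for the four provenance outcomes described above."""
    SIGSTORE = "signed and built with sigstore and GitHub Actions"
    HISTORICAL = "strong historical provenance mapping"
    NO_TAGS = "source repo has no Git tags"
    TAG_MISMATCH = "Git tags do not match published versions"


# Score values taken from the list above.
PROVENANCE_SCORES = {
    ProvenanceStatus.SIGSTORE: 10,
    ProvenanceStatus.HISTORICAL: 8,
    ProvenanceStatus.NO_TAGS: 5,
    ProvenanceStatus.TAG_MISMATCH: 2,
}


def provenance_score(status: ProvenanceStatus) -> int:
    """Look up the provenance score for a given outcome."""
    return PROVENANCE_SCORES[status]


def apply_malicious_override(score: float, known_malicious: bool) -> float:
    """Assumption: a known-malicious package gets a score of 1 regardless of other inputs."""
    return 1.0 if known_malicious else score


print(provenance_score(ProvenanceStatus.TAG_MISMATCH))      # 2
print(apply_malicious_override(8.4, known_malicious=True))  # 1.0
```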

Improved security checks

TL;DR: Trusty now flags malicious and deprecated packages, and displays a new “Security Checks” box to help you identify potential malicious attacks on a package you’re installing.

To help find and flag possible instances of “starjacking” and “typosquatting”—two common software supply chain attacks—Trusty runs security checks on each package we index.

The first check, for Repository Affiliation, looks for any other packages that claim to be from the same source repo as the package you’re viewing. For each of those packages, we display provenance information to verify the link between the package and the source repo. If no provenance information is available, this could represent a “starjacking” attack.
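As a rough illustration of this check, the Python sketch below finds every other package that claims the same source repo and labels each one by whether provenance is available. The `Package` record, its fields, and the sample package names are hypothetical, and this is not how Trusty is actually implemented.

```python
from dataclasses import dataclass


@dataclass
class Package:
    """Hypothetical index record; the field names are assumptions."""
    name: str
    source_repo: str
    has_provenance: bool


def affiliation_check(target: Package, index: list[Package]) -> list[tuple[str, str]]:
    """List other packages claiming the target's source repo, with a provenance label."""
    results = []
    for pkg in index:
        if pkg.name == target.name or pkg.source_repo != target.source_repo:
            continue
        label = "provenance verified" if pkg.has_provenance else "no provenance: possible starjacking"
        results.append((pkg.name, label))
    return results


# Made-up sample data for illustration only.
index = [
    Package("colors", "github.com/example/colors", True),
    Package("colorz-utils", "github.com/example/colors", False),
    Package("leftpad", "github.com/example/leftpad", True),
]
print(affiliation_check(index[0], index))
# [('colorz-utils', 'no provenance: possible starjacking')]
```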

Screenshot: An example of repository affiliation checks in Trusty

The second check, for Typosquatting attacks, looks for packages that have a very similar name to the package you’re viewing. Typosquatting packages are likely to have low repo activity: based on our data analysis, malicious packages aren’t likely to have many contributors or a long history of commits. That means they’re likely to have low overall scores.

For any similarly named packages listed in this section, we list the overall score (along with provenance information) to help you understand whether this package is likely to be a typosquat.
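The name-similarity lookup behind this check can be sketched with a plain Levenshtein distance. The Python example below is illustrative only: the distance threshold of 2, the starting score of 10, the floor of 0, and the sample package scores are all assumptions, and the recursive aspect of Trusty's scoring is not modeled.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    if len(a) < len(b):
        a, b = b, a
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def similar_packages(name: str, scores: dict[str, float], max_distance: int = 2):
    """Return (other_name, distance, overall_score) for names within max_distance."""
    hits = []
    for other, score in scores.items():
        if other == name:
            continue
        distance = levenshtein(name, other)
        if distance <= max_distance:
            hits.append((other, distance, score))
    return sorted(hits, key=lambda hit: (hit[1], hit[0]))


def typosquat_score(name: str, scores: dict[str, float], base: float = 10.0) -> float:
    """Subtract 1 for each similarly named package with a higher overall score."""
    own = scores[name]
    higher = sum(1 for _, _, score in similar_packages(name, scores) if score > own)
    return max(0.0, base - higher)


# Made-up overall scores for illustration only.
scores = {"requests": 9.1, "reqeusts": 2.3, "numpy": 9.5}
print(similar_packages("reqeusts", scores))  # [('requests', 2, 9.1)]
print(typosquat_score("reqeusts", scores))   # 9.0
print(typosquat_score("requests", scores))   # 10.0
```

The low-scoring lookalike loses a point because a better-established package has a nearly identical name, while the established package is not penalized for the lookalike's existence.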

Screenshot: An example of typosquatting attack checks in Trusty

Malicious / deprecated package warnings

To make it easier to identify malicious or deprecated packages indexed in Trusty, we now flag these packages with a “Malicious” or “Deprecated” warning. Currently, we ingest malicious package data from Datadog’s Malicious Software Packages dataset and OpenSSF’s Malicious Packages repo.

Screenshot: A warning label for a deprecated package in Trusty

What’s next

Trusty is still in the experimental phase, and we’ll keep adding new features. You can view our public roadmap here. Below are a few features you can expect soon:

Include additional metadata on packages: Provide more information on packages including known vulnerabilities from OSV, license information, and additional information from sigstore.
Show dependencies and dependents of a package: List the dependencies included in a package, and which other packages use the package as a dependency (with links to their Trusty detail pages).
Show trend graph of scores over time: Enable users to understand how a package’s score has changed over time.
Expand support to additional languages: Specifically, you can expect to see support for Go packages soon.

Please give us feedback on these features! We want to make sure Trusty is useful for you.

Check out these features now at www.trustypkg.dev, and join us on Discord to let us know what you think!