In software development, secrets refer to any sensitive data that should be kept private. This commonly includes data used for authentication purposes, like:
API keys
Access tokens
Database credentials
SSH keys
Certificates
Usernames and passwords
In this post, we’ll explore the tools, settings, and best practices you can and should adopt in GitHub (nearly all of which are free) to keep these kinds of secrets secret.
Clearly, protecting the most sensitive data in your code should be a top priority. Yet a 2023 GitGuardian analysis of more than one billion public GitHub commits found 10 million occurrences of hard-coded (unencrypted) secrets, a 67% increase from 2021. More than 80% of those secrets were exposed through personal repositories, and most of them were corporate secrets. As GitGuardian points out, credential theft is the most common cause of data breaches: malicious actors frequently scan repositories for secrets they can use to breach an application.
In a recent test by Stacklok’s security researchers, it took less than 10 minutes for attackers to identify an AWS access token in a public GitHub repository and try to use it.
A recent, publicly disclosed example of a real attack is Slack's late 2022 security incident, in which malicious actors obtained Slack employees' GitHub access tokens and used them to access Slack's GitHub repositories and download private code.
Why are so many secrets getting leaked? Usually through simple accidents. For example, developers commonly start working on a new feature with hardcoded access tokens, intending to move them into a secrets management platform before deployment. It's also common to make small, iterative commits during early prototyping and development. So it's easy to accidentally push a set of changes that contains a token, even if the final code makes correct use of the secrets management platform. Git doesn't forget about those early changes that contain the secret, and attackers don't either.
Another challenge is the scope of the problem in a modern application. Developers use cloud platforms, GitHub access tokens, and credentials for relational databases, document databases, vector databases, and more. And sometimes credentials don't look like credentials: a string that looks like a UUID, stored in a poorly named variable, is hard to reason about. It might be an ID that represents some development schema, or it might be an access token.
GitHub has a number of tools, settings, and recommended best practices to help you manage secrets and prevent leakage. Nearly all of these are free for public repositories. Using these tools—along with following general best practices like using a secrets management platform, regular key rotation, and tokenization—can help you keep your secrets safe.
GitHub's secret scanning feature scans the entire Git history on all branches of your repository, as well as open and historical issues, pull requests, wikis, and GitHub Discussions. It looks for strings that match patterns used by common service providers (like Amazon AWS or Azure) and GitHub partners (like the npm registry). When a secret is discovered, GitHub automatically creates a security alert in your repository.
You can enable secret push protection at the user, repository, or organization level to prevent yourself and others on your team from pushing code that contains a detected secret. When a push contains a detected secret, the contributor is notified that they need to remove the secret in order to proceed. This means that if a developer accidentally commits a secret, it never lands in their GitHub repository: the push of any branch containing it is rejected before they can even open a pull request.
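To make the idea of pattern matching concrete, here is a minimal Python sketch of a regex-based scanner. The two patterns (AWS access key IDs beginning with AKIA, and classic GitHub personal access tokens beginning with ghp_) are simplified illustrations chosen for this example, not GitHub's actual detection rules, and scan_text is a hypothetical helper.

```python
import re

# Illustrative patterns only (an assumption for this sketch); GitHub's real
# secret scanning patterns are maintained by GitHub and its partners.
SECRET_PATTERNS = {
    # AWS access key IDs commonly start with "AKIA" plus 16 uppercase letters/digits.
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    # Classic GitHub personal access tokens start with "ghp_" plus 36 alphanumerics.
    "github_classic_pat": re.compile(r"\bghp_[A-Za-z0-9]{36}\b"),
}


def scan_text(text: str) -> list[tuple[int, str]]:
    """Return (line_number, pattern_name) pairs for every suspected secret."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for name, pattern in SECRET_PATTERNS.items():
            if pattern.search(line):
                findings.append((lineno, name))
    return findings


if __name__ == "__main__":
    sample = (
        'region = "us-east-1"\n'
        'aws_key = "AKIAIOSFODNN7EXAMPLE"\n'  # AWS's documented example key ID
        "token = ghp_" + "a" * 36 + "\n"       # fake token for illustration
    )
    for lineno, name in scan_text(sample):
        print(f"line {lineno}: possible {name}")
```

Running the script reports a possible AWS key on line 2 and a possible GitHub token on line 3. Production scanners use far more patterns and extra checks to keep false positives down.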
GitHub enables secret push protection for users by default, but users can disable this, so we recommend that project owners enable this at the repository or organization level.
GitHub Copilot works by indexing the files in your repositories to provide relevant code completion suggestions. The challenge is that those files may include secrets or other sensitive data. To keep files that you know contain sensitive data from being indexed by Copilot, you can use the Content Exclusions feature, which lets you specify paths to exclude in the settings for your repository or organization. For the paths you specify:
The content of those files won’t be indexed and referenced in AI-generated code completion suggestions
Code completion won’t be available in those specified files
Read more about how the Content Exclusions feature works in GitHub's Copilot documentation.
Secrets can be used in GitHub Actions workflows, but only if you explicitly include the secret in a workflow by setting it as an input or environment variable in the workflow file. Below are some tips for making sure that your secrets aren’t leaked through your workflows.
Avoid using structured data as the values of secrets. To make sure that GitHub redacts your secrets in logs, avoid creating secrets that contain JSON or encoded Git blobs. Because GitHub Actions redacts only the complete, literal secret value, any part of that structured data that you use on its own won't be redacted. For example, if you store a JSON array of tokens in a secret and use only the first token, GitHub Actions won't redact that token.
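The effect of exact-match redaction can be modeled in a few lines. This Python sketch assumes masking works by replacing the full literal secret value with ***; it is a simplified model, not GitHub's actual implementation, and the token values and the redact helper are made up for illustration.

```python
import json


def redact(log_line: str, secrets: list[str]) -> str:
    """Model exact-match masking: replace each full secret value with '***'."""
    for secret in secrets:
        if secret:
            log_line = log_line.replace(secret, "***")
    return log_line


# A hypothetical secret whose value is a JSON array of two tokens.
secret_value = json.dumps(["tok-alpha-123", "tok-beta-456"])
registered_secrets = [secret_value]

# Printing the whole secret: the exact value matches, so it is masked.
print(redact(f"all tokens: {secret_value}", registered_secrets))

# Printing only the first token: no registered value matches, so it leaks.
first_token = json.loads(secret_value)[0]
print(redact(f"using token: {first_token}", registered_secrets))
```

The first line prints as all tokens: ***, while the second prints using token: tok-alpha-123 in the clear. Storing each token as its own secret avoids the problem; GitHub Actions also provides the ::add-mask:: workflow command for masking values derived at runtime.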
Limit credential permissions. Consider using deploy keys or a service account instead of personal credentials. Use a GitHub App instead of a personal access token: GitHub Apps use fine-grained permissions and short-lived tokens, and, critically, they aren't tied to a user. If you do have to use a personal access token, grant read-only permissions where possible.
Avoid passing secrets between processes on the command line. Command-line arguments can be visible to other users (for example, through the ps command) or captured by security audit events. Instead, use environment variables where possible.
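As a rough illustration, this Python sketch passes a secret to a child process through its environment instead of its argument list. API_TOKEN, CHILD_CODE, and run_child_with_env_secret are hypothetical names used only for this example.

```python
import os
import subprocess
import sys

# Hypothetical child program: reads the token from its environment, not argv.
CHILD_CODE = (
    "import os, sys\n"
    "token = os.environ.get('API_TOKEN')\n"
    "print('token received' if token else 'no token', 'argv:', sys.argv[1:])\n"
)


def run_child_with_env_secret(token: str) -> str:
    # Copy the current environment and add the secret. The token is never
    # placed in the child's argument list, so it doesn't appear in ps output.
    env = dict(os.environ, API_TOKEN=token)
    result = subprocess.run(
        [sys.executable, "-c", CHILD_CODE],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    print(run_child_with_env_secret("example-not-a-real-token"))
```

Running it prints token received argv: [], showing that the token reached the child without appearing in its arguments. Environment variables reduce exposure rather than eliminate it: on Linux, for example, the same user or root can still read a process's environment.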
Delete workflow run logs. GitHub automatically redacts secrets printed to workflow logs (as long as the secrets' values don't use structured data, as noted above). But runners can only redact secrets they have access to, so a secret is redacted only if it's used within a job. To be certain that secrets aren't leaked through logs, you can delete your workflow run logs.
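Run logs can also be deleted programmatically through GitHub's REST endpoint for deleting workflow run logs (DELETE /repos/{owner}/{repo}/actions/runs/{run_id}/logs). This Python sketch assumes a token with permission to manage Actions on the repository, supplied through a GITHUB_TOKEN environment variable; the helper names are hypothetical, and the owner, repository, and run ID you pass in are your own.

```python
import os
import urllib.request

API_ROOT = "https://api.github.com"


def build_delete_logs_request(
    owner: str, repo: str, run_id: int, token: str
) -> urllib.request.Request:
    """Build a request for GitHub's 'Delete workflow run logs' REST endpoint."""
    url = f"{API_ROOT}/repos/{owner}/{repo}/actions/runs/{run_id}/logs"
    return urllib.request.Request(
        url,
        method="DELETE",
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",
            "X-GitHub-Api-Version": "2022-11-28",
        },
    )


def delete_run_logs(owner: str, repo: str, run_id: int) -> int:
    """Send the request; the token is read from the environment, not argv."""
    token = os.environ["GITHUB_TOKEN"]  # assumed variable name
    request = build_delete_logs_request(owner, repo, run_id, token)
    with urllib.request.urlopen(request) as response:
        return response.status  # GitHub returns 204 No Content on success
```

For example, delete_run_logs("your-org", "your-repo", 123456789) would delete the logs for that run, with placeholder values replaced by real ones. Keeping request construction separate from sending makes the request easy to inspect before anything is deleted.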
Want more protection against secret leakage? Minder Cloud can help you apply and auto-remediate policies across your GitHub repositories so that secret scanning and secret push protection stay enabled at the repository level (even if a user disables them).
Minder Cloud is free for public repositories. Try it out now at cloud.stacklok.com.