
Silent but deadly: Using Minder to detect and prevent homoglyph attacks on your code

Meet Minder’s new features: two rule types aimed at guarding against homoglyph attacks. Learn what homoglyph attacks are, and why you’ve never seen one before.

Author: Teodor Yanev
Feb 28, 2024

If you reviewed a patch like this, would you be concerned about the security vulnerability introduced?

JavaScript
function sanitizeUsername(username) {
    // Only allow DNS-type names
    disallowed = /[^-a-z0-9]/;
    usernаme = username.replace(disallowed, '-');
    return username;
}

If you didn’t spot the vulnerability, would you be surprised to learn that this function does not remove any characters from usernames? What you can’t see is that the а in the assignment to usernаme is a different Unicode character: U+0430, also known as CYRILLIC SMALL LETTER A. The result of the replacement is therefore assigned to a separate, otherwise unused variable, and the input is returned untouched!
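One way to make a substitution like this visible is to print the code point of every non-ASCII character in a line. The following is a minimal sketch for illustration; the helper name findNonAscii is hypothetical and is not part of Minder.

```javascript
// Hypothetical helper (not part of Minder): list every non-ASCII character
// in a string together with its position and Unicode code point.
function findNonAscii(text) {
    const hits = [];
    let index = 0;
    // for...of walks the string by code point rather than by UTF-16 code unit.
    for (const ch of text) {
        const codePoint = ch.codePointAt(0);
        if (codePoint > 0x7f) {
            hits.push({
                index,
                char: ch,
                codePoint: 'U+' + codePoint.toString(16).toUpperCase().padStart(4, '0'),
            });
        }
        index += ch.length;
    }
    return hits;
}

// The assignment from the patch above, with the Cyrillic letter written as
// an escape sequence so it cannot be mistaken for a Latin "a".
const suspicious = 'usern\u0430me = username.replace(disallowed, "-");';
console.log(findNonAscii(suspicious));
```

Run against the suspicious line, the helper reports a single character, U+0430, at index 5, while a pure ASCII identifier such as username produces no findings.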

Homoglyphs are characters that appear visually similar or identical to each other but are represented by different Unicode code points, for example CYRILLIC SMALL LETTER A (U+0430) and LATIN SMALL LETTER A (U+0061). Many programming languages support internationalized characters in variable names so that programmers from all over the world can express themselves in their native language. Unfortunately, malicious actors can use homoglyphs and related Unicode tricks to attack projects:

  • By introducing invisible characters into lines of code

  • By mixing scripts in code, like using a visually similar character from a different script to camouflage malicious intent 

  • By including bidirectional Unicode text, which can be compiled differently than it appears in a code review (see the vulnerability CVE-2021-42574)
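As a rough illustration of the third item, bidirectional control characters occupy a small, well-defined set of code points, so they are easy to search for. The sketch below assumes a hypothetical helper named findBidiControls; it is not how Minder or any compiler implements its checks.

```javascript
// Bidirectional control characters of the kind abused in CVE-2021-42574:
// U+202A..U+202E (LRE, RLE, PDF, LRO, RLO) and U+2066..U+2069 (LRI, RLI, FSI, PDI).
const BIDI_CONTROLS = /[\u202A-\u202E\u2066-\u2069]/g;

// Hypothetical helper: return the code points of any bidi controls in a line.
function findBidiControls(text) {
    const matches = text.match(BIDI_CONTROLS) || [];
    return matches.map((ch) => 'U+' + ch.codePointAt(0).toString(16).toUpperCase());
}

// Example line with a right-to-left override hidden inside a comment.
const bidiSample = 'const isAdmin = false; /* \u202E begin admin check */';
console.log(findBidiControls(bidiSample)); // [ 'U+202E' ]
```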

Organizations like GitHub, Red Hat, and Rust have all taken steps to proactively mitigate certain homoglyph attacks, but other vulnerabilities remain unaddressed, presenting opportunities for crafty attackers. Among these unaddressed issues is the attack presented at the beginning of this post, assigned CVE-2021-42694. While we don’t want to undo the progress made over the last 20 years in making programming more accessible to people of all backgrounds, the Minder team aims to implement thoughtful protections that focus on these attacks without limiting everyone to US-ASCII.

Minder is introducing new rule types to analyze your pull requests for homoglyph attacks, providing a proactive defense against these threats. These rules, which detect the use of invisible characters and mixed scripts, can catch homoglyph attacks before they’re introduced into your codebase. If you are interested in the complexities of such vulnerabilities, we recommend the academic paper Trojan Source: Invisible Vulnerabilities, published by researchers at the University of Cambridge.

Below, we’ll explore why invisible characters and mixed scripts are threats, and how Minder can help.

Invisible Characters: You Won’t See This One Coming

Invisible characters may seem harmless, but they are a sophisticated attack vector in software development. Because they render as nothing at all, they can change how code executes without changing how it looks, creating a gap between expected and actual behavior. Vulnerabilities introduced this way can evade even the most thorough code reviews.

Example:

Consider the JavaScript function getUserAccessLevel, which seemingly assigns a "basic" access level to every user. However, embedding an invisible character in the return value allows an adversary to subvert logical comparisons, as shown in the function assignAccessLevel. The returned access level string is never strictly equal to "basic", so every user is granted elevated privileges without raising suspicion.

(Tip: You can use a Unicode decoder tool, such as the one from BabelStone, to identify invisible characters.)

JavaScript
function getUserAccessLevel(username) {
    // Returns "basic" access with an "invisible plus" character (U+2064) appended.
    return "basic⁤";
}

function assignAccessLevel(access_level) {
    if (access_level === "basic") {
        console.log("You are not an admin.");
    } else {
        console.log("You are an admin.");
    }
}

let username = "user";
let access_level = getUserAccessLevel(username);
assignAccessLevel(access_level);
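To see why the comparison fails, it can help to compare the two strings by length and to strip format characters before comparing. The following is a small sketch; the helper stripFormatCharacters is hypothetical, and silently stripping characters is shown only to expose the difference, not as a recommended defense.

```javascript
const expectedLevel = 'basic';
const returnedLevel = 'basic\u2064'; // same value getUserAccessLevel returns above

console.log(returnedLevel === expectedLevel);             // false
console.log(expectedLevel.length, returnedLevel.length);  // 5 6

// \p{Cf} matches the Unicode "Format" general category, which includes
// U+2064 (INVISIBLE PLUS). It also matches other format characters, such as
// the zero-width joiner, so it is a blunt instrument.
function stripFormatCharacters(text) {
    return text.replace(/\p{Cf}/gu, '');
}

console.log(stripFormatCharacters(returnedLevel) === expectedLevel); // true
```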

Mixed Scripts: The Deceptive Simplicity of Homoglyph Attacks

Homoglyph attacks make deceptive use of mixed scripts, using characters that are visually similar across different scripts to camouflage malicious intent within code. These attacks can craft functions or variables that appear legitimate but redirect program execution toward malicious ends, a concern elaborated in both the Trojan Source paper above and the Unicode Consortium's report on Mixed Script Detection.

Example:

A seemingly harmless Node.js function, secureHashPassword, can be compromised by introducing a homoglyph in a variable name. Here the Cyrillic 'а' (U+0430) replaces the Latin 'a' (U+0061) in the assignment target, so the function executes without error but never returns the hashed password. Instead, it returns the original plaintext password, which may then be stored or logged as if it were a hash, demonstrating a clear and present danger in the handling of sensitive information.

JavaScript
const crypto = require('crypto');

function secureHashPassword(password) {
    const salt = crypto.randomBytes(16).toString('hex');
    // The variable below has a Cyrillic 'а', making it a homoglyph.
    pаssword = crypto.pbkdf2Sync(password, salt, 1000, 64, `sha512`).toString(`hex`);
    // Returns the original password instead of the hashed one.
    return password;
}

let userPassword = "mySecurePassword";
let hashedPassword = secureHashPassword(userPassword);

console.log(`Original password: ${userPassword}`);
console.log(`Hashed password: ${hashedPassword}`);
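The example above runs without error only because non-strict JavaScript silently creates a global variable when code assigns to an undeclared name. As a complementary safeguard, separate from Minder, strict mode turns that assignment into a ReferenceError. A minimal sketch follows; hashWithTypo is a hypothetical function, and toUpperCase stands in for the real hashing call.

```javascript
function hashWithTypo(password) {
    'use strict';
    // The assignment target contains Cyrillic "а" (U+0430), written here as an
    // identifier escape so it is visible. No such variable has been declared.
    p\u0430ssword = password.toUpperCase(); // stand-in for the real hashing call
    return password;
}

let outcome;
try {
    hashWithTypo('mySecurePassword');
    outcome = 'no error';
} catch (err) {
    outcome = err.name;
}
console.log(outcome); // ReferenceError
```

Strict mode catches this particular pattern only when the look-alike name is undeclared; an attacker who also declares the homoglyph variable would still get past it, which is why scanning the diff itself remains useful.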

How can Minder help detect and prevent homoglyph attacks?

Minder has added two new rule types designed to automatically detect these attacks in your code, and comment on PRs when they are detected. These rules are integrated into a dedicated profile aimed at defending against homoglyph attacks, but you can also incorporate them into your own profiles:

Yaml
version: v1
type: profile
name: homoglyphs-github-profile
context:
  provider: github
alert: "off"
remediate: "off"
pull_request:
  - type: invisible_characters_check
    params: {}
    def: {}
  - type: mixed_scripts_check
    params: {}
    def: {}

Our first rule tackles invisible characters, those elusive elements that attempt to infiltrate your code undetected.

Let’s assume we have this potentially dangerous line as part of our pull request change patch and we’ve activated the “homoglyphs” profile to guard our repo:

Image of mixed script with invisible characters

This rule type looks as follows:

Yaml
version: v1
type: rule-type
name: invisible_characters_check
context:
  provider: github
description: |  
  For every pull request submitted to a repository, this rule will
  check if the pull request adds a new change patch with invisible characters. 
  If it does, the rule will fail and the pull request will be commented on.
guidance: |
  Detects and highlights the use of invisible characters 
  that could potentially hide malicious code.
  
  The characters classified as "invisible" can be found at
  https://invisible-characters.com/
  
  For more information on the potential security implications, see
  https://www.usenix.org/system/files/usenixsecurity23-boucher.pdf
def:
  in_entity: pull_request
  param_schema:
    properties: {}
  rule_schema: {}
  ingest:
    type: diff
    diff:
      type: full
  eval:
    type: homoglyphs
    homoglyphs:
      type: invisible_characters

Minder's approach is straightforward: it highlights the potentially malicious segments of the code and comments directly on the affected line, as follows:

Image of invisible characters found in code

To detect these, we use a simple in-memory database of the code points that should raise red flags, and report any content that matches it as a potential issue. For full transparency, we source the Unicode code points for 'invisible characters' from https://invisible-characters.com.
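The general idea of a lookup-based check can be sketched in a few lines. This is not Minder's implementation: the function name findInvisibleInPatch and the sample patch are hypothetical, and the set below is a small illustrative selection of well-known invisible code points rather than the full list Minder uses.

```javascript
// Illustrative subset only; the list at invisible-characters.com is longer.
const INVISIBLE_CODE_POINTS = new Set([
    0x00ad, // SOFT HYPHEN
    0x200b, // ZERO WIDTH SPACE
    0x200c, // ZERO WIDTH NON-JOINER
    0x200d, // ZERO WIDTH JOINER
    0x2060, // WORD JOINER
    0x2064, // INVISIBLE PLUS
    0xfeff, // ZERO WIDTH NO-BREAK SPACE
]);

// Hypothetical helper: scan the added lines of a unified diff and report each
// invisible character with the 1-based line number within the patch text.
function findInvisibleInPatch(patch) {
    const findings = [];
    patch.split('\n').forEach((line, i) => {
        // Only inspect lines added by the change; skip the "+++" file header.
        if (!line.startsWith('+') || line.startsWith('+++')) return;
        for (const ch of line.slice(1)) {
            const cp = ch.codePointAt(0);
            if (INVISIBLE_CODE_POINTS.has(cp)) {
                const label = 'U+' + cp.toString(16).toUpperCase().padStart(4, '0');
                findings.push({ patchLine: i + 1, codePoint: label });
            }
        }
    });
    return findings;
}

const samplePatch = [
    '--- a/access.js',
    '+++ b/access.js',
    '@@ -1,3 +1,3 @@',
    ' function getUserAccessLevel(username) {',
    '-    return "basic";',
    '+    return "basic\u2064";',
    ' }',
].join('\n');

console.log(findInvisibleInPatch(samplePatch));
```

For the sample patch, the only finding is U+2064 on the sixth line of the patch, which is the added return statement.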

Next, let's look at our second rule, which pinpoints characters that look similar yet differ in their underlying code points.

Yaml
version: v1
type: rule-type
name: mixed_scripts_check
context:
  provider: github
description: |
  For every pull request submitted to a repository, this rule will
  check if the pull request adds a new change patch
  that contains mixed scripts. 
  If it does, the rule will fail and the pull request will be commented on.
guidance: |
  Detects and highlights the use of strings with mixed scripts 
  that could potentially hide malicious code.
  
  For more information, see
  https://unicode.org/reports/tr39/#Mixed_Script_Detection
  and
  https://www.usenix.org/system/files/usenixsecurity23-boucher.pdf
def:
  in_entity: pull_request
  param_schema:
    properties: {}
  rule_schema: {}
  ingest:
    type: diff
    diff:
      type: full
  eval:
    type: homoglyphs
    homoglyphs:
      type: mixed_scripts

As with the previous rule, Minder annotates lines containing potentially harmful mixed-script strings and identifies the specific strings and scripts involved. Note that Minder ignores the 'Common' script, which covers characters such as digits and punctuation that are shared across scripts, so it is not counted or reported when it appears in a string.

Image of mixed scripts detected in code

For this functionality, we again use a simple in-memory database that maps each Unicode character to its script. The database is generated from the Unicode Character Database, and any characters absent from that data are generated explicitly to ensure comprehensive coverage.
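For a feel of how a character-to-script lookup can be turned into a check, the sketch below uses JavaScript's built-in Unicode property escapes instead of a generated database. It is an approximation under stated assumptions, not Minder's implementation: the helper names scriptsIn and findMixedScripts are hypothetical, and only three scripts are covered.

```javascript
// Scripts checked in this sketch; a fuller version would cover all scripts.
// "Common" (digits, punctuation, underscore) is deliberately left out.
const SCRIPTS = {
    Latin: /\p{Script=Latin}/u,
    Cyrillic: /\p{Script=Cyrillic}/u,
    Greek: /\p{Script=Greek}/u,
};

// Hypothetical helper: return the names of the scripts used in one token.
function scriptsIn(token) {
    const found = new Set();
    for (const ch of token) {
        for (const [name, pattern] of Object.entries(SCRIPTS)) {
            if (pattern.test(ch)) found.add(name);
        }
    }
    return [...found];
}

// Report every word-like token in a line that mixes two or more scripts.
function findMixedScripts(line) {
    const tokens = line.match(/[\p{L}\p{N}_]+/gu) || [];
    return tokens
        .map((token) => ({ token, scripts: scriptsIn(token) }))
        .filter((entry) => entry.scripts.length > 1);
}

// The first identifier contains Cyrillic "а" (U+0430), written as an escape.
console.log(findMixedScripts('p\u0430ssword = hash(password, salt_1);'));
```

Only the first identifier is reported, with the scripts Latin and Cyrillic; the digit and underscore in salt_1 belong to the Common script and do not count toward mixing.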

By enabling the "homoglyphs" profile, you effectively enlist Minder's vigilant duo to safeguard your code. This feature ensures continuous protection against these sophisticated threats, maintaining the integrity of your GitHub repositories. 🎉🛡️

Teodor Yanev

Software Engineer

Teodor is a software engineer at Stacklok. He is based in Bulgaria.