Research & Writings
Summaries of my academic work, long-form articles, tutorials, and miscellaneous notes. Filterable by topic.
Let’s start with the TanStack supply chain attack, because its very possibility converts any workflow that installs fresh packages from a public registry into a radioactive liability for any organization. That covers essentially every Python developer, outside of a very narrow fraction that uses best practices like lockfiles and package managers (such as uv) that support them.
An attacker managed to publish 84 malicious versions across 42 @tanstack/* packages to the official npm registry. The compromised packages had an optionalDependencies entry pointing at an orphaned commit in the TanStack/router fork. Installing the package fetched that commit, ran its prepare lifecycle script, installed Bun, and executed a large obfuscated router_init.js payload smuggled into the compromised tarball. That script harvested common cloud credentials, GitHub tokens, SSH keys, npm tokens, and then enumerated other packages the victim maintained and republished them with the same injection.
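As I understand the report, the poisoned manifest looked roughly like this; the package name, dependency name, and commit reference below are illustrative placeholders, not the real values:

```json
{
  "name": "@tanstack/some-package",
  "optionalDependencies": {
    "some-helper": "github:TanStack/router#<orphaned-commit-sha>"
  }
}
```

The package.json at that orphaned commit then carried a `prepare` script, which npm runs automatically after installing a git dependency, and that script bootstrapped the rest of the payload.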
If that wasn’t enough, the broader mini-Shai-Hulud malware family also had dead-man’s-switch behavior: it monitored stolen GitHub-token validity, and if the token stopped working, the payload attempted to wipe the user’s home directory.
See the original report from May 11, the subsequent official post-mortem, and TanStack’s follow-up on what they changed afterward.
If such an attack happened a year ago, the most likely cause would’ve been that some TanStack maintainer with publishing token/release access was personally pwned (as a result of, say, a weak password reused across different platforms with no 2FA enabled).
What’s mind-boggling about this case is that no maintainer was compromised and no one had to click merge, approve, or even review the malicious PR. Opening a PR was enough. The attack worked by threading together several CI behaviors that are each defensible in isolation but unsafe in combination: pull_request_target, which exists to let CI run with access to base-repository metadata even for outside contributors (though it has been an object of criticism for a while); cache reuse across PRs coming from trusted and untrusted contributors (caching is not some exotic footgun, it’s a completely ordinary optimization, because otherwise CI gets slower and more expensive very quickly); and a later privileged release job that consumed the poisoned state and published from a legitimate merge to main (automated publishing from a trusted merge is also a normal and in many ways desirable setup; the whole point is to reduce manual release toil and long-lived credentials).
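A sketch of how those behaviors can chain together in GitHub Actions. The workflow layout and cache key below are mine; this mirrors the general pattern, not TanStack’s actual workflows:

```yaml
# ci.yml: runs for fork PRs in the *base* branch's context
on: pull_request_target
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          # checking out the PR head means untrusted code runs in this job
          ref: ${{ github.event.pull_request.head.sha }}
      - uses: actions/cache@v4
        with:
          path: node_modules
          key: deps-${{ runner.os }}
      # because pull_request_target runs with github.ref pointing at the base
      # branch, anything this job writes into the cache is scoped to main,
      # where privileged jobs can later read it
      - run: npm ci && npm test
---
# release.yml: a privileged job on main restores the same cache key
on:
  push:
    branches: [main]
jobs:
  publish:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/cache@v4
        with:
          path: node_modules
          key: deps-${{ runner.os }}   # poisoned node_modules restored here
      - run: npm publish
```

Each piece is ordinary on its own; it is the shared, attacker-writable cache key bridging an untrusted job and a privileged one that makes the combination lethal.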
How do you get affected by such a supply chain attack? Only if your install step actually resolves packages from the registry, whether you are creating a new project or installing dependencies into an existing one.
In the JavaScript ecosystem it is widely accepted practice to almost never install whatever latest versions the registry happens to serve in either of those steps, because that leads to all sorts of reproducibility issues, and to instead rely on lockfiles that pin exact versions.
As a result, this attack only affected those who didn’t use lockfiles, which (skill issues aside) mostly means people who were scaffolding a new project.
In Python-land, however, unless you’re using uv (or Poetry) as your package manager, you’re fetching from the registry every single time you install your packages.
Concretely, this is the difference between pip install -r requirements.txt and uv sync --locked. Using uv is no longer just about good practice and reproducibility; it is now, fundamentally, about security.
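Spelled out (the commands are as documented by pip and uv; the requirements file and lockfile are whatever your project has):

```shell
# Resolves from the registry at install time: whatever versions satisfy
# requirements.txt *right now*, including a release published minutes ago.
pip install -r requirements.txt

# Installs exactly what uv.lock records, and fails loudly if the lockfile
# is missing or out of sync instead of silently re-resolving.
uv sync --locked
```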
uv also allows you to specify a cooldown period: a minimum time a new package version must have been published for before it can be installed into your environment. I describe how in the second voice of this fugue, on my new practices.
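One way to approximate this is uv’s `exclude-newer` setting, which refuses any distribution published after a given timestamp. The setting name is from uv’s documentation; the date below is illustrative, and since it is an absolute cutoff rather than a rolling window, check the uv docs for the exact knob that fits your workflow:

```toml
# pyproject.toml
[tool.uv]
# Ignore anything published to the registry after this timestamp,
# so a freshly pushed malicious release cannot be resolved.
exclude-newer = "2026-01-01T00:00:00Z"
```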
This fugue could’ve been written without mentioning CopyFail, but CopyFail is so mind-boggling precisely because it is an exploit that anyone can reproduce in a few seconds.
You can explore the original disclosure yourself, but the basic idea is that a 10-line Python script allows any user with non-root privileges on pretty much every Linux distribution to grant themselves root privileges.
Its mechanism is a rabbit hole, but here’s my understanding. If you write code in a high-level language like Python, you don’t have to worry about managing memory. You can just write:

```python
from pathlib import Path
from typing import Any


def get_some_insights(file_path: Path) -> dict[str, Any]:
    text = file_path.read_text()
    return summarize(text)  # summarize() is some helper defined elsewhere
```

Whenever you operate in lower-level languages like C, you have to explicitly pass a memory buffer to which the result must be written:
```c
int get_some_insights(const char *path, char *out, size_t out_len) {
    char tmp[8];
    if (out_len < 16) {
        return -1;
    }
    ...
}
```

When you write code for core utilities, you have to be quite mindful of your memory usage; you don’t have the luxury of defining helper variables or flags the way you would in Python, without even thinking twice. So it’s not unusual for C code to use some of that output buffer as a scratch area:
```c
memcpy(tmp, out, 8);           /* stash part of the output buffer as scratch */
memcpy(out + 4, tmp, 4);
memcpy(out + 16, tmp + 4, 4);  /* writes past the 16-byte buffer */
```

And in CopyFail, a core Linux utility, authencesn, was writing 4 extra bytes beyond the end of its output buffer. Attackers found a way to take advantage of this bug to write arbitrary data (like a malicious payload) into the in-memory page cache of the /usr/bin/su file without ever changing the underlying file on disk. That cached page was then used to execute malicious code with root privileges.
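Setting memory safety aside, the arithmetic of those three copies can be simulated harmlessly in a high-level language. Here is a sketch in Python, with a bytearray standing in for memory (this is my illustration of the pattern, not authencesn’s actual code): the “callee” is promised a 16-byte buffer but writes bytes 16 through 19 as well.

```python
# Simulate the off-by-four write with a bytearray as "memory".
# Bytes 0..15 are the buffer the callee owns; bytes 16+ belong to someone else.
arena = bytearray(b"A" * 16 + b"\xaa" * 16)  # 0xAA marks memory we don't own


def buggy_copy(mem: bytearray) -> None:
    tmp = bytes(mem[0:8])    # scratch copy of the start of the output buffer
    mem[4:8] = tmp[0:4]      # in bounds
    mem[16:20] = tmp[4:8]    # out of bounds for a 16-byte buffer


buggy_copy(arena)
print(arena[16])  # 65 (ord("A")): the byte just past the buffer was clobbered
```

In C the same stray write lands in whatever happens to sit after the buffer, which is exactly the foothold the CopyFail attackers turned into a root shell.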
Sure, you could argue the authencesn maintainers made a mistake in overrunning the buffer, but they probably couldn’t have imagined in their scariest dreams that someone would find a way to land those stray bytes in the page cache of /usr/bin/su.
The point of all of this is that no matter how insanely improbable an exploit sounds in theory, someone may already have made it practical. Which is why, for the foreseeable future, it is justified to treat everything on the internet as potentially compromised.
other voices in this fugue
Most Python developers became radioactive liabilities overnight
AI trained to code and became an expert hacker. When we train it to cure disease, it will become a chemical weapons designer. This cybersec challenge is our society's only test run for when the stakes are biological
A full translation of an interview with Grigori Perelman's math teacher. He explains Perelman's rejection of the Fields Medal as a protest against a 'dishonorable' math community that treats theorems as a commodity to be stolen. Also features a brutal, unapologetic defense of Soviet-era educational philosophy
Deriving the necessity of eternal punishment from the Prisoner's Dilemma. How infinite repeated games, discount factors, and the Folk Theorem explain the structural utility of Hell in fostering human cooperation