Research & Writings
Summaries of my academic work, long-form articles, tutorials, and miscellaneous notes. Filterable by topic.
Quite a few people are amused by the perceived irony of Silicon Valley engineers setting out to create an all-powerful digital god (AGI) which so far only seems to automate software engineering; i.e., AI will put the software engineers (SWEs) who created it out of jobs first. But will it?
Being a good coder is not about knowing a programming language or a framework; it’s a state of mind, an ethos, a way of life, really. It’s a general attitude that you can build and change things, a tendency to default to an optimistic outlook. It’s also a culture of working towards clear, objective, measurable outcomes with clear metrics of progress; of breaking a huge problem down into well-scoped chunks undertaken one pull request at a time; of artificially imposed guardrails like unit tests, contract tests, integration tests, regression tests, type checks, strict linters. All of this might sound trivial to any coder: “boy, are you really writing an ode to commits and GitHub workflows?” To which I say: you have no idea how foreign these concepts are to pretty much everyone outside of CS.
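To make the “guardrail” idea concrete for readers outside CS, here is a minimal, hypothetical sketch (all names made up): a function with a declared contract and a regression test that pins its behavior down, so that a later “harmless” refactor cannot silently change it.

```python
# A toy guardrail (hypothetical names): a typed function plus a regression
# test that runs on every commit and fails the build before a human ever
# sees the bug.

def parse_score(raw: str) -> int:
    """Parse a '7/10'-style score into a percentage; raise on bad input."""
    got, total = raw.split("/")  # raises ValueError if the format is wrong
    return round(100 * int(got) / int(total))

def test_parse_score():
    # The guardrail: pinned expected behavior, including the failure mode.
    assert parse_score("7/10") == 70
    assert parse_score("1/3") == 33
    try:
        parse_score("seven of ten")
    except ValueError:
        pass  # bad input must raise, not silently return garbage
    else:
        raise AssertionError("bad input must raise")
```

The point is not the three lines of logic but the habit: every behavioral assumption gets written down somewhere a machine will re-check it forever.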
It is frequently “discovered” that upwards of 80% of peer-reviewed research in chemistry or biology is actually non-reproducible. These numbers usually shock tech people, but let me rephrase this in a language you’d understand: 80% of codebases that were written without a single refactor contain a critical bug. Any seasoned coder knows from experience that no matter how diligent you are in planning, as you build out a codebase, eventually it gets to the point of being a slight mess that can and should be reorganized. In the process, you often revisit assumptions or even discover latent bugs. There’s no such thing as finished software; there’s always more to change, implement or fix. No matter how great your team is, when you release some major new version, e.g. v1.0.0, you’ll almost immediately find a reason (maybe even within a day) to release v1.0.1. And yet, in experimental science a publication might report (or even be written around) an experiment that was performed just once.
And of course, there’s a factor of real or apparent (i.e., the process is sensitive to variables you don’t, or even can’t, control) non-determinism in experiments, especially experiments involving living cells, which explains part of the 80% figure, but I think it accounts for much less than experimental scientists would think (or want to think). So let’s say AI wipes out 90% (or even 100%) of current SWE jobs: what happens when these engineers start migrating to jobs in the physical and life sciences (which are currently considered to lack easily verifiable rewards, and so, almost proudly, to be immune to RLVR; but read the next voice)? How much comparative advantage would someone accustomed to iteration and the culture of self-imposed sanity checks (like type checks and tests) have?
Here’s how I think about this. Imagine the dumb mistakes any developer can make, and regularly makes, regardless of experience, that should be caught by CI/CD. For example, you start an expensive training run, spend a few hours of GPU time on one epoch, and then your code suddenly crashes because of some small mistake in your validation loop or checkpointing logic. I’d bet every single person who has ever trained a neural network has made this mistake. And through this tragedy at small scale you discover the justification for simple smoke tests: call each critical function once and make sure it runs to completion. And then, at some point in the future, you’ll make some change in your code, and no matter how diligently you try to think of every downstream function you need to update, chances are you’ll open a PR, get a failing test, read it, and think “Oh, yeah, I forgot about X” or “oh, yeah, there’s this not exactly obvious second-order effect.”
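The smoke test itself is almost embarrassingly simple to write. A minimal sketch, with toy functions standing in for a real training pipeline (every name here is hypothetical): run every critical stage once on tiny inputs, so the code crashes at second one, not at hour three.

```python
# Minimal smoke-test sketch: exercise every stage of a (pretend) training
# run on tiny data before burning real GPU hours.
import json
import os
import tempfile

def train_one_epoch(batches):
    # Pretend "training": the model is just the running mean of the data.
    total, count = 0.0, 0
    for batch in batches:
        total += sum(batch)
        count += len(batch)
    return {"mean": total / max(count, 1)}

def validate(model, batches):
    # Pretend "validation": mean absolute deviation from the model's mean.
    errors = [abs(x - model["mean"]) for batch in batches for x in batch]
    return sum(errors) / max(len(errors), 1)

def save_checkpoint(model, path):
    # The stage that, in real life, crashes three hours in.
    with open(path, "w") as f:
        json.dump(model, f)

def smoke_test():
    # One tiny pass through every stage the expensive run will hit.
    tiny = [[1.0, 2.0], [3.0]]
    model = train_one_epoch(tiny)
    loss = validate(model, tiny)
    save_checkpoint(model, os.path.join(tempfile.gettempdir(), "ckpt_smoke.json"))
    assert loss >= 0.0
    return model, loss
```

Nothing here checks that the model is any good; the smoke test only checks that every stage runs to completion, which is precisely the class of dumb mistakes the anecdote above is about.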
My argument is quite simple. You either have to believe that experimental scientists are fundamentally incapable of making such mistakes, maybe because they’re biologically built different or they have such robust training programs that it is unthinkable for them not to measure twice, OR it should be obvious that the lack of comparable guardrails like CI in experimental science is slowing everyone down.
An experimental scientist might be eager to deflect by arguing that things like linting, type checking, testing, and versioning (branching, commits, PRs) are unique to coding and not easily transferable, but how hard has anyone actually searched for the parallels? I’m not ready to make a judgment as to whether some discipline (e.g. experimental chemistry or biology) is epistemically more difficult than another (e.g. coding). But what I am willing to say is that maybe all disciplines are epistemically more similar than we are currently ready to admit.
Even though agentic AI is already capable enough of being transformative both at the scale of individual lives and entire disciplines, it’s not going to unlock these transformations without a cultural evolution.
Let’s go back to the example with scheduling arbitration. Without starting a normative rant, I’ll make a few descriptive statements: I was not asked if I could re-run the same script for other disciplines, nor was I asked to prepare a technology handoff so that it could be reused in the following years. If olympiad organization were a GitHub repository, we’d immediately create an issue, “Integrate the pilot scheduling logic into the arbitration module,” and put it on the roadmap for next year. But it wasn’t.
And to be precise, I’m not saying everything should be a Git repo (although many more things should be), but ask yourself how many initiatives would benefit from even a fraction of these guardrails.
If you’re an active AI user, you know that the barrier to reorganizing your work so you can reap these benefits (either by creating custom web apps or by adjusting day-to-day workflows to suit existing software) is already near zero. But if we take any lesson from 30 years of breaking down barriers with the internet (think online courses, technical blog posts, educational videos), it’s that we systematically overestimate the role of barriers to activity and underestimate the importance of intent, urgency, the desire to be in the arena to strive and achieve. Leaving aside whether those tendencies are innate to every human being or learned, they are certainly exercised to a varying extent across disciplines, and coding is one of the tasks that requires them at some deep, fundamental level.
A natural conclusion is that even though demand for people who actually write code might decrease, we still need people trained to write code, because in the process they’re trained to think like SWEs, and we’ll need way, way more people who can think like SWEs. Now, I understand it might be very easy for people familiar with the tech world in practice to challenge this take, because an average graduate of a CS program is not necessarily someone you’d trust with consequential decisions, but there are two traps I want to highlight:
First, tech people (despite the emergence of the accelerationism movement, which is in some ways reactionary to this very status quo) tend to be way more skeptical (and so, in a Popperian sense, they might even be the only true scientists left today) about the viability of any particular solution to any problem. For whatever reason, there’s a very real and natural tendency of deeply competent tech people to significantly underestimate their competence (even relative to the rest of the labor force) and their ability to effect change (I didn’t understand the extent to which tech people tend to spiral until I saw the contrast between Yale and MIT).
Second, there’s little point in comparing skills, competence, or the ability to create economic value or effect change by looking at representatives of the lower quartile of any particular occupation, or even, one could argue, the median one. If you think an average coder is too dogmatic and too susceptible to groupthink (which would directly falsify my claim that SWE-style thinking is uniquely powerful), do you really think an average PhD chemist or biologist is any different? For any sufficiently large group of people, the in-group variance is significant to the point of making inter-group comparisons of medians almost futile.
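The variance point is easy to see with numbers. A toy illustration with entirely made-up figures (the two “occupations” and their scores are hypothetical, chosen only to show the shape of the argument):

```python
# Made-up, purely illustrative numbers: two "occupations" whose medians
# differ a little, while the spread within each group dwarfs that gap.
from statistics import median, pstdev

group_a = [30, 45, 50, 55, 70, 90, 140]   # hypothetical occupation A
group_b = [25, 40, 45, 50, 65, 95, 160]   # hypothetical occupation B

gap = abs(median(group_a) - median(group_b))    # difference of medians: 5
spread = max(pstdev(group_a), pstdev(group_b))  # in-group spread: ~43

# The median gap is noise next to the in-group variance.
assert gap < spread / 4
```

Whenever the within-group spread is several times the between-group median gap, knowing someone’s occupation tells you almost nothing about where they sit relative to a member of the other group.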
And so the only question we should ask is whether a particular discipline or vocation is more likely to instill helpful, generalizable frameworks and skills in the top quartile/decile of whoever attempts to learn that discipline and practice that vocation, and in this framing I think my take becomes much more reasonable.
other voices in this fugue
A crude script for scheduling olympiad arbitration shows how mundane software can carry absurd downstream stakes.
One reason bits have moved faster than atoms is that software cultures are unusually good at shortening the loop between idea and feedback.
A full translation of an interview with Grigori Perelman's math teacher. He explains Perelman's rejection of the Fields Medal as a protest against a 'dishonorable' math community that treats theorems as a commodity to be stolen. Also features a brutal, unapologetic defense of Soviet-era educational philosophy.
Deriving the necessity of eternal punishment from the Prisoner's Dilemma. How infinitely repeated games, discount factors, and the Folk Theorem explain the structural utility of Hell in fostering human cooperation.