Vibing code quality
A lot of folk are trying out generative AI and sharing their experiences. It’s undeniably a cool technology that makes users feel awesome by granting them a superpower. Here’s a story about a time when that happened to me.
Set up #
A few days ago, one of the people at work privately pointed out an error in a doc I wrote about logging. I had included an example where I showed the “correct” way to reraise an error with extra information, and it looked something like this:
try {
  doThing();
} catch (error) {
  throw new Error(`Richer information: ${error}`);
}
This is a problem because throwing the new error this way completely discards the original stack trace, which is a very bad thing: when something goes wrong in production, it becomes much harder to narrow down the cause. My colleague pointed out that there were many examples of this error in our code base, and our poor hygiene around rethrowing errors led to real problems.
Instead, the correct thing to do is to rethrow the error with a cause, or to use a library like verror. Either way, the original error, stack trace and all, gets attached to the new one.
I fixed the document so the example looked more like:
try {
  doThing();
} catch (error) {
  throw new Error(`Richer information: ${error}`, { cause: error });
}
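For illustration, here’s roughly what that buys you at the point where the error finally gets handled. (doRiskyWork is just a stand-in for whatever code ends up rethrowing.)

try {
  doRiskyWork();
} catch (error) {
  // The rethrown error carries its own stack trace…
  console.error(error);
  // …and, because `cause` was set, the original error and its stack are still attached.
  if (error instanceof Error && error.cause) {
    console.error('Caused by:', error.cause);
  }
}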
Follow through #
But then I thought about it some more. I really should have known better. While my only TypeScript experience has been at this particular job, I should have been able to infer the need for something like this. After all, I’ve had to debug a lot of issues where the stack trace was the best guide to where the problem lay, and as a long-time Twisted developer, I’ve often had to deal with the frustration of partial stack traces.
If I should have known better, but didn’t, and if there are many instances of this mistake in our code, then it seemed likely that many of our colleagues would also need to learn the same lesson. So I sat down and thought about how to solve that problem.
Fixing my doc was a good start, but it’s vanishingly unlikely that it was going to be read by the right people at the right time in order to change their behaviour. Writing a message on the right public channel on Slack might help a bit, but messages like that are so ephemeral. They are easily missed by people who are actually around, and completely invisible to people who join later.
Volumes more could be written about this, and many qualifiers added, but to get a group of people to behave differently, we needed to make it easy to do the right thing and difficult to do the wrong thing.
I’ve found Rusty’s API Design Manifesto to be a helpful tool for thinking about encouraging good coding practices. JavaScript’s API for including causes in rethrown errors sits at level 3 on the manifesto: “read the documentation and you’ll get it right”. We would like to go further up the manifesto, but our options are limited because the JavaScript Error API is out of our control. However, we could bump it up to level 4: “follow common convention and you’ll get it right”.
The way to do this would have been to update every single rethrow1 to include a cause; then, whenever we went to write our own rethrows, peer pressure and the power of copy/pasting would nudge us to do the right thing: the Galahad principle strikes again.
Except, if we take a closer look at Rusty’s API Design Manifesto again, we see there’s level 9: “the compiler/linker won’t let you get it wrong”. We’re JavaScript developers, and we don’t know what a linker even is! But we can weaponize our ignorance and pretend that it’s just a weird misspelling of the word linter! We can write a lint check that catches this kind of mistake and maybe even fixes it!
Twist #
I was fibbing. I didn’t sit down and think about how to solve the problem.
Here’s what actually happened. I went to Claude, told it about the rethrow mistake, and asked how we could fix existing mistakes of this kind and prevent future ones. It listed about ten options, of which one was “write an eslint plugin”.
Now, the last time I wrote a lint check was well before iPhones were invented2. In Python. I had no idea how eslint checks worked, or whether this was a thing we could do. Had it been a couple of years ago, I would have stopped there. Maybe, if I were in the right state of mind, I would have taken a few hours to find all of the dodgy rethrows using rg and fix them by hand, and then found some doc to update somewhere, hoping for the best.
However, these days, robots can generate lint checks for us! You can vibe-code a coding guideline!
So I continued the conversation with Claude and asked it to write the plugin for me. It not only wrote the lint plugin, but it also provided tests and instructions for installing it, as well as the code for generating the corrected version for autofix. Super impressive. I’m such a badass.
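For the curious, here’s a rough sketch of the shape of such a rule. The names and details are illustrative rather than the actual plugin we shipped, but the bones are the same: find a throw new Error(…) inside a catch block and complain if there’s no cause.

module.exports = {
  meta: {
    type: 'problem',
    docs: { description: 'require rethrown errors to pass the original error as `cause`' },
    messages: {
      missingCause: 'Rethrown errors should include the original error as `cause`.',
    },
    schema: [],
  },
  create(context) {
    return {
      ThrowStatement(node) {
        // Only interested in `throw new Error(...)`.
        const thrown = node.argument;
        if (!thrown || thrown.type !== 'NewExpression') return;
        if (thrown.callee.type !== 'Identifier' || thrown.callee.name !== 'Error') return;

        // Only interested in rethrows, i.e. throws inside a catch block.
        let ancestor = node.parent;
        while (ancestor && ancestor.type !== 'CatchClause') {
          ancestor = ancestor.parent;
        }
        if (!ancestor) return;

        // Happy path: the second argument is an object literal with a `cause` property.
        const options = thrown.arguments[1];
        const hasCause =
          options &&
          options.type === 'ObjectExpression' &&
          options.properties.some(
            (prop) => prop.type === 'Property' && prop.key.type === 'Identifier' && prop.key.name === 'cause'
          );
        if (!hasCause) {
          context.report({ node, messageId: 'missingCause' });
        }
      },
    };
  },
};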
There were still a few important decisions that had to be made, though. From past experience, it’s better to fix everything at the same time the lint rule is introduced, both to lower the burden on existing developers and to actually start getting the benefit from the rule. So I did that, and in the process discovered edge cases in how the plugin worked. Together with Claude, I turned those into more test cases, while Claude took care of writing the code to make them pass.
I also realised that it would be better for the autofixer to only ever be available if it was 100% sure of success. Otherwise, it would feel janky and unreliable to developers, and more selfishly, it would be a lot more hassle and effort to run autofix on the whole codebase if I had to go and check every instance.
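Concretely, that meant only attaching a fixer in the unambiguous case: the catch clause binds a plain identifier and the Error call has exactly one argument. Something like the following, replacing the final report in the sketch above (it also assumes the rule’s meta declares fixable: 'code'):

const caught = ancestor.param;
if (caught && caught.type === 'Identifier' && thrown.arguments.length === 1) {
  context.report({
    node,
    messageId: 'missingCause',
    // Unambiguous: append `, { cause: <name> }` to the Error call.
    fix(fixer) {
      return fixer.insertTextAfter(thrown.arguments[0], `, { cause: ${caught.name} }`);
    },
  });
} else {
  // Anything fancier gets flagged but left for a human to fix.
  context.report({ node, messageId: 'missingCause' });
}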
Result #
Overall, the whole process from “learn about rethrow being a problem” to “PR with new lint check and a codebase where every rethrow has a cause” took about three hours. It would have taken days for me to do it without AI assistance, although as Mark Wotton put it when I mentioned this to him, “the counterfactual is often not somebody doing it by hand, it’s just not getting done at all”.
What value was I adding, over and above the code generation? When made aware of the problem, I looked for a scalable, systematic solution. When presented with options by Claude, I chose linting based on past experience and all the post hoc reasoning outlined above. After the code was generated, I knew I needed to iterate on a large real-world corpus, and directed lots of the corrections based on that. I curated the test cases to avoid duplication, to ensure they communicated intent, and to get rid of some slop. I made a judgement call that it was better for the autofix to be reliable, even if that meant it was more limited in scope.
Why does this matter? Like a lot of people, I think AI is going to radically change software engineering, although I don’t know how or to what extent. Things are changing so fast, and I am far from an expert. I do, however, need to pay the bills for the next few decades, so I need to try to keep up somehow. One way to do it is as I have described here: find an excuse and a safe space for experimentation, try things out, and then reflect on it afterwards. I’m not sure there are many better ways to learn anything.
At least in this instance, and in blissful ignorance of all the secondary consequences, it feels like I’ve gained a minor superpower.
I sort of wrote about this back in 2011, but I didn’t tease out just how common and normal and justified it is for coders to copy patterns from other parts of the codebase. ↩︎
I tried to find the commit, but the pyflakes Git repository doesn’t go back that far. I’m in the AUTHORS file though. ↩︎