How I've been using LLMs at work

This is a brief look back on how I’ve been using LLMs over the last year, some thoughts on how it has gone, and some notes on what I’d like to try in the near future.

Tech

I basically only use Claude and Claude Code. I am sure I am missing out on other cool things, but these have been working well for me.

I should probably be using Projects within the Claude UI, but I never have. I now have tonnes of conversations that I’ll never need or refer to again, and they’re a pain to clean up. I’ve started using Incognito mode just to stop boring chats from sticking around.

Lazy Googling / StackOverflow

A lot of my usage is things like “Remind me how to use a WINDOW function in BigQuery to calculate a 30d trailing average”, or “I’m getting this error, what does it mean?”. This isn’t much different in function from searching back in the olden days, except that the ad-free, cookie-banner-free, image-free text interface is lovely and calming and soothing and wonderful, and asking follow-up questions is easy.
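Here’s roughly what that first kind of reminder boils down to: a small Go sketch of a trailing average. It’s illustrative only. The `Point` type and `TrailingAverage` function are made up, days are plain integers (in the style of BigQuery’s `UNIX_DATE`), and the comment shows the BigQuery window frame it roughly corresponds to, assuming one row per day.

```go
package main

import "fmt"

// Point is one day's value. Day is an integer day number, like BigQuery's
// UNIX_DATE(day), so "29 days earlier" is just Day - 29.
type Point struct {
	Day   int
	Value float64
}

// TrailingAverage returns, for each point, the mean of the values whose Day
// lies in [p.Day-(window-1), p.Day]. Points must be sorted by Day, with one
// point per day, and window must be at least 1. Missing days contribute
// nothing, which is how a RANGE frame behaves.
//
// For a 30-day window this is roughly the BigQuery expression:
//
//	AVG(value) OVER (ORDER BY UNIX_DATE(day)
//	                 RANGE BETWEEN 29 PRECEDING AND CURRENT ROW)
func TrailingAverage(points []Point, window int) []float64 {
	out := make([]float64, len(points))
	start, sum := 0, 0.0
	for i, p := range points {
		sum += p.Value
		// Drop points that have fallen out of the window.
		for points[start].Day < p.Day-(window-1) {
			sum -= points[start].Value
			start++
		}
		out[i] = sum / float64(i-start+1)
	}
	return out
}

func main() {
	// Days 3 and 4 are missing; with a 3-day window, day 5 only sees itself.
	pts := []Point{{1, 10}, {2, 20}, {5, 30}}
	fmt.Println(TrailingAverage(pts, 3)) // [10 15 30]
}
```

The RANGE frame matters when days are missing: a `ROWS BETWEEN 29 PRECEDING AND CURRENT ROW` frame averages the last 30 rows, which can stretch back well over 30 days.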

Hallucination does happen, but not that often. If what I see doesn’t work or doesn’t pass a sniff test, I either ask for explicit references or just find the references myself.

Understanding concepts

It might just be me, or it might be the SRE role, but almost every tech I encounter is at least a little bit unfamiliar. Every system has its own vocabulary. One conceptual ‘thing’ in one system might be one-and-a-half things in another, and it’s all just a bit exhausting.

LLMs have been very useful for helping me learn new systems or to translate my understanding of existing systems. For example, I can say something like “Explain how workload identity federation works. I’m particularly interested in going from GitHub to GCP. Include a Mermaid diagram”. Then when I see that, I can ask follow-ups like “Walk me through step by step from the workflow wanting to talk to GCP to it actually talking to GCP” or “Show me exactly where secrets get generated, and help me understand why this is actually secure”.

Or, switching to Claude Code, I can look at an actual codebase and ask similar questions like, “Walk me through exactly how the foo service is able to access the bar bucket. Show me receipts on actual code.” This will often surface new things I don’t understand, and I can then ask about those concepts.

Diving into unfamiliar code

This dovetails with the “Understanding concepts” heading.

In the middle of the year, I got asked to implement a feature in a codebase I didn’t understand in a problem domain I knew nothing about. I literally didn’t even know where the code lived. However, I opened up a claude session in what I hoped was the right repository and started asking questions about the existing version of the feature, and then follow-up questions (“What’s the difference between a vulnerability and an advisory? How do they relate to each other?”, “Show me the flow of this information through all the systems”, “Which of these systems are deployed in our GCP project? Where’s the Terraform?” etc.) This made the whole thing much faster.

This is an interesting one for me, because I pride myself on being able to rapidly get to grips with an unfamiliar codebase. When a computer is better at that than I am, it at least gives me pause for thought. In this case, I wasn’t satisfied until I had shoved the AI-generated understanding through my own neurons by checking up on the code, reading it through myself, and writing down the understanding in my own words.

Helping with incidents

Similar to the above heading, I’ve had to deal with one or two incidents in systems and code that are almost completely unfamiliar to me. I’ve used claude to explain the flow of traffic and requests, not only from the source code and IaC, but also by using gcloud to probe what’s actually there.

In the most recent incident I handled, it was able to pinpoint the cause of the problem (locks on INSERT operations causing us to hit our max open DB connections). I had to steer it, though, as sometimes it confidently asserted that some other system was at fault. Rather than explicitly correcting it, I’d ask “Perhaps. But how can that be the case when X? Show your working.” I’m wary of even gentle correction being taken as gospel truth, so I’ve been leaning on questions and requests for evidence. “Show receipts” or “Explain it in a way that will convince a sceptical fellow engineer” comes up a lot. The sceptical engineer is often me.
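That failure mode is easy to reproduce in miniature. This is a toy Go model, not the system from the incident: a buffered channel stands in for a max-open-connections limit (in Go’s `database/sql` the client-side knob is `DB.SetMaxOpenConns`), and a channel that gets closed stands in for the lock finally being released. `Pool` and `CanStillConnect` are hypothetical names. Once every connection is held by an INSERT that is waiting on the lock, an unrelated request can’t get a connection at all.

```go
package main

import (
	"context"
	"fmt"
	"sync"
	"time"
)

// Pool is a toy connection pool: one buffered-channel slot per connection,
// loosely modelling a max-open-connections limit.
type Pool struct{ slots chan struct{} }

func NewPool(maxOpen int) *Pool {
	return &Pool{slots: make(chan struct{}, maxOpen)}
}

// Acquire takes a connection slot, or gives up when ctx expires.
func (p *Pool) Acquire(ctx context.Context) error {
	select {
	case p.slots <- struct{}{}:
		return nil
	case <-ctx.Done():
		return ctx.Err()
	}
}

// Release gives a connection slot back.
func (p *Pool) Release() { <-p.slots }

// CanStillConnect starts `stuck` INSERTs that each take a connection and then
// wait on a lock, and reports whether one more request can get a connection
// within waitMs milliseconds.
func CanStillConnect(maxOpen, stuck, waitMs int) bool {
	// INSERTs beyond maxOpen would just queue for a connection, so the pool is
	// full either way; cap them to keep the toy simple.
	if stuck > maxOpen {
		stuck = maxOpen
	}
	pool := NewPool(maxOpen)
	lockReleased := make(chan struct{})
	var holding, finished sync.WaitGroup
	for i := 0; i < stuck; i++ {
		holding.Add(1)
		finished.Add(1)
		go func() {
			defer finished.Done()
			// Background never expires, and a slot is free because stuck <= maxOpen.
			_ = pool.Acquire(context.Background())
			holding.Done()
			<-lockReleased // waiting on the lock while still holding the connection
			pool.Release()
		}()
	}
	holding.Wait() // every stuck INSERT now holds a connection

	ctx, cancel := context.WithTimeout(context.Background(), time.Duration(waitMs)*time.Millisecond)
	defer cancel()
	err := pool.Acquire(ctx)
	if err == nil {
		pool.Release()
	}

	close(lockReleased) // the lock holder finally commits
	finished.Wait()
	return err == nil
}

func main() {
	fmt.Println(CanStillConnect(2, 1, 50)) // true: one connection still free
	fmt.Println(CanStillConnect(2, 2, 50)) // false: every connection waits on the lock
}
```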

Building one-off tools

There have been a couple of times when I’ve needed to do a major refactoring or global analysis of Terraform code. Instead of getting claude to do that for me, I’ve got claude to build a Go tool using the tfconfig package from HashiCorp’s terraform-config-inspect.

That way, I can review the tool rather than the changes or analysis, which gives me more confidence that it’s doing the right thing. I think it’s also a more efficient use of tokens, but happily I haven’t had to pay much attention to that.
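As a sketch of the shape such a tool might take, assume the analysis is something like “which resource types do we use, and how many of each?”. The `Resource` type and `CountByType` function are hypothetical. The loading step via `tfconfig.LoadModule` appears only in a comment, so the analysis itself runs and can be tested without the dependency or any Terraform on disk; keeping that split is also what makes the tool quick to review.

```go
package main

import (
	"fmt"
	"sort"
)

// Resource holds the fields a tfconfig managed resource exposes as Type, Name,
// Pos.Filename and Pos.Line. A real tool would fill it from the loader, e.g.:
//
//	mod, diags := tfconfig.LoadModule(dir)
//	if diags.HasErrors() {
//		// report diags and stop
//	}
//	for _, r := range mod.ManagedResources {
//		rs = append(rs, Resource{r.Type, r.Name, r.Pos.Filename, r.Pos.Line})
//	}
type Resource struct {
	Type, Name, Filename string
	Line                 int
}

// TypeCount is one line of the report.
type TypeCount struct {
	Type  string
	Count int
}

// CountByType tallies resources per type, most common first, with ties broken
// alphabetically so the output is stable and easy to eyeball.
func CountByType(resources []Resource) []TypeCount {
	counts := map[string]int{}
	for _, r := range resources {
		counts[r.Type]++
	}
	out := make([]TypeCount, 0, len(counts))
	for t, n := range counts {
		out = append(out, TypeCount{t, n})
	}
	sort.Slice(out, func(i, j int) bool {
		if out[i].Count != out[j].Count {
			return out[i].Count > out[j].Count
		}
		return out[i].Type < out[j].Type
	})
	return out
}

func main() {
	// Stand-in data in place of a real tfconfig load.
	rs := []Resource{
		{"google_storage_bucket", "logs", "storage.tf", 1},
		{"google_service_account", "foo", "iam.tf", 3},
		{"google_storage_bucket", "bar", "storage.tf", 12},
	}
	for _, tc := range CountByType(rs) {
		fmt.Printf("%-30s %d\n", tc.Type, tc.Count)
	}
}
```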

Building highly situated tools

I have almost entirely vibecoded git-worktree-manager, which I affectionately call gwm (pronounced to rhyme with “doom”, as if it were a Welsh word). It’s been great. I don’t particularly want anyone else to use it, and I’m glad I’ve got it.

Translating code

One of the occupational hazards of being an SRE is that you have to read and debug shell scripts. For me, once a script gets over about 20-30 lines I start wishing it was in an actual programming language with proper functions, tests, type checking, and so on. Or even Go.

It has been wonderfully relaxing to be able to say “Rewrite this shell script as a Go command-line tool” and just get the right result, moving a multi-hour job into a multi-minute job.
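Here’s a hedged sketch of what such a rewrite can look like, for a hypothetical script that counts matching lines in each file passed to it. The program name `errcount`, the `-pattern` flag and the function names are all made up, and unlike `grep -c` it treats the pattern as a fixed string (closer to `grep -cF`). The counting is a pure function, and `run` takes its arguments and writers as parameters, so both can be tested without touching the real process.

```go
package main

import (
	"flag"
	"fmt"
	"io"
	"os"
	"strings"
)

// A Go take on a hypothetical script along the lines of:
//
//	for f in "$@"; do echo "$f: $(grep -c "${PATTERN:-ERROR}" "$f")"; done

// countMatches is the pure core: the number of lines in content that contain
// needle as a fixed string.
func countMatches(content, needle string) int {
	n := 0
	for _, line := range strings.Split(content, "\n") {
		if strings.Contains(line, needle) {
			n++
		}
	}
	return n
}

// run is the whole program apart from exiting, so tests can drive it with
// their own arguments and writers.
func run(args []string, stdout, stderr io.Writer) int {
	fs := flag.NewFlagSet("errcount", flag.ContinueOnError)
	fs.SetOutput(stderr)
	pattern := fs.String("pattern", "ERROR", "fixed string to count (must not be empty)")
	if err := fs.Parse(args); err != nil {
		return 2
	}
	if *pattern == "" {
		fmt.Fprintln(stderr, "errcount: -pattern must not be empty")
		return 2
	}
	status := 0
	for _, path := range fs.Args() {
		data, err := os.ReadFile(path)
		if err != nil {
			fmt.Fprintln(stderr, "errcount:", err)
			status = 1 // keep going, like the loop in the shell version
			continue
		}
		fmt.Fprintf(stdout, "%s: %d\n", path, countMatches(string(data), *pattern))
	}
	return status
}

func main() {
	os.Exit(run(os.Args[1:], os.Stdout, os.Stderr))
}
```

Running `go run . -pattern ERROR app.log worker.log` would print one `file: count` line per readable file.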

Writing and refactoring code

I have a definite style of writing code that I have yet to properly articulate. It’s something along the lines of ‘functional core, imperative shell’, and ‘interfaces are great’, and ‘pass in the things you need’, and ‘separate I/O from computation’, and ‘test along system boundaries’, and ‘don’t write tests that prove you can write the same code twice’, and ‘avoid or contain mutability’. If you watch Fackler and Manista or Sandi Metz or Gary Bernhardt’s stuff you’ll get an idea. I should try to write some posts on this.
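As one rough reading of a few of those slogans together (functional core, imperative shell, interfaces, pass in what you need, test along the boundary), here’s a small Go sketch. The branch-pruning domain and every name in it are hypothetical; the point is the shape. `Stale` makes all the decisions without doing any I/O, `Prune` does the I/O through the `Repo` interface, and the in-memory fake sits exactly at that boundary.

```go
package main

import "fmt"

// Branch is a hypothetical record; AgeDays is days since the last commit.
type Branch struct {
	Name    string
	AgeDays int
}

// Stale is the functional core: pure, no I/O, trivially testable.
func Stale(branches []Branch, maxAgeDays int) []string {
	var out []string
	for _, b := range branches {
		if b.AgeDays > maxAgeDays {
			out = append(out, b.Name)
		}
	}
	return out
}

// Repo is the boundary. A real implementation might shell out to git;
// tests pass a fake.
type Repo interface {
	Branches() ([]Branch, error)
	Delete(name string) error
}

// Prune is the imperative shell: it does the I/O and delegates every decision
// to Stale. Everything it needs is passed in.
func Prune(repo Repo, maxAgeDays int) ([]string, error) {
	branches, err := repo.Branches()
	if err != nil {
		return nil, fmt.Errorf("listing branches: %w", err)
	}
	stale := Stale(branches, maxAgeDays)
	for _, name := range stale {
		if err := repo.Delete(name); err != nil {
			return nil, fmt.Errorf("deleting %s: %w", name, err)
		}
	}
	return stale, nil
}

// fakeRepo is an in-memory Repo for testing Prune along its boundary.
type fakeRepo struct {
	branches []Branch
	deleted  []string
}

func (f *fakeRepo) Branches() ([]Branch, error) { return f.branches, nil }

func (f *fakeRepo) Delete(name string) error {
	f.deleted = append(f.deleted, name)
	return nil
}

func main() {
	repo := &fakeRepo{branches: []Branch{{"main", 0}, {"old-spike", 90}, {"wip", 3}}}
	pruned, err := Prune(repo, 30)
	fmt.Println(pruned, err) // [old-spike] <nil>
}
```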

Because I haven’t been primarily working as a coder, I’ve yet to really try to push this style into what Claude makes for me. However, I have definitely used claude to do individual refactorings along these lines. Often this has been iterating on other work that I’ve been doing.

Or, I’ll notice an opportunity for improvement while I’m working on something, but know that it will distract from the current change, so I’ll get it to output a plan that I can use in another session.

Emacs

I’m a life-long Emacs user. I wouldn’t necessarily recommend it to anyone else, but I’ve developed a symbiotic relationship, and ripping the tendrils out from my brain would do irreparable damage.

I’ve been able to use claude in my dotfiles repo to remove warnings, to correct bugs, or even to say things like “I’ve been iterating on this Emacs config since 1998. Point out opportunities for modernizing it”, or “remind me exactly how and when use-package downloads things from the internet”.

Quality of life improvements that I would have put off in the past have become easy to do. Further, they have been easy to do in the background as tasks parallel to the main tasks I’m supposed to be doing.

Data exploration and engineering

We use BigQuery at work and BigQuery has a pretty good command-line tool, bq. I’ve often opened up a Claude Code session in the repo with all the table definitions and just asked it questions about the data, or got it to write a new view on the data. It’s great. I have to remind it to avoid joins, and I haven’t got a good system for avoiding accidental expensive queries, but it’s a good start and I never have to remember how to write a WINDOW function again.

One anecdote might help. We had a surprisingly expensive BigQuery bill from a job that we couldn’t attribute because it didn’t have labels. I manually explored the job records using INFORMATION_SCHEMA to find the offending query and the service account that was running it. That was enough to let me find the repository with the query. I opened up claude and asked it to explain how often this query was running, why it was so expensive (was it running more frequently than before or had the underlying data changed, etc.), and make a plan for fixing it. It correctly found that the query was doing a full table scan once for each row in the table, making it quadratically expensive. We went back and forth on a plan to fix it, and got it down to something that’s so cheap we’ll never have to think about it again.
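The query itself isn’t reproduced here, but the shape of the problem is easy to model. In this hypothetical Go sketch, `perRowScan` re-reads the whole table for every row (as a correlated subquery that the engine can’t rewrite might), while `singlePass` aggregates once and looks results up, which is roughly what a `GROUP BY` plus a join, or `COUNT(*) OVER (PARTITION BY key)`, buys you. Rows read is a stand-in for bytes billed.

```go
package main

import "fmt"

// Row is a hypothetical table row; only the grouping key matters here.
type Row struct{ Key string }

// perRowScan counts, for each row, how many rows share its key by re-reading
// the whole table every time. Rows read: n*n.
func perRowScan(table []Row) (counts []int, rowsRead int) {
	counts = make([]int, len(table))
	for i, r := range table {
		for _, other := range table {
			rowsRead++
			if other.Key == r.Key {
				counts[i]++
			}
		}
	}
	return counts, rowsRead
}

// singlePass computes the same counts by aggregating once, then reading the
// table again to attach each row's result. Rows read: 2n.
func singlePass(table []Row) (counts []int, rowsRead int) {
	byKey := map[string]int{}
	for _, r := range table {
		rowsRead++
		byKey[r.Key]++
	}
	counts = make([]int, len(table))
	for i, r := range table {
		rowsRead++
		counts[i] = byKey[r.Key]
	}
	return counts, rowsRead
}

func main() {
	for _, n := range []int{10, 100, 1000} {
		table := make([]Row, n)
		for i := range table {
			table[i] = Row{Key: fmt.Sprint(i % 7)}
		}
		_, quad := perRowScan(table)
		_, lin := singlePass(table)
		fmt.Printf("n=%d: per-row scan reads %d rows, single pass reads %d\n", n, quad, lin)
	}
}
```

Going from 100 rows to 1,000 multiplies the per-row scan’s work by 100 but the single pass’s only by 10, which is why this kind of query can look fine for months and then suddenly show up on the bill.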

Designs

Broadly speaking, almost every Claude Code session that I do starts in plan mode. I might start with a prompt like, “The foo system is sending custom alerts to Slack. I’m thinking of sending them to incident.io instead, because I want to route them directly to teams. What would it take to do that? What are the trade-offs? How can I ensure a smooth migration?”. I then iterate on the plan. When I’m happy, I either tell it to go, or chat through the design with a colleague.

For bigger deals, I’ve given it our design doc template and then uploaded the results to Google Drive. I’m not always happy with the writing (see below), but I love not having to draw interaction diagrams myself.

Quite commonly when I set out to do something I’ll have two or three ideas on how to do it and not have a sense of which one is best. In this case, I’ll start two or three parallel sessions of claude and give one idea to each session. This quickly surfaces the winning idea, or reveals why all of my ideas suck and I need to go talk to a human. Both outcomes are wins.

Writing

I think I’m a good writer, particularly in a corporate context. I value the process of writing, as it shows me where my thinking is ambiguous, fuzzy, or just flat out missing. I also see it as an intensely human activity. I am writing for someone, either to help them do something specific, or to get them to help me.

I’ve generally been dissatisfied with the quality of writing generated by LLMs. For reasons I don’t understand, it makes my eyes glaze over, and I cannot help but feel that the human who instructed the LLM to generate the prose was more interested in performing documentation than they were in meeting the mind of a fellow human.

However, I’ve still managed to get a fair bit of value.

At least once I’ve had an idea for a document or a policy or whatever where I’ve had a dozen bullet points that I’ve noted over the course of a couple of weeks. The bullet points were important and took a fair bit of thinking and context to develop, knowing what to include and what to emphasise. However, they weren’t any good to man or beast, because they were entirely in internal jml-speak. There was definitely a time when the overall structure of a genuinely useful doc was clear in my head, but life and circumstances had driven it out. Being able to paste in the bullet points and say “turn this into a doc aimed at engineers already familiar with X to help them know what to do when Y” massively helped me, because I then had a whole draft with an actual structure that I was able to edit down into something usable.

More often, though, I’ve used it to review existing documents. Sometimes that’s as an editor. I’ve used LLMs as a sensitivity reader on a blog post; I didn’t take all its advice, but it prompted me to think about things I would otherwise have ignored. Other times, I’ve asked it to find ambiguities and hidden assumptions in my own writing, or had it mine other documents for decisions made.

All of this has still required follow-up editing, reading, or revision from me. Even so, it’s been a help: sometimes, seeing an obviously wrong document has made it much easier for me to write the correct one.

Career

I try to keep weekly snippets of what I have done, what I’ve actually achieved, what problems I’ve faced, what I hope to do next week, and how my current week fared against my plans. Often I do this by looking at my merged PRs or closed issues. It’s been nice to be able to get an AI to turn the raw JSON for those into text, which I then edit.
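A sketch of the purely mechanical part of that step, assuming the JSON comes from something like `gh pr list --author "@me" --state merged --json number,title,mergedAt` (any list of PRs with those fields would do). `PR` and `Snippet` are hypothetical names. It only produces a dated bullet list from the piped-in JSON; the summarising is still left to the LLM and the editing that follows.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"os"
	"sort"
	"strings"
	"time"
)

// PR matches the fields requested with --json number,title,mergedAt.
type PR struct {
	Number   int       `json:"number"`
	Title    string    `json:"title"`
	MergedAt time.Time `json:"mergedAt"`
}

// Snippet turns a JSON array of PRs into a plain bullet list, oldest first.
func Snippet(data []byte) (string, error) {
	var prs []PR
	if err := json.Unmarshal(data, &prs); err != nil {
		return "", err
	}
	sort.Slice(prs, func(i, j int) bool { return prs[i].MergedAt.Before(prs[j].MergedAt) })
	var b strings.Builder
	for _, pr := range prs {
		fmt.Fprintf(&b, "- %s: %s (#%d)\n", pr.MergedAt.Format("Mon 2 Jan"), pr.Title, pr.Number)
	}
	return b.String(), nil
}

func main() {
	data, err := io.ReadAll(os.Stdin)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	out, err := Snippet(data)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Print(out)
}
```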

Then, when the quarter ends, I can get Claude to go through my snippets and assemble something suitable for self review. This last quarter, I asked it to analyze what I’d done in reference to Chainguard’s engineering principles. Some of it was tosh, but a lot of it wasn’t. It made the whole process much easier.

Reflections

The claude UI is something of a sweet spot for me. I love being able to orchestrate command-line tools and file operations without having to consult references or remember which flags mean what. Growing up with IRC means that text chat is built into how I use computers. Years of doing code reviews and managing have helped me get better at writing down what I want.

It’s so important to drive whatever the LLM does through your own neurons, at least for work that you are accountable or responsible for. I think that means actively doing something rather than passively consuming. For me, this is writing, talking to someone else, coding, or even interactively browsing in an editor and jotting down comments and questions.

I occasionally get FOMO when I see what other people at work are doing with AI. I don’t really know what Cursor is. I’m honestly better at meaningfully using the word “synergy” in a sentence than I am at using the word “agentic”. I always have to look up the correct order for the letters in RLHF. I want to get better at this. I feel that I’ve stopped a bit too early in the explore/exploit trade-off. Yet even with my current knowledge, I’m getting more done to higher quality and having fun while doing it.

I have definitely made mistakes, and submitted code for review that I hadn’t really bothered to look at or understand. However, the very fact that I’m able to say this happened shows that the system works!

Trying multiple ideas in parallel, or working on parallel streams of work, has been great. Sometimes it’s too much, and I need to whole-arse one thing rather than half-arse two things, but other times it is exactly what I need.

There have been some days where I’ve been low energy or mildly unwell or having trouble focusing and I’ve been able to use Claude to redeem the day by getting something done. The way I can just resume a session or have it write out a summary that I can use in a later session has been great for this.

Plans

I want to dig into the claude tool more, and better understand hooks, skills, permissions, tools, sub-agents, and so forth, as well as to tune CLAUDE.md files.

We have a great, judgement-free channel on our Slack at work that people use to share ideas and practices around AI usage. I’d love to build on this, stealing the ideas from Will Larson’s post on company AI adoption. Realistically this is so far out of my wheelhouse that what I really want is someone else to do it.

I want to expand the range of tools or systems that are in the loop in a Claude Code session. I’d love to link in our issue tracker, or Slack, or Google Drive.

Conclusion

I hope this helps!

One of the things I’ve found most helpful when learning is when people have suggested something to try that I hadn’t even thought of. One reason I’ve written all this is in the hope of sharing that experience with others.

Also, while writing it I’ve been surprised at how much I’ve been using LLMs and how much I’ve learned along the way. I know it’s not risk-free and I know there are real concerns about the overall situation and many, many of the specifics, but I am having a lot of fun. I hope you do too, and that we sort all the rest out.