What agentic coding actually looks like
Three terminal panes. Each one running a Claude Code instance. One is writing integration tests for a new API endpoint. Another is implementing the endpoint itself, following patterns from the existing codebase. The third is updating the OpenAPI spec and generating client types.
I have not typed a line of code in twenty minutes. The feature is taking shape.
This is the moment I stopped thinking about AI coding assistants as fancy autocomplete. I was not writing code anymore. I was directing work. Reviewing output, adjusting course, catching mistakes before they compounded. The role shifted, and it happened so gradually that it took stepping back to notice.
That shift is what I want to talk about. Not the tools themselves (I have compared those before) but what changed in how I actually spend my time as a developer.
The old loop
For years, the process looked the same. Read the code. Think about what to change. Write the change. Run the tests. Debug what broke. Repeat. Every feature, every bug fix, every refactor followed this loop. Your hands were on the keyboard for most of it.
The speed bottleneck was never typing. It was context-loading. Reading through a module to understand how it worked, tracing a function call across three files, remembering which edge cases the existing tests covered. The actual writing was the easy part. Getting to the point where you knew what to write took the real time.
When Copilot showed up, it felt like a real shift. And it was, but a smaller one than people claimed. Copilot was really good autocomplete. It predicted what you were about to type and got it right often enough to be useful. But it did not change the fundamental process. You still drove everything line by line. You still held the full context in your head. You still decided what to write and where.
The AI filled in what you were already thinking. It saved keystrokes, not cognitive effort. I typed faster, but I was still doing the same job. The loop was the same loop. I was still the one driving.
The IDE agent era
The real shift started when coding assistants became agentic. Not just suggesting the next line, but reading files, making changes, running commands, iterating on results. Agents that could do multi-step work.
Cline was my first taste of this. It showed up in late 2024 as a VS Code extension originally called Claude Dev. You could give it a task, and it would read your project files, propose changes, execute terminal commands, and loop on the result. The first time I watched it scaffold a complete feature by navigating through my codebase on its own, it felt like magic. It found the relevant files without me pointing them out, understood the patterns, and produced working code. I did not have to think about the implementation step by step. I described the outcome, and the agent worked backward from there.
But the magic had friction. Every action needed approval through the VS Code UI. Read a file? Click approve. Run a command? Click approve. Make a change? Click approve, then review the diff in the IDE panel. For a task that required reading ten files, making changes to four, and running tests twice, that is a lot of clicking. Cline's Plan/Act mode split was clever. Plan mode let the agent think through an approach before doing anything, and Act mode executed. But even with that separation, working through Cline felt like talking through a translator. The intent was getting through, just slowly.
The bigger limitation: one agent at a time, in one VS Code window. If I wanted to work on two parts of a task in parallel, I could not.
Roo Code came next in 2025, forked from Cline. It added role-based modes: Architect for planning, Code for implementation, Debug for troubleshooting, Ask for exploration. The diff-based editing was faster than Cline's full-file rewrites. Each mode focused the agent's behavior, which meant less wasted context and better results for specific tasks. The Architect mode in particular was useful. You could have it analyze a problem and propose an approach without touching any code, then switch to Code mode to execute the plan.
Better. But still IDE-bound. Still one agent at a time. And the modes, while useful, added another layer of indirection between me and the work.
Kilo Code followed the same trajectory. More polish, better budget tracking, MCP support out of the box. By the time I tried it in mid-2025, the pattern was clear: each new IDE agent stacked features on top of the same architecture and ran into the same fundamental constraint. The features were different. The ceiling was the same.
The IDE was the ceiling. Not the models, not the prompts. The IDE itself limited how you could work with agents. One window, one agent, one task at a time. The abstractions these tools added (approval dialogs, diff viewers, mode selectors) made sense for safety, but they slowed everything down. The visual feedback was reassuring. It was also a bottleneck.
The terminal unlock
Moving to CLI-based agents removed the ceiling. I started using Claude Code for personal projects, then picked up OpenCode at work. The difference was immediate.
No IDE abstraction layer. The agent talks to your filesystem and terminal directly. No approval dialogs for reading files. No visual diff panels slowing down the edit-run cycle. You describe what you want, and the agent does it. If it needs to read ten files to understand the context, it just reads them. If it needs to run your test suite, it runs it and reads the output. The feedback loop tightened dramatically.
But the real unlock was parallel execution. Split your terminal (I use tmux), and you can run multiple agent instances simultaneously. One agent writes tests while another implements the feature. A third updates documentation. They are working in the same codebase, on the same branch, and as long as you coordinate which files each one touches, they do not conflict.
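The pane setup is nothing exotic. A sketch of the layout described above, with everything illustrative: the session name, prompts, and file paths are invented, it assumes tmux's default base-index of 0, and `claude` must already be on your PATH.

```shell
# Detached session with three panes.
tmux new-session -d -s feature        # pane 0
tmux split-window -h -t feature       # pane 1, side by side
tmux split-window -v -t feature       # pane 2, below pane 1

# One agent per pane, each scoped to different files so they do not conflict.
tmux send-keys -t feature:0.0 'claude "Implement POST /widgets following routes/users.ts"' Enter
tmux send-keys -t feature:0.1 'claude "Write integration tests for POST /widgets"' Enter
tmux send-keys -t feature:0.2 'claude "Update openapi.yaml for POST /widgets"' Enter

tmux attach -t feature
```

The file-scoping in the prompts is the coordination mechanism: each agent is told which part of the codebase is its lane.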
This was not a minor improvement. It changed the throughput of a single developer. Work that used to be sequential became parallel. I stopped thinking about tasks as a queue and started thinking about them as a set that could be distributed.
The composability matters too. Pipe output from one tool into the agent. Chain agent output with other command-line tools. Feed a git diff into a review prompt. Pass error logs in and get back a diagnosis. This is the Unix philosophy applied to AI. Small, focused tools that combine well. I wrote about this pattern in my AI code review post, where I pipe staged diffs directly into a review prompt. That same idea extends to any data you want an agent to analyze.
CLI agents feel like going from a graphical FTP client to the command line. Less pretty. More control. Faster once you build the muscle memory. I missed the visual diffs for about a week. Then I stopped noticing.
What actually works
After months of daily use, certain patterns have proven reliable.
Scaffolding. "Create a new API endpoint following the pattern in /routes/users.ts." I hand the agent an example of an existing, well-structured piece of code, and it produces a new one that follows the same conventions. File structure, naming, error handling, types. The agent picks up on patterns that would take me ten minutes of cross-referencing to replicate manually. I get back a working scaffold in under a minute.
Multi-file refactoring. Renaming a concept across a codebase. Restructuring a module into smaller pieces. Moving shared types into a common package. This is the kind of work that is tedious but not intellectually hard. You know exactly what needs to happen, it just touches 30 files. I describe the transformation, the agent applies it everywhere, and I review the diff. Five minutes instead of an hour.
Test generation. I describe the function, the edge cases I care about, and the testing patterns used in the project. The agent writes the tests. They are not perfect. I usually adjust a few assertions and add cases the agent missed. But starting from 80% done is much faster than starting from zero.
Codebase exploration. "How does authentication work in this project?" The agent reads through the relevant files, traces the flow from middleware to handler to database, and gives me a summary with the relevant file paths. This is faster than reading through the code myself when I am new to a codebase or returning to code I wrote six months ago and forgot. I use this constantly at work when picking up tickets in unfamiliar parts of the codebase.
Repetitive modifications. Updating 30 React components to use a new prop pattern. Adding error boundaries to every page route. Migrating config files to a new schema. The kind of work that takes a full afternoon of careful, boring edits. I describe the pattern, point the agent at the directory, and review the results. Done in fifteen minutes. This is where agents save the most raw time. Not the clever work, the tedious work.
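A sweep like that can be kicked off as a single non-interactive command. A hedged sketch, again assuming Claude Code's `-p` print flag; the directory, prop names, and reference file are invented for illustration.

```shell
claude -p "In src/components/, replace the deprecated 'variant' prop with the new \
'appearance' prop on every component, following the pattern in \
src/components/SaveButton.tsx. List every file you change."
```

Asking the agent to list every file it touched makes the follow-up diff review faster: you check the list against `git status` before reading a single line.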
What does not work yet
There are clear boundaries to where agents help.
Architectural decisions. The agent does not know your business constraints. It does not know that the database migration needs to be backward-compatible because you run blue-green deployments, or that this service handles 50,000 requests per second and cannot take a dependency on an external API. It will give you an architecture that looks reasonable in isolation but misses the context that matters. You can provide some of this context in the prompt, but there is a limit to how much institutional knowledge you can cram into a conversation. Some decisions require judgment that comes from living with a system for months.
Subtle bugs. When the fix requires understanding why the code was written in a particular way (not just what it does), agents struggle. The bug might be in the gap between intention and implementation, and the agent only sees the implementation. I have watched agents confidently "fix" a bug by reverting a deliberate workaround, reintroducing the exact problem the workaround solved.
The confidence trap. Sometimes the agent does the wrong thing with complete confidence. It restructures code in a way that breaks an implicit contract. It introduces a dependency you do not want. It "improves" error handling by swallowing exceptions. The output looks clean and well-structured, and the problems only surface later. When this happens, you spend more time understanding and reverting the agent's work than doing it yourself would have taken.
The review burden. This is the one people underestimate. The agent writes fast. Very fast. But you still have to read and understand every line. Speed of writing is not the bottleneck in software development. Speed of understanding is. A 500-line diff generated in two minutes still takes twenty minutes to review properly. If you skip the review, you are gambling.
Over-delegation. I learned this the hard way. Let the agent make too many decisions, and you end up with code you do not own mentally. You cannot debug it effectively because you do not fully understand the choices that were made. You become a stranger in your own codebase. The code works and passes tests, yet you could not explain the implementation to a colleague without reading it first.
Two tools, two contexts
I use Claude Code for personal projects and OpenCode at work. Back in February, when I wrote my comparison of AI coding assistants, I was still using Cline in VS Code alongside Claude Code in the terminal. Since then, the terminal has taken over completely. Once CLI agents became my default for substantial work, switching back to the IDE for agent-assisted edits felt like unnecessary context switching. The terminal was already open. The agent was already loaded with context. Cline was not doing anything wrong. I just stopped opening it.
For personal projects, I give the agent more freedom. Longer leashes, bigger tasks, more autonomy. If it makes a questionable architectural choice on my hobby project, the worst case is I refactor it later. The speed gain from letting it take bigger swings outweighs the occasional misstep. The MCP ecosystem is the real reason I use Claude Code for personal projects. Connecting the agent to databases, documentation sources, and custom tools gives it context that makes the output meaningfully better.
At work, the dynamic is different. Existing codebases, team conventions, code review from colleagues. I keep the agent on a shorter leash. Smaller, more focused tasks. More explicit instructions about which patterns to follow. OpenCode's provider flexibility matters in this context. I can use models that are approved for work-related code, and the open-source nature means the security team can audit what the tool is actually doing. That transparency matters more at work than on my personal projects.
The interesting thing is how the two contexts reinforce each other. Techniques I develop for working with agents on personal projects carry over to work. Patterns that work well in a team setting (clearer prompts, smaller scopes, explicit constraints) make my personal projects better too. The discipline of writing precise prompts at work, where mistakes cost more, improved how I work with agents everywhere.
How my workflow actually changed
The before and after is stark.
I used to spend roughly 80% of my time writing code and 20% reviewing it. That ratio has flipped to something like 40% writing, 60% reviewing. And the writing that remains is different. More architecture, more prompt crafting, more high-level design. Less boilerplate, less mechanical implementation.
I choose to build things I would not have attempted before. Features that would have taken a weekend of repetitive work now take a few hours. The implementation cost dropped enough that the calculation changed. Ideas I would have filed under "nice to have, not worth the time" became worth doing. This site is a good example. I have added features to it in the last few months that I would have skipped if every change required manual implementation across a dozen files.
The meta-skill changed. Typing speed does not matter much anymore. Communicating intent clearly does. The better I describe what I want (with constraints, examples, and context), the better the output. Writing good prompts is not fundamentally different from writing good technical specs. It is about being precise about what you want and explicit about what you do not.
My commits got bigger but more coherent. When an agent makes coordinated changes across ten files in one pass, the resulting commit tells a complete story.
Debugging changed too. Instead of setting breakpoints and stepping through code myself, I describe the symptom to the agent. "This endpoint returns a 500 when the user has no profile. The error log shows a null reference in the serializer." The agent investigates, traces the code path, and usually identifies the issue faster than I would have manually. Not always. But often enough to be the default first step. For the trickier bugs, the ones that require understanding the intent behind the code, I still step through it myself. The agent is the first pass, not the last.
The uncomfortable truth: you need to be a good enough developer to review what the agent produces. If you cannot spot a bad architectural decision, the agent will not save you. If you do not understand why a certain pattern exists, you will not catch it when the agent breaks the pattern. Agents amplify your existing skill. They do not replace it.
Where this is going
I have written about how agent skills work and building MCP servers before. Those building blocks are maturing. MCP is becoming the standard for connecting agents to external tools and data. The ecosystem is growing fast.
The gap between demo-quality and daily-driver-quality agents is closing, but it is still real. Every demo shows the perfect run where the agent nails it first try. Daily use means dealing with the 30% of cases where the agent gets confused, goes in circles, or confidently produces something wrong. You learn to recognize the patterns: when the agent starts repeating itself, when it is going down the wrong path, when you should interrupt and redirect. The tools are getting better at recovering from mistakes, but knowing when to intervene is still a human skill.
Local models are getting good enough for routine tasks. Code completion, simple refactoring, test generation. Not everything needs a frontier model. Running a smaller model locally for quick tasks while reserving the bigger models for complex work is a pattern that keeps cost and latency down. OpenCode makes this especially easy since you can switch providers per session.
The developers who benefit most from agents are the ones who already understood what to build. Agents do not fix unclear thinking. They amplify whatever skill level you bring. A senior developer with agents ships faster because they know what good output looks like and can course-correct quickly. A junior developer with agents writes more code but still needs guidance on whether it is the right code. The skill floor did not disappear. It just moved from "can you write this code" to "can you evaluate this code."
The biggest change is not the tools. It is the mindset shift from "I write code" to "I direct the building of software." Whether that makes you more productive depends entirely on whether you were good at directing in the first place.