Coding Is Cheap, Software Is Not

Why faster code generation is not the same as better software — and what Agentic Engineering does about it.

Long-form companion to the talk “Coding is cheap, Software is not! – Agentic Engineering explained” that Henning Teek and I gave at the Agentic Shift Meetup in Dortmund, April 2026.

AI assistants now write code faster than any human ever could. That’s the easy part. The hard part — the part most teams are running into right now — is that faster code is not better software.

This is the central tension of the current moment. The tooling has crossed a threshold. The methodology hasn’t caught up. The result is a gap between what’s technically possible and what’s actually shippable, and a lot of projects are falling into it.

The path across that gap has a name: Agentic Engineering. It’s worth unpacking what it is, why it’s needed, and what it looks like in practice.

The era we’re actually in

A quick scene-setter, because context matters.

From 2020 to 2022 — the Foundation Era — GPT-3 made plausible code generation possible. Codex and Copilot turned natural language into code. ChatGPT made it mainstream. The tagline of that era: AI suggests, humans decide. Autocomplete on steroids.

From 2023 to 2024 came the Cambrian Explosion. GPT-4. Cursor. Codeium, Windsurf, Tabnine, Sourcegraph, CodeWhisperer. By the end of 2024, millions of developers were using AI tools daily. The tagline shifted: from copilot to co-developer. AI started understanding context.

2025 and 2026 are the Agentic Era. 85% of developers use AI tools. Cursor reached a $29.3B valuation. Claude Code crossed $1B ARR in six months. Codex passed 2M weekly active users. The tagline again: from autocomplete to autonomous. Agents don’t just suggest lines — they take over tasks.

Each transition has been faster than the last. And each one has widened the gap between what AI can produce and what teams can actually ship to production.

What vibe coding actually is

“Vibe coding” has become a popular term, and like a lot of popular terms, it means something looser than it sounds. Worth being precise:

Vibe coding is when you prompt an AI and see what comes out. No structure. No verification. No specification. Just intent → output → hope.

It works astonishingly well for prototypes. A working demo of almost anything is an afternoon’s work. The output is often impressive enough that teams start believing this is the new normal.

It is the new normal — for prototypes.

For production software, vibe coding falls apart the moment complexity arrives. And complexity always arrives. Multilanguage support. A schema change. A new user role. An edge case in billing logic. Real software has to keep working when reality intrudes, and vibe coding has no defense against reality.

The pattern is consistent across teams: a quick prototype works beautifully, expectations rise, a second project starts with the same approach, and somewhere around the third or fourth requirement, the database model gets corrupted, the schema drifts from the code, and the project collapses under its own weight.

The wall is well-mapped. What matters is the path past it.

The ladder

Before getting into what Agentic Engineering looks like in practice, it helps to have a mental model for where a team is on the curve. The progression is a ladder with five rungs:

1. Chat: basic interaction with a model. Ask, receive, copy-paste.
2. Mid-loop generation: AI generates chunks of code that get stitched into a larger context.
3. In-the-loop agentic: AI assists inside the development environment, with access to files, terminals, tools.
4. On-the-loop agentic: AI operates with reduced direct supervision. Humans set goals; agents execute; humans check in.
5. Multi-agent coding: multiple agents collaborating on complex problems, often with specialized roles.

Most developers today are somewhere between rung 2 and rung 3. The interesting work — the work that’s actually changing how software gets built — is happening at rungs 4 and 5.

Here’s the catch: the ladder cannot be climbed without engineering discipline. Each rung up means less direct human review of each line of code. That trade only works if the system around the agent — specifications, guardrails, verification layers — gets correspondingly stronger.

Vibe coding at rung 4 is how production databases get corrupted.

Specification is the interface

The mindset shift that takes the longest to internalize:

In traditional software engineering, requirements are a documentation exercise. They get written, stakeholders sign off, developers consult them when convenient, and they slowly drift out of sync with the code. They’re nice to have. They’re rarely load-bearing.

In agentic engineering, requirements become a runtime component.

System prompts, skill definitions, tool descriptions, acceptance criteria — these are not artifacts the agent reads once and forgets. They are what the agent reads every time it decides what to do next. A vague spec means a guessing agent. An ambiguous tool description is a live bug. A missing edge case isn’t a documentation gap — it’s a production incident waiting to happen.
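To make "an ambiguous tool description is a live bug" concrete, here is a minimal sketch of a check that treats a tool description like code under review. The ToolSpec shape, the findSpecGaps helper, the vague-word list, and the refund rules in the sample data are all illustrative assumptions, not part of any particular agent framework.

```typescript
// Hypothetical shape of a tool definition an agent reads on every invocation.
interface ToolSpec {
  name: string;
  description: string;
  // Edge cases the agent must handle, written out explicitly.
  edgeCases: string[];
}

// Wording that usually means the agent will have to guess (illustrative list).
const VAGUE_WORDS = ["etc", "somehow", "as needed", "appropriate", "stuff"];

// Returns human-readable problems; an empty list means the spec passes this check.
function findSpecGaps(spec: ToolSpec): string[] {
  const gaps: string[] = [];
  const text = spec.description.toLowerCase();

  if (spec.description.trim().length < 40) {
    gaps.push(`${spec.name}: description is too short to be unambiguous`);
  }
  for (const word of VAGUE_WORDS) {
    // Match whole words only, so "fetch" does not trigger "etc".
    if (new RegExp(`\\b${word}\\b`).test(text)) {
      gaps.push(`${spec.name}: description contains vague wording "${word}"`);
    }
  }
  if (spec.edgeCases.length === 0) {
    gaps.push(`${spec.name}: no edge cases listed`);
  }
  return gaps;
}

// Made-up example data.
const vagueSpec: ToolSpec = {
  name: "refund",
  description: "Refunds orders as needed.",
  edgeCases: [],
};

const preciseSpec: ToolSpec = {
  name: "refund",
  description:
    "Refunds a single paid order in full to the original payment method. " +
    "Fails if the order is older than 30 days or already refunded.",
  edgeCases: ["order already refunded", "order older than 30 days"],
};

console.log(findSpecGaps(vagueSpec));   // three problems reported
console.log(findSpecGaps(preciseSpec)); // no problems reported
```

A check like this is crude, but it can run in CI next to the linters, which is the point: the spec gets the same treatment as the code.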

This changes the economics of writing specs.

Under the old model, time spent writing a precise requirement was time not spent building. There was a trade-off. Under the new model, the spec is part of the build. Time spent on the spec compounds — every future agent invocation benefits from it.

There’s also a feedback loop most teams haven’t discovered yet: AI is exceptionally good at helping write requirements. It can ask clarifying questions, probe edge cases, surface assumptions humans don’t realize they’re making, transform meeting transcripts into structured user stories, flag conflicts across large requirement sets, and generate test scenarios directly from acceptance criteria.

The better teams get at using AI to write requirements, the better their agents become at following them. That’s not a metaphor — it’s a literal causal chain.

The guardrails

Specifications tell the agent what to build. Guardrails tell the system what to reject when the agent gets it wrong. Both are necessary. Neither alone is sufficient.

Six guardrails make the biggest difference. None of them are new. What’s new is that they’ve gone from “nice engineering hygiene” to “load-bearing infrastructure.”

1. Continuous integration with short-lived branches

AI generates code at volumes that break traditional Git workflows. A feature branch that lives for two weeks accumulates so much agent-generated change that merging it back becomes a project of its own.

The rule: branches should live for hours, not days. Merge frequently. Resolve conflicts early. Keep the rate of change manageable. Continuous integration only stays continuous if integration actually happens continuously.
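One way to keep the "hours, not days" rule honest is to check it mechanically. The sketch below assumes a hypothetical BranchInfo record and a findStaleBranches helper; in practice the branch list would come from the Git host or from git itself, and the 24-hour limit is just an example value.

```typescript
// Hypothetical record for a branch; real data would come from the Git host.
interface BranchInfo {
  name: string;
  createdAt: Date;
}

const HOUR_MS = 60 * 60 * 1000;

// Returns the names of branches that have been open longer than maxAgeHours.
function findStaleBranches(
  branches: BranchInfo[],
  now: Date,
  maxAgeHours: number,
): string[] {
  return branches
    .filter((b) => now.getTime() - b.createdAt.getTime() > maxAgeHours * HOUR_MS)
    .map((b) => b.name);
}

// Made-up example data.
const checkTime = new Date("2026-04-15T12:00:00Z");
const openBranches: BranchInfo[] = [
  { name: "feature/invoice-export", createdAt: new Date("2026-04-15T08:00:00Z") },
  { name: "feature/new-user-role", createdAt: new Date("2026-04-01T09:00:00Z") },
];

console.log(findStaleBranches(openBranches, checkTime, 24));
// → [ 'feature/new-user-role' ]
```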

2. Statically typed languages

The compiler is a free guardrail. It catches a category of hallucination that dynamic languages let through silently.

TypeScript, C#, Java, Rust — the specific language matters less than the principle. The type system is the cheapest, fastest, most reliable feedback loop available.

One subtle pattern worth calling out: avoid primitive obsession. LLMs frequently swap same-type arguments. A function like greet(name: string, id: string) is one transposition away from a bug the compiler can’t catch. Replace it with greet(name: PersonName, id: PersonId), where PersonName and PersonId are distinct domain types rather than plain aliases for string, and the compiler catches the swap instantly. Free guardrail, typically at little or no runtime cost.
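In TypeScript specifically, a plain alias such as type PersonName = string would not help, because the type system is structural and the two aliases stay interchangeable. A common workaround is "branded" types. The sketch below is one way to do it; the __brand field name and the personName and personId constructors are illustrative choices, not a standard API.

```typescript
// Branded types: plain strings at runtime, distinct types at compile time.
type PersonName = string & { readonly __brand: "PersonName" };
type PersonId = string & { readonly __brand: "PersonId" };

// Small constructors are the only place where a raw string becomes a domain type.
// Validation (non-empty, ID format, ...) could live here as well.
function personName(value: string): PersonName {
  return value as PersonName;
}

function personId(value: string): PersonId {
  return value as PersonId;
}

function greet(name: PersonName, id: PersonId): string {
  return `Hello ${name} (#${id})`;
}

const who = personName("Ada");
const whoId = personId("p-42");

console.log(greet(who, whoId)); // → Hello Ada (#p-42)

// Swapping the arguments no longer compiles, because PersonId
// is not assignable to PersonName:
// greet(whoId, who);
```

The brand exists only in the type system, so the values are still ordinary strings once compiled.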

3. Linting and formatting – deterministic, not AI

Don’t ask the AI to format code. Use Prettier, CSharpier, ESLint, whatever the stack expects. Deterministic tools are faster, more reliable, and don’t burn tokens.

Run them on the diff, not the whole codebase. Stage only what’s changed. Keep the context window clean for the work that actually needs reasoning.
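As a rough sketch of "run them on the diff": the function below maps a list of staged files to the deterministic commands that should run on them. The planLintCommands name and the extension-to-tool mapping are assumptions for illustration; off-the-shelf tools such as lint-staged cover the same ground.

```typescript
// Maps staged file paths to the deterministic tool commands to run on them.
// The extension-to-tool mapping is an illustrative assumption.
function planLintCommands(stagedFiles: string[]): string[] {
  const scriptFiles = stagedFiles.filter((f) => /\.(ts|tsx|js|jsx)$/.test(f));
  const textFiles = stagedFiles.filter((f) => /\.(css|json|md)$/.test(f));

  const commands: string[] = [];
  if (scriptFiles.length > 0) {
    commands.push(`eslint --fix ${scriptFiles.join(" ")}`);
    commands.push(`prettier --write ${scriptFiles.join(" ")}`);
  }
  if (textFiles.length > 0) {
    commands.push(`prettier --write ${textFiles.join(" ")}`);
  }
  return commands;
}

// In a real pre-commit hook the list would come from `git diff --cached --name-only`.
const stagedFiles = ["src/billing.ts", "README.md", "assets/logo.png"];

console.log(planLintCommands(stagedFiles));
// → [
//     'eslint --fix src/billing.ts',
//     'prettier --write src/billing.ts',
//     'prettier --write README.md'
//   ]
```

Files nobody touched never enter the picture, and neither the formatter output nor the untouched files take up space in the agent's context.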

4. Architectural unit tests

This one is underused. Tools like ArchUnit allow programmatic enforcement of design constraints — “the UI layer must never call the database directly,” “this module cannot depend on that one,” “domain logic cannot import framework code.”

The agent doesn’t have to remember the architecture. The tests do. When the agent violates a pattern, the build fails immediately. No human review, no architecture diagrams to consult, no slow drift over months.
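The idea can be sketched without any library. The example below is a hand-rolled check, not ArchUnit's API: the ModuleInfo shape, the FORBIDDEN rule list, and the module names are assumptions, and a real setup would derive the import graph from the source files.

```typescript
type Layer = "ui" | "domain" | "database";

// Which layer a module belongs to and what it imports.
interface ModuleInfo {
  path: string;
  layer: Layer;
  imports: string[];
}

// Forbidden dependencies: the layer on the left must not import the layer on the right.
const FORBIDDEN: Array<[Layer, Layer]> = [
  ["ui", "database"],
  ["domain", "ui"],
  ["domain", "database"],
];

function findViolations(modules: ModuleInfo[]): string[] {
  const layerOf = new Map(modules.map((m) => [m.path, m.layer] as const));
  const violations: string[] = [];

  for (const mod of modules) {
    for (const imported of mod.imports) {
      const targetLayer = layerOf.get(imported);
      if (targetLayer === undefined) continue; // external or unknown module
      const isForbidden = FORBIDDEN.some(
        ([from, to]) => from === mod.layer && to === targetLayer,
      );
      if (isForbidden) {
        violations.push(
          `${mod.path} (${mod.layer}) must not import ${imported} (${targetLayer})`,
        );
      }
    }
  }
  return violations;
}

// Made-up module graph with one deliberate violation.
const moduleGraph: ModuleInfo[] = [
  { path: "ui/InvoicePage", layer: "ui", imports: ["domain/Invoice", "database/InvoiceTable"] },
  { path: "domain/Invoice", layer: "domain", imports: [] },
  { path: "database/InvoiceTable", layer: "database", imports: ["domain/Invoice"] },
];

console.log(findViolations(moduleGraph));
// → [ 'ui/InvoicePage (ui) must not import database/InvoiceTable (database)' ]
```

Wrapped in a unit test that asserts the list is empty, this fails the build the moment an agent wires the UI straight to the database.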

5. Behavioral unit tests, not coverage tests

100% coverage targets are how teams end up with AI-generated slop tests. The agent will happily write tests that exercise every line and verify nothing.

Test behavior, not coverage. Test core domain use cases — the things that would hurt if they broke. Write tests that survive refactoring: if the behavior didn’t change, the test shouldn’t break. A test that breaks every time internals get reorganized is testing the wrong thing.
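The difference is easiest to see side by side. The sketch below uses a made-up billing rule (orders of 100.00 or more get a 10% discount) and a tiny expectEqual helper as a stand-in for a real test framework's assertion; both are assumptions for illustration.

```typescript
// Hypothetical domain rule: orders of 100.00 or more get a 10% discount.
// Amounts are in cents to avoid floating-point rounding surprises.
function invoiceTotalCents(lineItemsCents: number[]): number {
  const subtotal = lineItemsCents.reduce((sum, item) => sum + item, 0);
  const discount = subtotal >= 10000 ? Math.round(subtotal * 0.1) : 0;
  return subtotal - discount;
}

// Minimal stand-in for a test framework's assertion.
function expectEqual(actual: number, expected: number, label: string): void {
  if (actual !== expected) {
    throw new Error(`${label}: expected ${expected}, got ${actual}`);
  }
}

// Coverage-style "test": executes every line, verifies nothing about the result.
invoiceTotalCents([5000, 6000]);

// Behavioral tests: pin down what the business cares about, including the boundary.
expectEqual(invoiceTotalCents([5000, 4999]), 9999, "no discount just below the threshold");
expectEqual(invoiceTotalCents([5000, 5000]), 9000, "10% discount at the threshold");
expectEqual(invoiceTotalCents([]), 0, "empty invoice totals zero");

console.log("all behavioral checks passed");
```

The behavioral checks only call the public function, so they keep passing if the internals are reorganized and start failing only when the discount rule itself changes.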

6. Code quality tools with MCP

A newer development with a lot of potential: tools like SonarQube and CodeScene now support the Model Context Protocol. Quality reports can be fed directly back to the agent, which can then autonomously refactor.

Cyclomatic complexity, code smells, vulnerabilities, deeply nested logic, “bumpy road” detection — all of it becomes input to the next agent invocation. The feedback loop closes. The agent doesn’t just write code; it reads the verdict on its code and revises.
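The data flow behind that loop can be sketched in a few lines. This is not the MCP protocol itself and not the schema of SonarQube or CodeScene: the QualityFinding shape, the severity levels, and the buildRefactorBrief helper are simplified assumptions that only show how a report turns into input for the next agent run.

```typescript
// Hypothetical, simplified finding; real tools expose richer, tool-specific schemas.
interface QualityFinding {
  file: string;
  rule: string;
  severity: "info" | "major" | "critical";
  message: string;
}

// Turns the findings that matter into a short instruction block for the next agent run.
function buildRefactorBrief(findings: QualityFinding[]): string {
  const actionable = findings.filter((f) => f.severity !== "info");
  if (actionable.length === 0) {
    return "No actionable findings. Do not refactor.";
  }
  const lines = actionable.map(
    (f) => `- [${f.severity}] ${f.file}: ${f.message} (rule: ${f.rule})`,
  );
  return ["Fix the following findings without changing behavior:", ...lines].join("\n");
}

// Made-up report data.
const report: QualityFinding[] = [
  {
    file: "src/billing.ts",
    rule: "cyclomatic-complexity",
    severity: "major",
    message: "function has 14 branches",
  },
  {
    file: "src/billing.ts",
    rule: "naming",
    severity: "info",
    message: "prefer camelCase",
  },
];

console.log(buildRefactorBrief(report));
// → Fix the following findings without changing behavior:
//   - [major] src/billing.ts: function has 14 branches (rule: cyclomatic-complexity)
```

Filtering before handing findings to the agent keeps low-value noise out of the context window; the behavioral tests from guardrail 5 are what make "without changing behavior" checkable.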

This is one of the most interesting things happening in tooling right now, and most teams haven’t picked it up yet.

What this does to teams

The implications get uncomfortable here, because they’re organizational, not just technical.

For decades, the Mythical Man-Month argument has held: adding more people to a software project doesn’t make it proportionally faster, and adding them to a late project often makes it later. More engineers means more PRs, more orchestration overhead, more stepping on each other’s toes. Teams grew their headcount anyway, accepting the friction as the cost of doing business.

Agentic engineering breaks that calculation.

Two developers taking turns with well-instrumented agents can outship a twenty-person team. Not because the developers are individually heroic, but because the orchestration overhead collapses. Individual ownership replaces team coordination. Lines of code stop being the bottleneck — review, testing, and QA become the bottleneck.

The shape of the development funnel changes, too.

Under the old model, the software development funnel was lossy on purpose. Five hundred user problems would get scoped down to fifteen, coded to five, and shipped. Writing code was too expensive to do speculatively, so heavy filtering happened at every stage.

Under the new model, a screenshot pasted into the agent skips straight to written code. Everything that gets specified gets written. The filter doesn’t disappear — it moves. It moves to review, to testing, to QA, to deployment. That’s where the backlog accumulates now. Hundreds of PRs sit unshipped not because the code isn’t done, but because everything else around the code didn’t get cheaper at the same rate.

This is the work that’s left for humans, and it’s not less important than what came before. It’s arguably more important. Owning the review, owning the QA, owning the delivery — that’s where the leverage is now.

Big teams are aircraft carriers. Small teams with agents are speedboats. Both have their place, but the bet for most of what gets built today is on speedboats.

What Agentic Engineering actually means

Three principles, distilled:

Build the guardrails first. Before any feature work, set up the CI loop, the static types, the linters, the architectural tests, the behavioral tests. It’s slower on day one. It’s the only reason the project survives to day thirty.

Write the spec before the code. Not a document for stakeholders — a document for the agent. Be precise. Be unambiguous. Be explicit about edge cases. Let the AI help write it; that’s literally what it’s good at.

Shift the team’s role. Stop thinking of developers as the people writing code. Start thinking of them as the people owning the process — requirements, architecture, review, delivery. The agent handles tasks. Humans make decisions.

Coding is cheap. Software is not. The gap between those two is where the real work happens — and it’s also where the real value is going to live for the next decade.