
Here’s a bit of snark from developer John Crickett on X:
Software engineers: Context switching kills productivity. Also software engineers: I’m now managing 19 AI agents and doing 1,800 commits a day.
Crickett’s quip lands perfectly because it isn’t really a joke. It’s a preview of the next management fad, in which we replace one bad productivity proxy (lines of code) with an even worse one (agent output), then act shocked when quality collapses.
And yes, I know, nobody is doing 1,800 meaningful commits. But that’s the point. The metric is already being gamed, and agents make gaming trivial. If your organization starts celebrating “commit velocity” in the agent era, you aren’t measuring productivity. You are measuring how quickly your team can manufacture liability.
The great promise of generative artificial intelligence was that it would finally clear our backlogs. Coding agents would churn out boilerplate at superhuman speeds, and teams would finally ship exactly what the business wants. The reality, as we settle into 2026, is far more uncomfortable. Artificial intelligence is not going to save developer productivity because writing code was never the bottleneck in software engineering. The real bottleneck is validation. Integration. Deep system understanding. Producing code without a rigorous validation framework is not engineering. It’s merely mass-producing technical debt.
So what do we change?
Thinking correctly about code
First, as I argued recently, we need to stop thinking about code as an asset in isolation. Every single line of code is surface area that must be secured, observed, maintained, and stitched into everything around it. As such, making code cheaper to write doesn’t reduce the total amount of work but instead increases it, because you end up manufacturing more liability per hour.
For years, we treated developers like highly paid Jira ticket translators. The assumption was that you could take a well-defined requirement, convert it to syntax, and ship it. Crickett rightfully points out that if that is all you’re doing, then you’re entirely replaceable. A machine can do basic translation, and a machine is perfectly happy to do it all day without complaining.
What a machine can’t do, however, is understand critical business context. AI can’t feel the financial cost of a compliance mistake or look at a customer workflow and instinctively recognize that the underlying requirement is fundamentally wrong. For this we need people, and we need people to thoughtfully consider exactly what they want AI to do.
Crickett frames this transition as a necessary move toward spec-driven development. He’s right, but we need to be extremely clear about what a specification means in the agent era. It’s not yet another Jira ticket but, rather, a set of constraints tight enough to ensure an LLM can’t escape them. In other words, it’s an executable definition of done, backed entirely by tests, API contracts, and strict production signals. This is exactly the kind of foundational work we have underinvested in for decades because it doesn’t look like raw output; it looks like process. You know, that “boring stuff” that slows you down.
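What does an “executable definition of done” look like in practice? Here is a minimal sketch: a contract check that any implementation, human- or agent-written, must pass before it counts as finished. The endpoint shape, field names, and allowed values below are hypothetical, purely for illustration.

```python
# A hypothetical API contract, expressed as a check an agent can't escape.
def validate_invoice_response(payload: dict) -> list[str]:
    """Return a list of contract violations; an empty list means the spec is met."""
    errors = []
    for field in ("id", "amount_cents", "currency", "status"):
        if field not in payload:
            errors.append(f"missing required field: {field}")
    # Money must be an integer count of cents, never a float.
    if "amount_cents" in payload and not isinstance(payload["amount_cents"], int):
        errors.append("amount_cents must be an integer")
    if "status" in payload and payload["status"] not in {"draft", "issued", "paid"}:
        errors.append(f"status outside allowed set: {payload['status']}")
    return errors

# The spec IS the test: an implementation that fails it is not done.
good = {"id": "inv_1", "amount_cents": 4200, "currency": "EUR", "status": "paid"}
bad = {"id": "inv_2", "amount_cents": 42.0, "currency": "EUR", "status": "maybe"}
assert validate_invoice_response(good) == []
assert len(validate_invoice_response(bad)) == 2  # float money, bad status
```

The point is not this particular check but the posture: the constraints live in executable form, so “done” is something a pipeline decides, not a reviewer’s impression of a large diff.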
You can see the friction playing out in real time just by looking at the replies to Crickett’s tweet, where people are desperately trying to square the circle of agentic development. One commenter tries to reframe the chaos by calling it architecture versus engineering. Another insists that managing 19 agents is actually orchestrating, not context switching. A third bluntly states that running more than five agents concurrently starts to look like vibe coding, which is merely a polite phrase for gambling with production systems. They’re all highlighting the core issue: You haven’t eliminated the work. You’ve just moved it from implementation to supervision and review.
The more you parallelize your code generation, the more “review debt” you create.
Observability to the rescue
This is where Charity Majors, the co-founder and CTO of Honeycomb, gets frustrated. Majors has argued for years that you can’t really know whether code works until you run it in production, under real load, with real users and real failure modes. When you use AI agents, the burden of development shifts entirely from writing to validating. Humans are notoriously bad at validating code simply by reading large pull requests. We validate systems by observing their behavior in the wild.
Now take that idea one step further into the agent era. For decades, one of the most common debugging strategies was entirely social. A production alert goes off. You look at the version control history, find the person who wrote the code, ask them what they were trying to accomplish, and reconstruct the architectural intent. But what happens to that workflow when nobody actually wrote the code? What happens when a human merely skimmed a 3,000-line agent-generated pull request, hit merge, and moved on to the next ticket? When an incident happens, where is the deep knowledge that used to live inside the author?
This is precisely why rich observability is not a nice-to-have feature in the agent era. It’s the only viable substitute for the missing human. We need instrumentation that captures intent and business outcomes, not just generic logs that say something happened. We need distributed traces and high-cardinality events rich enough that we can answer exactly what changed, what it affected, and why it failed. Otherwise, we’re trying to operate a black box built by another black box.
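To make “high-cardinality events that capture intent” concrete, here is a minimal sketch of emitting one wide, structured event per unit of work. The field names (spec_id, generated_by, and so on) are hypothetical; the idea is that the event, not a human author, carries the context you’ll need during an incident.

```python
import json
import time

def emit_event(**fields):
    """Emit one wide event per unit of work as a single JSON line."""
    fields.setdefault("timestamp", time.time())
    print(json.dumps(fields, sort_keys=True))

# One event, many dimensions: enough context to answer "what changed,
# what did it affect, and why did it fail?" without tracking down an author.
emit_event(
    service="billing-api",
    endpoint="/invoices",
    duration_ms=184,
    status_code=500,
    error="currency mismatch",
    customer_id="cus_8731",               # high-cardinality: unique per customer
    deploy_sha="b4c2e9f",                 # ties the failure to a specific change
    generated_by="agent:pr-1182",         # records that no human wrote this path
    spec_id="SPEC-244",                   # links the behavior back to the intent
)
```

A real system would ship these to a tracing backend rather than stdout, but the principle is the same: when the author is an agent, the event stream has to remember what the author would have known.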
Majors also offers essential operational advice: Deploy freezes are a complete hack. The common human instinct when change feels risky is to stop deploying. But if you keep merging agent-generated code while not deploying it, you’re merely batching risk, not reducing it. When you finally execute a deploy, you’ll have absolutely no idea which specific AI hallucination just took down your payment gateway. So if you want to freeze anything, freeze merges. Better yet, make the merge and the deploy feel like one singular atomic action. The faster that loop runs, the less variance you have, and the easier it is to pinpoint exactly what broke.
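The merge-and-deploy-as-one-action idea can be sketched abstractly: a change only lands on the mainline if deploying it succeeds, so mainline always reflects production. Everything here is a toy model under that assumption; a real pipeline would be calling git and a deployment system, not appending to a list.

```python
def merge_and_deploy(mainline: list[str], change: str, deploy) -> bool:
    """Accept `change` onto `mainline` only if deploying the result succeeds."""
    candidate = mainline + [change]
    if deploy(candidate):
        mainline.append(change)  # the merge lands only with a successful deploy
        return True
    return False  # a failed deploy means nothing merged: no batched-up risk

# Usage: a hypothetical deploy check that rejects a known-bad change.
history = ["c1", "c2"]
ok = merge_and_deploy(history, "agent-c3", deploy=lambda code: "bad" not in code[-1])
assert ok and history == ["c1", "c2", "agent-c3"]
failed = merge_and_deploy(history, "agent-bad4", deploy=lambda code: "bad" not in code[-1])
assert not failed and history == ["c1", "c2", "agent-c3"]  # mainline untouched
```

With one change in flight at a time, a broken payment gateway points at exactly one commit rather than a frozen backlog of them.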
Golden paths are the way
The fix for this impending chaos is not to rely on heroic engineers. As Majors points out, resilient engineering requires a commitment to platform engineering and golden paths (something I’ve also argued). Golden paths make the right behavior extremely easy and the wrong behavior extremely hard. The most productive teams of the next decade will not be the ones with the most freedom to use whatever framework an agent suggests, but those that operate safely within the best constraints.
So how do you measure success in the agentic era?
The metrics that matter are still the boring ones, because they measure actual business outcomes. The DORA metrics remain the best sanity check we have because they tie delivery velocity directly to system stability. They measure deployment frequency, lead time for changes, change failure rate, and time to restore service. None of those metrics cares about the number of commits your agents produced today. They only care about whether your system can absorb change without breaking.
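For the record, the four DORA metrics are simple enough to compute from deploy history. Here is a minimal sketch over a hypothetical record format (times in hours, one week of data):

```python
from statistics import mean

# Hypothetical deploy records: (commit_time, deploy_time, failed, restore_time)
deploys = [
    (0.0,  2.0, False, None),
    (3.0,  5.0, True,  5.5),   # one bad deploy, restored half an hour later
    (6.0,  7.0, False, None),
    (8.0, 10.0, False, None),
]
window_days = 7

deployment_frequency = len(deploys) / window_days              # deploys per day
lead_time = mean(d[1] - d[0] for d in deploys)                 # hours, commit -> deploy
failures = [d for d in deploys if d[2]]
change_failure_rate = len(failures) / len(deploys)             # fraction of deploys
time_to_restore = mean(d[3] - d[1] for d in failures)          # hours, deploy -> restored

print(lead_time)             # (2 + 2 + 1 + 2) / 4 = 1.75 hours
print(change_failure_rate)   # 1 of 4 deploys failed = 0.25
print(time_to_restore)       # 0.5 hours
```

Notice that commit count appears nowhere: an agent producing 1,800 commits a day moves none of these numbers unless the system actually absorbs those changes safely.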
So, yes, use coding agents. Use them aggressively! But don’t confuse code generation with productivity. Productivity is what happens after code generation, when code is constrained, validated, observed, deployed, rolled back, and understood. That’s the key to business safety and developer productivity.