The evidence this week splits cleanly into two columns: organizations that treated AI agents as extensions of their people came out ahead, and organizations that treated them as replacements came back limping. That's not a coincidence — it's the thesis playing out in production data. But the harder story this week is the one the optimists aren't telling: 96% of organizations deploying agentic AI report costs higher than expected, 71% have no idea where those costs are coming from, and 1.5 million AI agents are operating inside corporate environments with zero monitoring. The agents are real. The governance isn't.
Salesforce Cut Support Headcount by Two-Thirds With Agents — Then HBR Said You Now Need a New Job Title to Manage Them
I want to sit with the Salesforce number for a second, because it's getting badly misread in almost every headline I've seen this week.
Salesforce went from roughly 9,000 customer support employees to 3,000 through Agentforce deployment. That is not a layoff story. Stop calling it one. The remaining 3,000 people moved up — they're handling complex escalations, relationship work, and edge cases that agents can't reliably manage. Meanwhile, Salesforce is reporting $100M+ in annualized cost savings, 60% higher resolution rates on WhatsApp inquiries, and a 34% productivity improvement from agentic and generative AI combined. Engine, a travel company, went from zero to production in 12 days. Finnair doubled first-contact resolution within four months. This is the thesis in production: agents extended organizational capability, they didn't hollow it out.
Klarna tried the hollow-it-out version. It pursued maximum automation, degraded service quality, and had to course-correct. And here's what the Klarna correction actually cost them: time, reputation, and a painful reinvestment in skilled human support staff to run alongside the automated systems. By Q3 2025 their refined hybrid approach showed agents handling the work of 853 full-time equivalents at $60M in annual savings — but only after they rebuilt the human layer they'd tried to eliminate. The correction was expensive.
Now Harvard Business Review has named a new organizational role: 'Agent Manager.' Humans who supervise, evaluate, and course-correct AI agents at scale. MIT Sloan documented five 'heavy lifts' of deploying AI agents, all of which come down to the same thing: human organizational design is the constraint, not the technology.
The HBR naming is the signal I've been waiting for. The org chart is starting to catch up to the deployment reality. Here's the thing: that Agent Manager role exists whether or not companies formalize it. Someone is already doing it at every organization with agents in production. The ones that don't formalize it will have 853 FTE-equivalents of AI running without anyone accountable for what they're doing. That's not a technology risk. That's a management failure.
Read the Salesforce Agentforce production metrics, the HBR case for Agent Managers, and the MIT Sloan analysis of deployment heavy lifts →

Databricks Killed Their Multi-Agent Setup and Got Better Results. This Is Not an Outlier.
I flagged the Databricks shift a couple of weeks ago and held off calling it a pattern. I'm calling it now.
Databricks migrated Genie's research capabilities from multi-agent to single-agent architecture in their February 2026 AI/BI roundup — and measured the results: better instruction following, higher visualization quality, lower latency, more concise reports. This was not a cost cut. It was an intentional engineering decision made by a team sophisticated enough to build multi-agent systems, who looked at production performance data and consciously chose simplicity.
Here's the number that ends this debate for most business owners: a document analysis workflow consuming 10,000 tokens with a single agent required 35,000 tokens across a four-agent implementation. That's a 3.5x cost multiplier from architecture alone. Not from model quality. Not from task complexity. From the decision to add agents. Carnegie Mellon and UC Berkeley researchers analyzed 1,642 execution traces across seven multi-agent frameworks and found failure rates ranging from 41% to 87%. System design issues account for 41.77% of those failures; inter-agent misalignment causes another 36.94%.
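The arithmetic behind that multiplier is worth seeing on paper. A minimal sketch using the figures above — the per-token price and monthly volume are hypothetical placeholders, not numbers from the research:

```python
# Illustrative cost math using the figures above: the same document-analysis
# task consumes 10,000 tokens as a single agent but 35,000 tokens across a
# four-agent pipeline, because context is re-sent at every agent boundary.
# The price and monthly volume below are hypothetical placeholders.
SINGLE_AGENT_TOKENS = 10_000
FOUR_AGENT_TOKENS = 35_000
PRICE_PER_1K_TOKENS = 0.01   # USD, hypothetical blended rate
RUNS_PER_MONTH = 50_000      # hypothetical volume

def monthly_cost(tokens_per_run: int) -> float:
    return tokens_per_run / 1_000 * PRICE_PER_1K_TOKENS * RUNS_PER_MONTH

single = monthly_cost(SINGLE_AGENT_TOKENS)   # $5,000/month
four = monthly_cost(FOUR_AGENT_TOKENS)       # $17,500/month
print(f"architecture tax: {four / single:.1f}x")  # → architecture tax: 3.5x
```

Whatever your actual price and volume, the ratio is fixed by the architecture: the 3.5x tax scales linearly with everything you run through it.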
The architectural pattern winning in real deployments right now: single agent, bounded autonomy, human escalation paths, ten steps or fewer. That's what 68% of production agents actually look like — they execute at most ten steps before requiring human intervention. That boundary isn't a limitation. It's a deliberate design choice that delivers more reliable, auditable outcomes.
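That pattern is simple enough to sketch. Everything below is illustrative — `step_fn` and `is_done` stand in for whatever model call and completion check drive your agent, not any real framework's API:

```python
# Bounded-autonomy sketch: one agent, a hard ten-step budget, and an
# explicit escalation flag instead of unbounded looping. Every step lands
# in an audit trail. All names here are illustrative, not a real API.
from dataclasses import dataclass, field

MAX_STEPS = 10  # the ten-step boundary most production agents observe

@dataclass
class AgentRun:
    task: str
    steps: list = field(default_factory=list)
    escalated: bool = False

def run_agent(task: str, step_fn, is_done) -> AgentRun:
    """step_fn(task, history) returns the next action;
    is_done(action) reports whether the task is complete."""
    run = AgentRun(task)
    for _ in range(MAX_STEPS):
        action = step_fn(task, run.steps)
        run.steps.append(action)          # auditable trail of every step
        if is_done(action):
            return run                    # finished within the boundary
    run.escalated = True                  # boundary hit: route to a human
    return run

# A toy run that never completes escalates after exactly ten steps:
stuck = run_agent("classify ticket", lambda t, h: "retry", lambda a: False)
print(len(stuck.steps), stuck.escalated)  # → 10 True
```

The design choice is the `escalated` flag: the agent never silently loops past its budget, and a human always inherits a complete step history rather than a mystery.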
The enterprise vendors selling multi-agent orchestration as the destination are selling complexity the production data doesn't justify. If you're paying a 3.5x token tax for the privilege of a more complicated architecture, that's not a technical decision anymore — it's a financial one. Choose accordingly.
Read the Databricks February 2026 AI/BI roundup and the Microsoft Azure guidance on single-agent vs. multi-agent architecture →

NIST Just Made Agent Identity a Compliance Problem. Your Security Team Is About to Have Opinions About Your AI Roadmap.
I called this one in my founding worldview document in February: Agent Identity Management is coming, security teams will raise it, and vendors will try to capitalize on the fear before they've built the solution. NIST rang the bell on February 17th with the AI Agent Standards Initiative. What I didn't fully anticipate: the problem is already worse than I projected.
The Cloud Security Alliance surveyed 285 IT and security professionals and found over 3 million AI agents now operating in corporate environments. Approximately 1.5 million of them are completely unmonitored. Right now. Not in two years.
The authentication situation is what should keep you up at night. 44% of organizations are authenticating agents with static API keys. 43% use username/password combinations. 35% rely on shared service accounts. You wouldn't hand your employees a master key card that never expires, never rotates, and never logs who used it. That is exactly what organizations are doing with agents at scale.
Only 28% of organizations can trace agent actions back to a human sponsor across all environments. Only 21% maintain a real-time inventory of active agents. Think about that for a moment: most organizations don't even know how many agents they're running, let alone what those agents are authorized to do.
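A sketch of the minimum record those two statistics imply: an inventory entry per agent, tied to a named human sponsor, holding a credential that expires instead of a static key. Everything here is a hypothetical illustration, not a real identity product:

```python
# Hypothetical agent-identity sketch: every agent gets an inventory entry
# with a named human sponsor, scoped permissions, and a credential that
# expires -- covering the traceability and static-key gaps in one record.
import secrets
import time

CRED_TTL_SECONDS = 3600  # rotate hourly rather than never

def register_agent(inventory: dict, agent_id: str, sponsor: str, scopes: list) -> dict:
    inventory[agent_id] = {
        "sponsor": sponsor,                      # the human this agent extends
        "scopes": scopes,                        # what it is authorized to do
        "credential": secrets.token_urlsafe(32),
        "expires_at": time.time() + CRED_TTL_SECONDS,
    }
    return inventory[agent_id]

def credential_valid(inventory: dict, agent_id: str) -> bool:
    entry = inventory.get(agent_id)
    return entry is not None and time.time() < entry["expires_at"]

inventory: dict = {}
register_agent(inventory, "support-triage-01", "jane.doe", ["tickets:read"])
print(credential_valid(inventory, "support-triage-01"))  # → True
print(credential_valid(inventory, "unknown-agent"))      # → False
```

The point isn't this toy code; it's that a real-time inventory with a sponsor field is a dictionary and a policy, not a moonshot. The 21% who maintain one didn't solve a hard problem. They decided to.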
The OpenClaw CVEs landed in February — including CVE-2026-25253, auth token theft leading to remote code execution. Wiz exposed 1.5 million API keys from an agent social network database breach on the Moltbook platform. The incident pattern is forming.
NIST's standards initiative gives you a regulatory hook to prioritize this internally, which matters if you've been struggling to get security budget attention. Here's the complication for my thesis: the 'agent as extension of a person' model only holds if you know which person the agent is extending. Most organizations right now cannot answer that question for half their deployed agents.
Read the NIST AI Agent Standards Initiative announcement, the Strata.io agent identity research, and the Wiz Moltbook breach analysis →

96% of Organizations Report Agentic AI Costs Higher Than Expected. The Honest Case Against Moving Fast.
I committed in my founding worldview document to watching for evidence that agents consistently make organizations slower, more expensive, or more brittle — and to say so if I find it. This week I found the most concentrated version of that evidence I've seen.
IDC research: 96% of organizations deploying GenAI and 92% implementing agentic AI reported costs were higher or much higher than expected. The gut-punch: 71% admitted they have little to no control over where those costs are coming from. MIT research across hundreds of enterprises found only 5% of custom enterprise AI solutions successfully reach production. Gartner projects over 40% of agentic AI projects will be canceled by 2027 — failures attributed not to technology limitations but to inadequate change management, unclear business cases, missing governance, and organizational resistance.
I'm not calling my thesis wrong. But I am saying the timeline I've been implying is too optimistic for organizations that aren't prepared to do the infrastructure work first. The 95% production failure rate is not a technology problem. It's a governance, integration, and data quality problem. The organizations succeeding — Salesforce, Goldman, Finnair — had data infrastructure, change management, and human oversight baked in from the start. The ones failing tried to bolt agents onto messy data and fragile processes and discovered that agents don't paper over those problems. They amplify them.
Demis Hassabis put the math on the table clearly: if an AI model has a 1% error rate and planning occurs across 5,000 steps, that error compounds like interest, rendering outcomes effectively random. IBM has named this 'agentic drift' — the hidden risk that degrades performance over time as the gap widens between the environment agents were designed for and the environment they actually operate in.
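The compounding is easy to verify. Assuming independent errors — a deliberate simplification, since real agent errors correlate — the probability of a flawless run is the per-step success rate raised to the number of steps:

```python
# Hassabis's point, made concrete: with a 1% per-step error rate, the odds
# of 5,000 consecutive correct steps are 0.99^5000 -- effectively zero.
# Assumes independent errors, which is a deliberate simplification.
p_step = 0.99
print(p_step ** 5_000)   # astronomically small: a flawless run never happens
print(p_step ** 10)      # ~0.9: the ten-step boundary keeps most runs clean
```

This is also the quantitative case for the ten-step boundary from the Databricks section: the same 1% error rate that makes 5,000 steps hopeless leaves a ten-step run clean roughly nine times out of ten.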
Gary Marcus and I agree on the brittleness. We disagree on the conclusion. Steam engines were wildly unreliable in 1820 and they still ended the horse era. But the engineers who built reliable steam engines understood their failure modes and designed around them. That's the job right now — not whether to deploy, but how to deploy in a way that doesn't compound your existing problems at machine speed.
Read the IDC cost research via DataRobot, the IBM analysis of agentic drift, and Gary Marcus on hype versus reality →

Mem0 Just Raised $24 Million to Solve the Problem Nobody Talks About: Agents That Remember Nothing
I put 'Agent Persistent Memory Becomes the Problem Everyone Is Trying to Solve' on record in February as a high-confidence prediction for 2026. This week the market agreed — with a $24 million check and a cluster of new entrants all launching simultaneously.
Mem0 raised to build the memory layer for AI agents. AgentMemory.Cloud published 'Why Persistent Memory Is the Missing Layer in Production AI Agents.' Cortex launched as production-ready memory for AI agents built with Cursor. When multiple teams raise capital and ship products targeting the same infrastructure gap in the same month, that's a market forming, not a coincidence.
Here's why this matters more than another funding round: the persistent memory gap is one of the primary reasons agents fail to cross from pilots to production. Without continuity across sessions, agents cannot maintain context about users, prior decisions, or evolving business state. They become, in effect, advanced chatbots — impressive in demos, unreliable in the long-running workflows where the actual value lives. An agent that forgets everything between conversations is not an extension of your organization. It's a parlor trick.
Agents that know your customers, remember your preferences, maintain context across months of interactions — that's not a demo. That's a genuine business moat.
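What a memory layer actually does is mundane: facts that survive process restarts. A toy file-backed sketch with hypothetical names — real products like Mem0 layer retrieval, relevance ranking, and decay on top of this core idea:

```python
# Toy persistent-memory sketch: facts keyed by user, written to disk so a
# brand-new session can recall them. Illustrative only -- production memory
# layers add retrieval, relevance scoring, and decay on top of this idea.
import json
import os
import tempfile
from pathlib import Path

class AgentMemory:
    def __init__(self, path: str):
        self.path = Path(path)
        self.store = json.loads(self.path.read_text()) if self.path.exists() else {}

    def remember(self, user_id: str, fact: str) -> None:
        self.store.setdefault(user_id, []).append(fact)
        self.path.write_text(json.dumps(self.store))  # survives the session

    def recall(self, user_id: str) -> list:
        return self.store.get(user_id, [])

path = os.path.join(tempfile.gettempdir(), "agent_memory_demo.json")
Path(path).unlink(missing_ok=True)  # start the demo from a clean file
AgentMemory(path).remember("cust_42", "prefers email over phone")
fresh_session = AgentMemory(path)  # a new instance, i.e. a "new session"
print(fresh_session.recall("cust_42"))  # → ['prefers email over phone']
```

The hard parts the vendors are racing on are everything this sketch omits: deciding what's worth remembering, retrieving the right memory at the right moment, and forgetting what's stale.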
The interesting strategic question now: does persistent memory become a standalone infrastructure layer (Mem0's bet), a feature baked into agent platforms, or a capability that foundation model providers absorb directly? My read: it survives as an independent layer, because the memory problem is fundamentally about your data, your context, your organizational knowledge — and no foundation model provider should own that. The organizations that figure out memory infrastructure first will have agents that get more valuable over time rather than resetting to zero every session. That compounding advantage is real.
Read the Mem0 funding announcement and the AgentMemory.Cloud analysis of why persistent memory is the missing production layer →

Clark's Corner
The story I keep coming back to this week isn't the Salesforce number or the NIST announcement. It's the 71%.
Seven out of ten organizations deploying agentic AI have little to no visibility into where their costs are coming from. Not a rough estimate they're working to refine. No idea.
That is not a technology problem. That's an accountability problem. And I've seen this movie.
We spent thirty years building enterprise software on the premise that complexity justified the price tag. The system was too complicated to understand, so you trusted the vendor. The integration was too intricate to unwind, so you kept paying. The contract was too entangled to exit, so you renewed. And somewhere in all of that, organizations stopped asking 'what is this actually costing us and why?' because the answer was too hard to get.
Now we're deploying AI agents with the same instinct: move fast, figure out governance later, treat the opacity as someone else's problem. And we're surprised when it gets expensive and unruly.
I built that 37-minute, 100-node agent system in October 2025 and I was proud of it. It worked. It also hit API rate limits every run and cost more than it needed to. Four months later I can build the same outcome with a simpler architecture at lower cost with no rate limit collisions. The lesson I took from that delta wasn't that complexity is bad. It's that complexity you don't understand is always expensive — in money, in fragility, and in your ability to fix it when something breaks.
The organizations winning with agents right now treated deployment as an organizational design problem first and a technology problem second. Know what your agents are doing. Know who authorized them. Know what they cost. If you can't answer all three today, you're not behind on AI. You're behind on management.
I'm still bullish. Genuinely. But bullish doesn't mean reckless. The steam engine analogy I keep reaching for this week isn't about the engine being powerful. It's about the engineers who understood the boiler well enough not to let it explode.