AI, MCP, and the Hidden Costs of Data Hoarding – O’Reilly

The Model Context Protocol (MCP) is genuinely useful. It gives people who develop AI tools a standardized way to call functions and access data from external systems. Instead of building custom integrations for each data source, you can expose databases, APIs, and internal tools through a common protocol that any AI can understand.

However, I’ve been watching teams adopt MCP over the past year, and I’m seeing a disturbing pattern. Developers are using MCP to quickly connect their AI assistants to every data source they can find (customer databases, support tickets, internal APIs, document stores) and dumping it all into the AI’s context. And because the AI is smart enough to sort through a massive blob of data and pick out the parts that are relevant, it all just works! Which, counterintuitively, is actually a problem. The AI cheerfully processes massive amounts of data and produces reasonable answers, so nobody even thinks to question the approach.

This is data hoarding. And like physical hoarders who can’t throw anything away until their homes become so cluttered they’re unliveable, data hoarding has the potential to cause serious problems for our teams. Developers learn they can fetch far more data than the AI needs and provide it with little planning or structure, and the AI is smart enough to deal with it and still give good results.

When connecting a new data source takes hours instead of days, many developers don’t take the time to ask what data actually belongs in the context. That’s how you end up with systems that are expensive to run and impossible to debug, while an entire cohort of developers misses the chance to learn the critical data architecture skills they need to build robust and maintainable applications.

How Teams Learn to Hoard

Anthropic released MCP in late 2024 to give developers a universal way to connect AI assistants to their data. Instead of maintaining separate connector code to let AI access data from, say, S3, OneDrive, Jira, ServiceNow, and your internal DBs and APIs, you use the same simple protocol to provide the AI with all kinds of data to include in its context. It quickly gained traction. Companies like Block and Apollo adopted it, and teams everywhere started using it. The promise is real; in many cases, the work of connecting data sources to AI agents that used to take weeks can now take minutes. But that speed can come at a cost.

Let’s start with an example: a small team working on an AI tool that reads customer support tickets, categorizes them by urgency, suggests responses, and routes them to the right department. They needed to get something working quickly but faced a challenge: They had customer data spread across multiple systems. After spending a morning arguing about what data to pull, which fields were necessary, and how to structure the integration, one developer decided to just build it, creating a single getCustomerData(customerId) MCP tool that pulls everything they’d discussed (40 fields from three different systems) into one big response object. To the team’s relief, it worked! The AI happily consumed all 40 fields and started answering questions, and no more discussions or decisions were needed. The AI handled all the new data just fine, and everyone felt like the project was on the right track.

Day two, someone added order history so the assistant could explain refunds. Soon the tool pulled Zendesk status, CRM status, eligibility flags that contradicted each other, three different name fields, four timestamps for “last seen,” plus entire conversation threads, and combined them all into an ever-growing data object.

The assistant kept producing reasonable-looking answers, even as the data it ingested kept growing. However, the model now had to wade through thousands of irrelevant tokens before answering simple questions like “Is this customer eligible for a refund?” The team ended up with a data architecture that buried the signal in noise. That extra load put stress on the AI to dig out the signal, setting up serious long-term problems. But they didn’t realize it yet, because the answers still looked fine. As they added more data sources over the following weeks, the AI started taking longer to respond. Hallucinations crept in that they couldn’t trace to any specific data source. What had been a really valuable tool became a bear to maintain.

The team had fallen into the data hoarding trap: Their early quick wins created a culture where people just threw whatever they needed into the context, and eventually it grew into a maintenance nightmare that only got worse as they added more data sources.

The Skills That Never Develop

There are as many opinions on data architecture as there are developers, and there are usually many ways to solve any one problem. One thing that almost everyone agrees on is that it takes careful choices and lots of experience. But it’s also the subject of a lot of debate, especially within teams, precisely because there are so many ways to design how your application stores, transmits, encodes, and uses data.

Most of us fall into just-in-case thinking at one time or another, especially early in our careers: pulling all the data we might possibly need just in case we need it, rather than fetching only what we need when we actually need it (which is an example of the opposite, just-in-time thinking). Normally when we’re designing our data architecture, we’re dealing with immediate constraints: ease of access, size, indexing, performance, network latency, and memory usage. But when we use MCP to provide data to an AI, we can often sidestep many of those trade-offs…temporarily.

The more we work with data, the better we get at designing how our apps use it. The more early-career developers are exposed to it, the more they learn through experience why, for example, System A should own customer status while System B owns payment history. Healthy debate is an important part of this learning process. Through all of these experiences, we develop an intuition for what “too much data” looks like, and for how to handle all of those tricky but critical trade-offs that create friction throughout our projects.

MCP can remove the friction that comes from these trade-offs by letting us avoid having to make those decisions at all. If a developer can wire up everything in just a few minutes, there’s no need for discussion or debate about what’s actually needed. The AI seems to handle whatever data you throw at it, so the code ships without anyone questioning the design.

Without all of that experience making, discussing, and debating data design choices, developers miss the chance to build critical mental models about data ownership, system boundaries, and the cost of moving unnecessary data around. They spend their formative years connecting instead of architecting. This is another example of what I call the cognitive shortcut paradox: AI tools that make development easier can prevent developers from building the very skills they need to use those tools effectively. Developers who rely solely on MCP to handle messy data never learn to recognize when data architecture is problematic, just like developers who rely solely on tools like Copilot or Claude Code to generate code never learn to debug what it creates.

The Hidden Costs of Data Hoarding

Teams use MCP because it works. Many teams carefully plan their MCP data architecture, and even teams that do fall into the data hoarding trap still ship successful products. But MCP is still relatively new, and the hidden costs of data hoarding take time to surface.

Teams often don’t discover the problems with a data hoarding approach until they need to scale their applications. That bloated context that barely registered as a cost for your first hundred queries starts showing up as a real line item in your cloud bill when you’re handling millions of requests. Every unnecessary field you’re passing to the AI adds up, and you’re paying for all that redundant data on every single AI call.
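To make the scaling effect concrete, here is a back-of-the-envelope sketch. Every number in it is an illustrative assumption rather than a measurement: a hoarded context of 5,000 tokens per call, a lean one of 200, two million calls a month, and a hypothetical price of $3 per million input tokens. Substitute your own provider's pricing and your own traffic.

```python
def monthly_context_cost(tokens_per_call: int,
                         calls_per_month: int,
                         usd_per_million_tokens: float) -> float:
    """Estimate the monthly spend on context (input) tokens alone."""
    total_tokens = tokens_per_call * calls_per_month
    return total_tokens * usd_per_million_tokens / 1_000_000


# All values below are assumptions chosen for illustration.
CALLS_PER_MONTH = 2_000_000
USD_PER_MILLION_TOKENS = 3.0  # hypothetical price, not a quoted rate

hoarded = monthly_context_cost(5_000, CALLS_PER_MONTH, USD_PER_MILLION_TOKENS)
lean = monthly_context_cost(200, CALLS_PER_MONTH, USD_PER_MILLION_TOKENS)

print(f"hoarded context: ${hoarded:,.0f} per month")  # $30,000
print(f"lean context:    ${lean:,.0f} per month")     # $1,200
```

At a hundred calls the difference is a few dollars and nobody notices it. At millions of calls the same per-call waste becomes the bulk of the bill.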

Any developer who’s dealt with tightly coupled classes knows that when something goes wrong (and it always does, eventually) it’s a lot harder to debug. You often end up dealing with shotgun surgery, that really unpleasant situation where fixing one small problem requires changes that cascade across multiple parts of your codebase. Hoarded data creates the same kind of technical debt in your AI systems: When the AI gives a wrong answer, tracking down which field it used or why it trusted one system over another is difficult, often impossible.

There’s also a security dimension to data hoarding that teams often miss. Every piece of data you expose through an MCP tool is a potential vulnerability. If an attacker finds an unprotected endpoint, they can pull everything that tool provides. If you’re hoarding data, that’s your entire customer database instead of just the three fields actually needed for the task. Teams that fall into the data hoarding trap find themselves violating the principle of least privilege: Applications should have access to the data they need, but no more. That can create an enormous security risk for their entire organization.
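One way to apply least privilege at the tool boundary is an explicit field allowlist. The sketch below is not tied to any particular MCP SDK; the field names, the sample record, and the helper names are all hypothetical and stand in for whatever your own systems return.

```python
from typing import Any, Dict, Mapping, Tuple

# Hypothetical allowlist: the only fields this sketch assumes a refund
# decision needs. Anything else never leaves the tool.
REFUND_FIELDS: Tuple[str, ...] = ("customer_id", "refund_eligible", "last_order_date")


def project_fields(record: Mapping[str, Any], allowed: Tuple[str, ...]) -> Dict[str, Any]:
    """Return only the allowlisted fields, failing loudly if one is missing."""
    missing = [name for name in allowed if name not in record]
    if missing:
        raise KeyError(f"record is missing required fields: {missing}")
    return {name: record[name] for name in allowed}


def get_customer_details_for_refund(record: Mapping[str, Any]) -> Dict[str, Any]:
    """Narrow, task-shaped view of a customer record (illustrative only)."""
    return project_fields(record, REFUND_FIELDS)


# A made-up "hoarded" record, as a backend might return it.
full_record = {
    "customer_id": "c-123",
    "refund_eligible": True,
    "last_order_date": "2025-01-15",
    "email": "pat@example.com",
    "home_address": "1 Example Street",
    "support_thread": ["...long conversation history..."],
}

refund_view = get_customer_details_for_refund(full_record)
print(sorted(refund_view))  # only the three allowlisted field names
```

If this tool is ever exposed by mistake, the blast radius is three fields per customer, not the whole record.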

In an extreme case of data hoarding infecting an entire company, you might discover that every team in your organization is building its own blob. Support has one version of customer data, sales has another, product has a third. The same customer looks completely different depending on which AI assistant you ask. New teams come along, see what appears to be working, and copy the pattern. Now you’ve got data hoarding as organizational culture.

Each team thought they were being pragmatic, shipping fast, and avoiding unnecessary arguments about data architecture. But the hoarding pattern spreads through an organization the same way technical debt spreads through a codebase. It starts small and manageable. Before you know it, it’s everywhere.

Practical Tools for Avoiding the Data Hoarding Trap

It can be really difficult to coach a team away from data hoarding when they’ve never experienced the problems it causes. Developers are very practical: They want to see evidence of problems and aren’t going to sit through abstract discussions about data ownership and system boundaries when everything they’ve done so far has worked just fine.

In Learning Agile, Jennifer Greene and I wrote about how teams resist change because they know that what they’re doing today works. To the person trying to get developers to change, it may seem like irrational resistance, but it’s actually pretty rational to push back against someone from the outside telling them to throw out what works today for something unproven. But just like developers eventually learn that taking time for refactoring speeds them up in the long run, teams need to learn the same lesson about deliberate data design in their MCP tools.

Here are some practices that can make those discussions easier, by starting with constraints that even skeptical developers can see the value in:

Build tools around verbs, not nouns. Create checkEligibility() or getRecentTickets() instead of getCustomer(). Verbs force you to think about specific actions and naturally limit scope.
Talk about minimizing data needs. Before anyone creates an MCP tool, discuss what the smallest piece of data is that the AI needs to do its job, and what experiments you can run to figure out what the AI really needs.
Break reads apart from reasoning. Separate data fetching from decision-making when you design your MCP tools. A simple findCustomerId() tool that returns just an ID uses minimal tokens, and might not even need to be an MCP tool at all if a simple API call will do. Then getCustomerDetailsForRefund(id) pulls only the specific fields needed for that decision. This pattern keeps context focused and makes it obvious when someone’s trying to fetch everything.
Dashboard the waste. The best argument against data hoarding is showing the waste. Track the ratio of tokens fetched versus tokens used and display it in an “information radiator” style dashboard that everyone can see. When a tool pulls 5,000 tokens but the AI only references 200 in its answer, everyone can see the problem. Once developers see they’re paying for tokens they never use, they get very interested in fixing it.
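The fetched-versus-used ratio from the last practice can be computed with very little code. The sketch below assumes you already log two numbers per tool call: tokens fetched and tokens the answer actually drew on. How you measure "used" is the hard part and is left as an assumption here (for example, by checking which returned fields the answer cites). The tool names and counts are made up.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class ToolCallStats:
    tool_name: str
    tokens_fetched: int  # tokens the tool put into the context
    tokens_used: int     # tokens the answer drew on (measurement method assumed)


def waste_ratio(stats: ToolCallStats) -> float:
    """Fraction of fetched tokens that went unused (0.0 means no waste)."""
    if stats.tokens_fetched == 0:
        return 0.0
    return 1 - stats.tokens_used / stats.tokens_fetched


def radiator_rows(calls: Iterable[ToolCallStats]) -> List[str]:
    """Format one dashboard line per tool, most wasteful first."""
    ranked = sorted(calls, key=waste_ratio, reverse=True)
    return [
        f"{c.tool_name}: fetched {c.tokens_fetched}, "
        f"used {c.tokens_used}, waste {waste_ratio(c):.0%}"
        for c in ranked
    ]


# Made-up sample data for illustration.
sample = [
    ToolCallStats("checkEligibility", tokens_fetched=300, tokens_used=240),
    ToolCallStats("getCustomerData", tokens_fetched=5_000, tokens_used=200),
]

for row in radiator_rows(sample):
    print(row)
```

Sorting by waste puts the worst offender at the top of the radiator, which is usually the only row a team needs to see to start the conversation.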

Quick smell test for data hoarding

Tool names are nouns (getCustomer()) instead of verbs (checkEligibility()).
Nobody’s ever asked, “Do we really need all these fields?”
You can’t tell which system owns which piece of data.
Debugging requires detective work across multiple data sources.
Your team rarely or never discusses the data design of MCP tools before building them.

Looking Forward

MCP is a simple but powerful tool with enormous potential for teams. But because it can be a critically important pillar of your entire application architecture, problems you introduce at the MCP level ripple throughout your project. Small mistakes have huge consequences down the road.

The very simplicity of MCP encourages data hoarding. It’s an easy trap to fall into, even for experienced developers. But what worries me most is that developers learning with these tools right now might never learn why data hoarding is a problem, and they won’t develop the architectural judgment that comes from having to make hard choices about data boundaries. Our job, especially as leaders and senior engineers, is to help everyone avoid the data hoarding trap.

When you treat MCP decisions with the same care you give any core interface (keeping context lean, setting boundaries, revisiting them as you learn), MCP stays what it should be: a simple, reliable bridge between your AI and the systems that power it.
