Anthropic Published the Playbook for Self-Service Analytics with Claude. Here's the Fastest Way to Run It.

CorralData

The team behind Claude automated Anthropic’s business analytics queries with 95%+ accuracy. The finding most teams should focus on is a different one: the accuracy was originally just 21%.

In June 2026, Anthropic’s data science and data engineering team published a detailed account of how they enable self-service data analytics with Claude. The headline result is remarkable: 95% of business analytics queries at Anthropic are now automated by Claude, with roughly 95% accuracy in aggregate, freeing their data scientists for forecasting, causal modeling, and machine learning.

If you run a business and you’ve wondered whether your team could just ask an AI questions about your data instead of waiting on dashboards and analysts, this post is the strongest evidence yet that the answer is yes.

But the most important number in the post isn’t 95%. It’s the one Anthropic volunteers about what happened before they built the infrastructure around the model:

Without curated context and skills, Claude’s accuracy on Anthropic’s own analytics evaluations was 21%. With them, it consistently exceeds 95%.

That gap, from 21% to 95%, is the entire story. The model was never the hard part. As Anthropic puts it, pointing Claude at a warehouse and letting it run can create “a false sense of precision.” Everything that closed the gap was data infrastructure: governed datasets, curated definitions, maintained context, and validation.

That gap is what CorralData is built to close. This post walks through what Anthropic built, what it takes to replicate, and the shortcut with CorralData.

Why smart models give wrong answers

Anthropic’s framing is one every executive should internalize: analytics accuracy is a context and verification problem, not a code generation problem. Garbage in, garbage out. The model is only as reliable as the context around it. Claude can write flawless SQL. The question is whether it’s querying against the right table, with the right filters, using your company’s actual definition of key metrics, like “revenue.”

Their team identified three failure modes that account for the overwhelming majority of wrong AI answers, and that explain why self-serve analytics can fail:

Figure: The three failure modes behind most inaccurate AI analytics answers: entity ambiguity, staleness, and retrieval failure.

Entity ambiguity. Ask “how many active users do we have?” and the agent faces hundreds of plausible fields. What counts as active? Which lookback window? Include fraudulent accounts? With multiple near-duplicate tables, the agent confidently picks one, and it’s often subtly wrong.

Staleness. Schemas, definitions, and pipelines change constantly. Anthropic reports their own accuracy drifted from roughly 95% at launch to 65% within a month before they treated documentation maintenance as an engineering discipline. Yesterday’s right answer quietly becomes today’s wrong one.

Retrieval failure. Sometimes the right answer exists and is even documented, but across millions of fields the agent simply never finds it. Anthropic ran the experiment: they gave the agent direct access to thousands of historical queries where the correct answer was present about 80% of the time. Accuracy moved by less than a percentage point. Access wasn’t the bottleneck; structure was.
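To make the staleness failure mode concrete, here is a minimal Python sketch of a drift check: it compares the columns your documentation mentions against the columns the warehouse actually has, and flags anything the docs still reference that no longer exists. The table names, column names, and the find_stale_references helper are hypothetical, invented for illustration. This is not Anthropic's or CorralData's implementation.

```python
# Illustrative only: tables, columns, and function names are assumptions.

def find_stale_references(documented, live_schema):
    """Return sorted (table, column) pairs the docs mention but the warehouse lacks."""
    stale = []
    for table, columns in documented.items():
        # A table missing from the live schema makes all of its documented columns stale.
        live_columns = live_schema.get(table, set())
        for column in columns:
            if column not in live_columns:
                stale.append((table, column))
    return sorted(stale)


# What the agent's reference docs currently claim.
documented = {
    "orders": ["order_id", "net_revenue", "is_test"],
    "users": ["user_id", "last_active_at"],
}

# What the warehouse actually contains today (net_revenue was renamed).
live_schema = {
    "orders": {"order_id", "net_revenue_usd", "is_test"},
    "users": {"user_id", "last_active_at"},
}

stale = find_stale_references(documented, live_schema)
print(stale)  # [('orders', 'net_revenue')]
```

A check like this could run whenever the schema changes, so a renamed column surfaces as a documentation fix rather than as a quietly wrong answer.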

What Anthropic built to fix it

Their solution is a four-layer stack, each layer aimed at one or more failure modes:

Data foundations. Canonical, governed, single-source-of-truth datasets, with the near-duplicates aggressively deprecated. When the agent goes looking for an answer, it finds one — not forty candidates it has to guess between.

Sources of truth. A human-curated semantic layer of metric definitions the agent is structurally required to use first, plus lineage and business context. Notably, they tried having an LLM auto-generate the metric definitions, and it made accuracy worse. Humans own definitions; AI helps with documentation.

Skills. Curated reference docs describing tables, joins, required filters, and gotchas, maintained in the same repo as the data models so every schema change updates the context describing it. Anthropic reports that without skills, accuracy on its evals did not exceed 21%; with the full stack in place, it exceeds 95%.

Validation. Checks and balances on the answers the agent returns. Offline eval suites wired into CI, adversarial review of every query (worth +6% accuracy, at the cost of 32% more tokens and 72% higher latency), provenance footers on every answer, and agents that harvest stakeholder corrections into documentation fixes.
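The "sources of truth" idea, where the agent must use a curated definition before anything else, can be sketched in a few lines of Python. In this sketch, queries are built only from a human-owned registry, and a metric with no curated definition is refused rather than guessed. Everything named here (MetricDefinition, REGISTRY, build_query, the table and filter names) is a hypothetical stand-in, not Anthropic's semantic layer or any CorralData API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MetricDefinition:
    """A human-owned metric definition (all fields are illustrative)."""
    name: str
    table: str
    expression: str
    required_filters: tuple
    owner: str


# Hypothetical registry: one canonical definition per metric.
REGISTRY = {
    "revenue": MetricDefinition(
        name="revenue",
        table="finance.orders_canonical",
        expression="SUM(net_amount_usd)",
        required_filters=("is_test = FALSE", "status = 'completed'"),
        owner="finance-team",
    ),
}


def build_query(metric_name):
    """Build SQL only from a curated definition; refuse to guess otherwise."""
    definition = REGISTRY.get(metric_name)
    if definition is None:
        raise KeyError(f"no curated definition for metric: {metric_name}")
    where = " AND ".join(definition.required_filters)
    return f"SELECT {definition.expression} FROM {definition.table} WHERE {where}"


print(build_query("revenue"))

try:
    build_query("active_users")  # not defined by a human yet
except KeyError:
    print("active_users has no curated definition; ask the metric owner")
```

The design choice worth noticing is the refusal: an undefined metric fails loudly instead of letting the agent pick one of several plausible tables.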

Read plainly, here’s what setting up agentic self-service analytics in-house requires: a data engineering team to model and govern the warehouse, a data science team to curate the semantic layer and write evals, CI infrastructure connecting all of it, and a permanent maintenance commitment. Before Anthropic treated documentation maintenance as an engineering discipline, its accuracy fell from roughly 95% to 65% within a month.

Anthropic’s article explains “what we did,” not “how you do it.” Rebuilding it at your own company means reverse-engineering most of the hard parts, and then staffing the permanent commitment to keep it working.

“Without skills, Claude’s ability to answer analytics questions accurately didn’t exceed 21% on our evals.”
Anthropic Data Science & Data Engineering team, June 2026

Anthropic can afford to staff and maintain that. Most growing companies cannot, and shouldn’t have to.

The good news: you don’t have to build the stack to have self-service analytics.

Here’s where most vendor commentary would tell you “don’t try this at home.” We’ll tell you the opposite: do exactly what Anthropic describes. Their architecture is correct. The question is which layers need to be built from scratch.

CorralData exists to be layers one through four, managed for you:

Figure: Anthropic’s four-layer stack (data foundations, sources of truth, skills, validation), mapped to who owns what when you run on CorralData.

Layer	Anthropic built	With CorralData
Data foundations	In-house pipelines, dimensional models, governed canonical datasets, CI enforcement	Managed pipelines from 600+ sources into a governed warehouse with canonical, modeled datasets, built and maintained by our team
Sources of truth	Human-curated semantic layer, lineage, business context graph	Your metric definitions, encoded and enforced by us. You own what “revenue” means; we make sure the AI can’t answer with anything else
Skills / AI context	Dozens of maintained reference docs, colocated with data models, synced across surfaces	A per-customer context layer describing your tables, joins, filters, and gotchas, updated by us as your data changes
Validation	Eval suites in CI, adversarial review agents, correction harvesting, provenance footers	Ongoing accuracy monitoring and query validation by our data team. You sign off on what matters; we catch the drift

Notice what stays on your side of the table in every row: your definitions and your business context. That’s not a limitation, it’s the design. Anthropic’s own ablation showed that when they let an LLM own the metric definitions, accuracy got worse. The human-owned part of this system is deliberately the part that requires zero engineering: deciding what your metrics mean. You already do that. Everything else, the pipelines, the modeling, the context maintenance, the validation, is the part that requires a team, and that’s the part we run.

Two paths to the same place

Figure: Two paths to self-service analytics with Claude: build every layer in-house, or start on CorralData with the stack already running and be live in weeks.

Path A is legitimate. If you have a data engineering and data science team, Anthropic’s post is the best public blueprint available, and their appendix even includes the skill template they use. Budget for the build and, more importantly, for the forever-maintenance: their accuracy decayed from roughly 95% to 65% within a month of launch, before maintenance became an engineering discipline.

Path B is the same architecture with the undifferentiated layers delivered as a service. Connect your sources, confirm your metric definitions with our team, and ask questions in plain English, in a governed environment where “revenue” has exactly one answer. Weeks, not quarters. No data hires required.

When the infrastructure is in place, you can ask AI — reliably.

CorralData comes with AskCorral AI built in. Ask any question about your business and get a governed, accurate answer directly in the platform. For teams that want to do the same in Claude or ChatGPT, the CorralData MCP connector plugs the governed data infrastructure into the model of your choice. That’s the capability Anthropic describes: asking Claude anything about your data and getting answers you can trust.

How to pressure-test any AI analytics setup, including ours

Anthropic closes with questions every data team should align on, and they double as a buyer’s checklist. Ask any vendor (or your own internal team):

Where do canonical definitions live, and who maintains them? If the answer is “the model figures it out,” you’ve recreated the 21% scenario.
What happens when the schema changes? If context isn’t updated alongside data, accuracy is already decaying.
How is accuracy measured? If there’s no eval process, nobody knows the error rate, including the vendor.
Is this set-it-and-forget-it? It should not be. Any honest answer involves ongoing monitoring, human review of edge cases, and a mechanism for surfacing drift before it compounds.
Can you see where an answer came from? Provenance is the difference between trusting a number and forwarding a mistake to your board.
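The "How is accuracy measured?" question has a simple shape in practice. Below is a minimal Python sketch of an offline eval gate: a fixed set of questions with known answers, a score, and a pass/fail threshold that a CI job could enforce. The questions, expected values, the fake_agent stand-in, and the 0.95 default threshold are all assumptions chosen for illustration; a real suite would call the actual agent and use your own cases.

```python
def accuracy(cases, answer_fn):
    """Fraction of eval cases where the agent's answer matches the expected value."""
    if not cases:
        raise ValueError("eval suite is empty")
    correct = sum(1 for question, expected in cases if answer_fn(question) == expected)
    return correct / len(cases)


def gate(cases, answer_fn, threshold=0.95):
    """Return (passed, score); a CI job could fail the build when passed is False."""
    score = accuracy(cases, answer_fn)
    return score >= threshold, score


# Hypothetical eval cases: (question, expected answer).
CASES = [
    ("active users last 28 days", 1200),
    ("revenue in May", 50000),
    ("new signups in May", 310),
    ("churned accounts in May", 42),
]


def fake_agent(question):
    """Stand-in for the agent under test; it gets the churn question wrong."""
    canned = {
        "active users last 28 days": 1200,
        "revenue in May": 50000,
        "new signups in May": 310,
        "churned accounts in May": 40,
    }
    return canned.get(question)


passed, score = gate(CASES, fake_agent)
print(passed, score)  # False 0.75
```

If a setup cannot produce a number like this on demand, nobody involved knows its error rate.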

These are the standards we hold ourselves to, and Anthropic’s post is the best argument we’ve seen for demanding them everywhere.

FAQ

Can I use Claude for self-service business analytics on its own?

Not reliably. On Anthropic’s own evals, Claude scored 21% accuracy on business analytics questions until it was surrounded by governed datasets, curated definitions, validation, and skills, after which it exceeded 95%.

What infrastructure does Claude need for accurate analytics?

A four-layer stack: governed data foundations, human-owned sources of truth, maintained AI context (“skills”), and continuous validation. Each layer targets one or more of the three failure modes: entity ambiguity, staleness, and retrieval failure.

Do I need a data engineering team to get there?

You need the infrastructure that a data engineering team would build, not necessarily the team. CorralData provides the data infrastructure with managed pipelines, governed datasets, AI context, and validation; you keep ownership of metric definitions and business context, which is exactly what Anthropic recommends.

Run the playbook without building the stack

See your own data answering questions in plain English, on governed, canonical datasets, in a live demo with our team.

Book a demo

All figures cited from “How Anthropic enables self-service data analytics with Claude,” claude.com, June 3, 2026. Statistics describe Anthropic’s internal analytics environment and evaluation suite; results in other environments will vary with data complexity and governance maturity.