
The agent-ready web, what it actually means to build for machines

Most sites are missing the files that would let AI agents read, navigate, and transact with them. Here's what the agent-ready surfaces actually do.

Open your terminal and run this.

curl -I https://yourdomain.com/.well-known/api-catalog
curl -I https://yourdomain.com/.well-known/agent-card.json
curl -I https://yourdomain.com/llms.txt

For almost every site, including most of the largest sites on the internet, you get three 404s in a row.
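If you would rather script the check than run curl by hand, the same three probes can be sketched in Python with only the standard library. The function names here (`surface_urls`, `head_status`, `probe`) are illustrative, not part of any tool, and the sketch treats an unreachable host as `None` rather than guessing a status.

```python
import urllib.error
import urllib.request
from typing import Optional

# The three paths probed by the curl commands above.
AGENT_SURFACES = [
    "/.well-known/api-catalog",
    "/.well-known/agent-card.json",
    "/llms.txt",
]


def surface_urls(origin: str) -> list:
    """Build the full URL for each surface under an origin like https://example.com."""
    return [origin.rstrip("/") + path for path in AGENT_SURFACES]


def head_status(url: str, timeout: float = 5.0) -> Optional[int]:
    """Send a HEAD request; return the HTTP status, or None if the host is unreachable."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        return error.code  # 4xx and 5xx responses, including the 404s, land here
    except (urllib.error.URLError, TimeoutError):
        return None


def probe(origin: str) -> dict:
    """Map each surface URL to its status code (performs network requests)."""
    return {url: head_status(url) for url in surface_urls(origin)}
```

Calling `probe("https://yourdomain.com")` returns one entry per surface, which makes it easy to run the same check across a list of domains.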

That sequence of failed requests is the entire "agent-ready web" story in three commands. We have spent two years talking about a web that AI agents can navigate, transact with, and trust. The files that would actually make that web exist are missing from the vast majority of domains people visit every day.

This post is about what those files are, what they each do, and which ones are worth your time right now versus which ones are a bet on a future that has not arrived. I am going to be blunt about the difference, because most of the writing on this topic blurs it on purpose.

What "agent-ready" actually means

A website is agent-ready when an autonomous AI agent that lands on it can do six things without rendering your JavaScript or guessing from your marketing copy.

  1. Discover what the site contains
  2. Retrieve that content in a machine-readable form
  3. Respect your access-control intent
  4. Act on the capabilities you expose
  5. Transact, where commerce is the point
  6. Trust the content, because you signal provenance and quality

Notice that these are not the same kind of task. The first three are about being legible. The next three are about being operable. That distinction is the most important thing in this entire conversation, and it is the thing almost every "make your site agent-ready" listicle flattens into a single checklist. Hold onto it. I will come back to it.

The human web was built for eyeballs: rendered pages, hero images, interactive checkout flows. An agent arriving at that web has two options. It can reverse-engineer your DOM, parse your HTML, and guess where the "buy" button is. Or it can read a purpose-built file you left for it that says, in structured terms, exactly what you offer and how to use it. The agent-ready surfaces are that second option. They are the equivalent of robots.txt and sitemap.xml, except for a reader that wants to act, not just crawl.

The legibility layer, cheap and mostly empty

These are the surfaces that help an agent understand you. They are inexpensive to ship and carry essentially no downside. They are also the ones almost nobody has.

`/.well-known/api-catalog` (RFC 9727). This is the only formal standard in the group. It is an IETF Standards Track document, published in 2025, that registers a well-known location returning a machine-readable list of your public APIs, with links to each one. If you run any kind of API, this is the front door that lets an agent find it without a human reading your docs. It is the surface with the lowest adoption and the highest legitimacy. That combination is the whole opportunity.
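As a rough illustration of what such a catalog can look like, here is a minimal sketch that builds a Linkset-style JSON document (the format defined in RFC 9264) using the `service-desc` and `service-doc` link relations. The input keys (`base_url`, `openapi_url`, `docs_url`) and the example URLs are assumptions made up for this sketch; check RFC 9727 for the exact document requirements before publishing anything.

```python
import json


def api_catalog(apis: list) -> dict:
    """Build a Linkset-style document: one entry per API, anchored at its base URL."""
    linkset = []
    for api in apis:
        entry = {"anchor": api["base_url"]}
        if "openapi_url" in api:
            # service-desc points at a machine-readable description of the API
            entry["service-desc"] = [{"href": api["openapi_url"], "type": "application/json"}]
        if "docs_url" in api:
            # service-doc points at human-readable documentation
            entry["service-doc"] = [{"href": api["docs_url"], "type": "text/html"}]
        linkset.append(entry)
    return {"linkset": linkset}


# Hypothetical API, for illustration only.
catalog = api_catalog([
    {
        "base_url": "https://api.example.com/orders",
        "openapi_url": "https://api.example.com/orders/openapi.json",
        "docs_url": "https://docs.example.com/orders",
    }
])
print(json.dumps(catalog, indent=2))
```

Serving the resulting JSON at `/.well-known/api-catalog` with the `application/linkset+json` media type is the general idea; the sketch leaves out hosting and caching entirely.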

`llms.txt`. A Markdown file at your root that gives an AI a curated index of your most important content. This is the one surface that has any real-world traction, and even then it sits on roughly one in ten sites. Be honest with yourself about what it does and does not do. It is a community convention, not a standard. No major AI lab has publicly committed to using it as a ranking input in production. What it demonstrably does is reduce the fetch-and-parse cost for agents that are already crawling you, and make you cleaner to read when something decides whether to cite you. That is a real benefit. "It will get you cited more" is a claim nobody has proven. Ship it because it is cheap and tidy, not because someone sold you a ranking lift.
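To make the shape concrete, here is a small sketch that renders an `llms.txt` following the community convention as commonly described: a title heading, a one-line summary in a blockquote, then sections of annotated links. The function name, the section names, and the example URLs are all placeholders, and since this is a convention rather than a standard, treat the layout as a reasonable default and not a guarantee of how any agent will read it.

```python
def render_llms_txt(title: str, summary: str, sections: dict) -> str:
    """Render a curated Markdown index: H1 title, blockquote summary, H2 link sections."""
    lines = [f"# {title}", "", f"> {summary}", ""]
    for heading, links in sections.items():
        lines.append(f"## {heading}")
        lines.append("")
        for name, url, note in links:
            suffix = f": {note}" if note else ""
            lines.append(f"- [{name}]({url}){suffix}")
        lines.append("")
    # End the file with exactly one trailing newline.
    return "\n".join(lines).rstrip() + "\n"


# Hypothetical site content, for illustration only.
text = render_llms_txt(
    "Example Co",
    "Scheduling software for small clinics.",
    {"Docs": [("Pricing", "https://example.com/pricing.md", "Plans and limits")]},
)
print(text)
```

Keeping the file generated from the same source as your navigation or sitemap is one way to stop it drifting out of date.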

`sitemap.xml`, OpenAPI, structured data, and Link headers. The unglamorous foundation. A sitemap lets an agent list your URLs without crawling everything. An OpenAPI 3.1 spec describes your HTTP endpoints so a function-calling agent knows the paths, parameters, and errors. Schema.org markup gives meaning to your pages. Link headers with `rel=api-catalog` or `describedby` point at related surfaces without anyone parsing HTML. None of this is new. It is just rarely done well.
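The Link header part is the easiest to picture in code. The sketch below formats a header value in the Web Linking syntax (RFC 8288) and pulls the `rel` values back out; both helper names are made up for this example, the target paths are placeholders, and the parser is deliberately simplified (it only handles quoted, single-valued `rel` parameters).

```python
import re


def link_header(links: list) -> str:
    """Format (target, rel) pairs as a single Link header value."""
    return ", ".join(f'<{target}>; rel="{rel}"' for target, rel in links)


def rels_in(header_value: str) -> set:
    """Collect quoted rel values from a Link header value (simplified parser)."""
    return set(re.findall(r'rel="([^"]+)"', header_value))


value = link_header([
    ("/.well-known/api-catalog", "api-catalog"),
    ("/openapi.json", "describedby"),  # hypothetical path to an OpenAPI document
])
print("Link:", value)
```

An agent that sees `rel="api-catalog"` in a response header can jump straight to the catalog without parsing any HTML, which is the whole point of sending it.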

If you do nothing else this quarter, do this layer. It is the part of "agent-ready" that is genuinely safe, and the part where doing it puts you ahead of the field by default.

The action layer, powerful and immature

These surfaces let an agent do things on your site. They are more interesting, less mature, and not risk-free.

MCP (Model Context Protocol). An open protocol for exposing tools, resources, and prompts to an agent. The discovery surface is a server card at `/.well-known/mcp/server-card.json` that describes your server's transport, capabilities, authentication, and tool surface before an agent connects. MCP is the most mature of the action protocols, but its real center of gravity is server-side and enterprise integration, not the average content website.

A2A (Agent2Agent). This one is for agents talking to other agents, not agents talking to your pages. You publish an Agent Card at `/.well-known/agent-card.json`, a JSON "business card" declaring your identity, endpoints, capabilities, skills, and security schemes. A client agent reads it to decide whether your agent is the right one for a task and how to call it securely. This matters far more for a booking system negotiating with another system than for a blog or a brochure site.
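A sketch of the client side of that decision might look like the following. The card below is invented for illustration, and its field names (`skills`, `tags`, `securitySchemes`, and so on) follow my reading of the published A2A examples; they have shifted between spec versions, so verify them against the version you target rather than copying this.

```python
import json

# Hypothetical Agent Card; every value here is a placeholder.
AGENT_CARD_JSON = """
{
  "name": "Example Booking Agent",
  "description": "Books and reschedules appointments.",
  "url": "https://agents.example.com/a2a",
  "version": "1.0.0",
  "capabilities": {"streaming": false},
  "skills": [
    {"id": "book-appointment", "name": "Book appointment", "tags": ["booking", "calendar"]}
  ],
  "securitySchemes": {"bearer": {"type": "http", "scheme": "bearer"}}
}
"""


def matching_skills(card: dict, wanted_tag: str) -> list:
    """Return the ids of skills on the card that carry the wanted tag."""
    return [s["id"] for s in card.get("skills", []) if wanted_tag in s.get("tags", [])]


def can_authenticate(card: dict, supported: set) -> bool:
    """True if the card declares no schemes, or at least one scheme the client supports."""
    schemes = card.get("securitySchemes", {})
    return not schemes or any(s.get("scheme") in supported for s in schemes.values())


card = json.loads(AGENT_CARD_JSON)
usable = bool(matching_skills(card, "booking")) and can_authenticate(card, {"bearer"})
print("use this agent:", usable)
```

The point of the sketch is the order of operations: the client reads the card, checks skills and security, and only then opens a connection.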

WebMCP. This is the one to watch, and the one with the most hype to cut through. It lets a website expose structured, typed tools to an in-browser AI agent, either through HTML annotations on existing forms or through a JavaScript API. Instead of an agent screen-scraping your checkout, your page declares an "add to cart" tool with a defined schema. The payoff, if adoption lands, is large: far fewer tokens than screenshot-based navigation, and much higher reliability. The reality is earlier than the marketing suggests. It is a W3C Community Group draft, not yet an official standard. It ships in current Chrome behind a flag that is off by default. And no mainstream agent calls WebMCP tools on real websites yet. Treat it as a 2026 to 2027 bet, not a today capability.

x402. The payment layer. It revives the long-dormant HTTP 402 "Payment Required" status code. An agent requests a resource, your server answers with a 402 and a machine-readable description of the payment terms, the agent pays and retries. If agents are going to buy things, the transaction has to happen at the protocol level, because "add to cart, type a card number, click pay" was designed for a human hand.
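The request, 402, pay, retry loop is easier to follow as a toy simulation. Nothing below talks to a real network or a real payment system: the `server` function, the `X-Payment-Proof` header name, the proof string, and the payment terms are all hypothetical stand-ins, not the actual x402 wire format.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical payment terms the server advertises.
PRICE = {"amount": "0.01", "currency": "USD", "pay_to": "merchant-123"}


@dataclass
class Response:
    status: int
    body: dict


def server(path: str, headers: dict) -> Response:
    """Toy resource server: answers 402 with terms unless a valid proof is attached."""
    proof = headers.get("X-Payment-Proof")  # hypothetical header name
    if proof != f"paid:{PRICE['amount']}:{PRICE['pay_to']}":
        return Response(402, {"accepts": [PRICE]})
    return Response(200, {"data": f"contents of {path}"})


def pay(terms: dict, budget: float) -> Optional[str]:
    """Pretend to pay; refuse if the price exceeds the agent's budget."""
    if float(terms["amount"]) > budget:
        return None
    return f"paid:{terms['amount']}:{terms['pay_to']}"


def fetch_with_payment(path: str, budget: float) -> Response:
    """Request, and on a 402 read the terms, pay if affordable, then retry once."""
    first = server(path, {})
    if first.status != 402:
        return first
    proof = pay(first.body["accepts"][0], budget)
    if proof is None:
        return first  # too expensive: surface the 402 to the caller
    return server(path, {"X-Payment-Proof": proof})


print(fetch_with_payment("/report", budget=1.0).status)
```

The budget check is the design choice worth noticing: an agent that pays whatever a 402 asks for is a liability, so the spending limit belongs on the client side.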

The reason the action layer needs more caution than the legibility layer is simple. You are handing a non-human the ability to perform operations. WebMCP in particular has an open problem the spec is still working through, which is what happens when an adversarial page registers fake tools to trick an agent into doing something the user never asked for. Exposing capabilities is not the same as exposing content. It deserves a real security review, not a copy-paste from a tutorial.

One file that does not belong in this conversation

You will see `AGENTS.md` listed alongside these surfaces. It does not belong there. `AGENTS.md` is a "README for coding agents" that lives in a code repository and tells tools like Claude Code or Cursor how to build, test, and follow conventions in that codebase. It is genuinely useful, and adopted across tens of thousands of repos. But it is not a website discovery surface, and an agent browsing your marketing domain has no reason to look for it. When someone counts it as part of your site's agent-readiness, they are mixing two different problems. Worth knowing so you do not waste time chasing a metric that does not apply to you.

What is proven, and what is a bet

Here is the part the breathless takes leave out. The causal evidence that any of this drives AI citations or recommendations is, at the moment, thin to nonexistent. The most-studied protocol-layer signal is schema markup, and the cleanest study on it found no significant citation lift. Nobody has published comparable evidence that `llms.txt`, MCP, or A2A moves the needle on whether an AI surfaces you. Google's own guidance is that you do not need new machine-readable files to appear in generative search.

So why do any of it? Two defensible reasons, and you should pick yours consciously.

What is not a good reason. "It will get me cited more." Anyone selling you that certainty is selling a product, not a finding.

The actual playbook

If you want a sequence rather than a pile of options, here it is.

  1. Ship the legibility layer first. Sitemap, clean schema, an `llms.txt` that is honest about what you do, and a `/.well-known/api-catalog` if you run any API. Cheap, safe, and almost nobody else has done it.
  2. Add OpenAPI if you expose endpoints you are comfortable having an agent call. Read-only first.
  3. Treat the action layer as deliberate. MCP, A2A, WebMCP, and x402 are not "set and forget" tags. Each one is a capability you are granting, with a security and trust dimension. Adopt them when you have a specific agent interaction you want to enable, not because a checklist told you to.
  4. Measure the gap, not just your own files. The interesting question is not "do I have llms.txt." It is "how legible and operable am I to machines compared to where this is heading, and what is that worth." Almost nobody is tracking that as a metric yet. That space is wide open.
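One way to start tracking that gap is a simple per-layer coverage score. This is a hypothetical metric sketched for illustration, not an established one: it only counts which of the discovery paths named in this post respond on a domain, and it says nothing about whether the files are any good.

```python
# Discovery paths mentioned in this post, grouped by layer.
LAYERS = {
    "legibility": ["/sitemap.xml", "/llms.txt", "/.well-known/api-catalog"],
    "action": ["/.well-known/mcp/server-card.json", "/.well-known/agent-card.json"],
}


def coverage(present: set) -> dict:
    """Return, per layer, the fraction of known surfaces found on a site."""
    return {
        layer: sum(path in present for path in paths) / len(paths)
        for layer, paths in LAYERS.items()
    }


# Example: a site that ships a sitemap and llms.txt but nothing else.
scores = coverage({"/sitemap.xml", "/llms.txt"})
print(scores)
```

Running the same function over your own domain and a handful of competitors each quarter gives you a trend line, which is more useful than a one-off yes or no.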

If you want a second pair of eyes on which surfaces actually matter for your site and your audience, the consultancy offering walks through exactly this decision.

The bar for the legibility layer really is on the floor. Step over it, because it is free and it is tidy. Just do not confuse stepping over a low bar with winning a race nobody has measured yet.

