
What Cloudflare Pay Per Crawl Means for Whether AI Engines Can Read You

Cloudflare now blocks AI crawlers by default on new domains and is adding a paid-access tier. A setting you never touched could be why you are absent from ChatGPT.

Cloudflare started blocking AI crawlers by default for every new site it onboards. If you put a site behind Cloudflare recently and never opened the bot settings, there's a real chance OpenAI and Perplexity are getting turned away at the door, and you'd have no way to know from your analytics. The block happens at the network edge, before the request ever reaches your server. Your robots.txt never gets a vote.

That makes this a layer most SEO advice skips. Everyone talks about robots.txt directives, which live on your origin and which a crawler chooses whether to obey. A CDN-level block is different. It overrides robots.txt, it's enforced rather than requested, and on Cloudflare it can now be active without anyone on your team having decided it should be.

What actually changed

Two things, and they're separate.

The first is the default. Cloudflare became the first major infrastructure provider to block AI crawlers by default for new domains, asking owners up front whether they want to allow access (Cloudflare). Old behavior assumed open unless you blocked. New behavior assumes closed unless you allow.

Pay Per Crawl is the second, and it turns the old binary into three choices. For each crawler you can Allow free access, Charge a set price for access, or Block it outright with no payment option (Search Engine Land). The charge tier runs on HTTP 402, the long-dormant "Payment Required" status code. A crawler either sends payment intent in its request headers and gets a 200, or it gets a 402 with a price and walks away.
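
To make the three choices concrete, here is a minimal sketch of the decision as a plain Python function. It is not Cloudflare's implementation. The policy names, the `offered_max_usd` parameter (a stand-in for whatever payment intent a crawler sends in its request headers), and the 403 for Block are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Decision:
    status: int                        # HTTP status the crawler would see
    price_usd: Optional[float] = None  # price quoted or charged, if any


def decide(policy: str, price_usd: float = 0.0,
           offered_max_usd: Optional[float] = None) -> Decision:
    """Map a per-crawler policy to a response (hypothetical model)."""
    if policy == "allow":
        return Decision(200)
    if policy == "block":
        # Assumption: an outright block surfaces as 403 Forbidden.
        return Decision(403)
    if policy == "charge":
        # The crawler gets content only if it signalled it will pay enough.
        if offered_max_usd is not None and offered_max_usd >= price_usd:
            return Decision(200, price_usd)
        # Otherwise it receives 402 Payment Required along with the price.
        return Decision(402, price_usd)
    raise ValueError(f"unknown policy: {policy!r}")
```

A crawler that sends no payment intent against a Charge policy lands in the 402 branch, which is the "walks away" case described above.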

Worth knowing where this sits today. Pay Per Crawl is in private beta, with Stack Overflow among early participants, while customizable 402 responses are already available to every paid Cloudflare customer. So the charging market is still forming. The default blocking is live now.

How to check whether your site is walled off

You can confirm your own status in a few minutes. The question is whether the specific bots that feed AI answers are reaching your pages.

If you're on managed hosting that puts Cloudflare in front of you without surfacing these controls, ask your provider which AI bots they allow by default. Some inherit the new closed-by-default posture without telling customers.

Visibility versus monetization is a real tradeoff, with a catch

The pitch for charging crawlers is straightforward. Your content has value, AI companies are building products on it, so make them pay. For a large publisher with content people specifically seek out, that logic holds. Stack Overflow has bargaining power because models genuinely need its answers.

Most startups and SMEs don't have that weight yet, and that's the catch. When you charge a crawler that has cheaper places to get similar information, it doesn't pay. It skips you. For a brand still building recognition, a 402 doesn't monetize the crawl. It deletes you from the answer, and being absent from ChatGPT is a strange way to protect content that nobody can find, let alone value.

The honest split is about who's searching for you versus who you're trying to reach. If your audience already knows your name and seeks you out, restricting access costs you less. If you're trying to get discovered by people who don't know you exist, the AI engines are a discovery channel, and charging admission closes it.

Blocking scrapers and blocking AI search are not the same decision

A common mistake is treating all AI bots as one switch. They do different jobs, and blocking the wrong one quietly removes you from a channel you wanted.

| Bot | Run by | Blocking it removes you from | Default leaning for most SMEs |
| --- | --- | --- | --- |
| `GPTBot` | OpenAI | Training future models on your content | Optional, low visibility cost |
| `OAI-SearchBot` | OpenAI | ChatGPT's live search citations | Keep open if you want ChatGPT reach |
| `PerplexityBot` | Perplexity | Perplexity answers and citations | Keep open |
| `Google-Extended` | Google | Gemini and Vertex AI training | Optional |

Google-Extended is the one that trips people up. Blocking it opts you out of Google using your content to train Gemini, but it does not pull you from Google Search or from AI Overviews, which run off standard Googlebot. So you can decline AI training and keep your Search presence intact. The reverse trap is blocking GPTBot to "keep your content out of AI," then wondering why you still don't appear in ChatGPT. GPTBot is training. OAI-SearchBot is the one carrying citations into ChatGPT's answers.

Decide per bot, against what each one actually controls, not with a single allow-or-deny for the whole category.
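
One way to keep a per-bot decision honest is to write it down as data and check it mechanically. The sketch below renders a policy as robots.txt rules and verifies them with Python's standard `urllib.robotparser`. The `POLICY` values are one example choice, not a recommendation, and `render_robots` and `check` are hypothetical helper names. This covers only the origin-side request. The Cloudflare edge setting is configured separately and has to agree with it.

```python
from urllib.robotparser import RobotFileParser

# Example policy (an assumption for illustration): True = allow, False = disallow.
POLICY = {
    "GPTBot": False,           # opt out of OpenAI training
    "OAI-SearchBot": True,     # stay eligible for ChatGPT search citations
    "PerplexityBot": True,     # stay eligible for Perplexity citations
    "Google-Extended": False,  # opt out of Gemini training
}


def render_robots(policy: dict) -> str:
    """Render one robots.txt group per bot."""
    groups = []
    for bot, allowed in policy.items():
        rule = "Allow: /" if allowed else "Disallow: /"
        groups.append(f"User-agent: {bot}\n{rule}")
    return "\n\n".join(groups) + "\n"


def check(robots_txt: str, bot: str, url: str = "https://example.com/") -> bool:
    """Return True if robots_txt permits `bot` to fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(bot, url)


if __name__ == "__main__":
    robots = render_robots(POLICY)
    for bot in POLICY:
        print(bot, "allowed" if check(robots, bot) else "disallowed")
```

Keeping the policy in one dictionary makes it harder to block the whole category by accident, because each bot needs an explicit decision.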

Confirm a real crawler hit after you change the setting

Flipping a bot to Allow in the dashboard isn't the end. Verify the change took effect with a real request from the real bot.

Watch your edge or server logs for the next genuine hit from the user agent you opened, and confirm it received a 200, not a 402 or 403. A crawler reaching your HTML with a success code is the only proof that matters. Self-tests with a spoofed user agent can mislead, since Cloudflare verifies many bots by their published IP ranges, not the user-agent string alone. The setting is correct when the actual bot shows up in your logs and gets through.
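
The log check can be sketched as a short script. This assumes log lines in the common "combined" access-log format and matches on the user-agent string only, so it shows what status each claimed bot received, not that the request was genuine. The function name and the sample lines are hypothetical. Requests blocked at the edge never reach the origin, so run it over edge logs where you have them.

```python
import re
from collections import Counter, defaultdict

# Bots that identify themselves in the User-Agent header.
# Google-Extended is a robots.txt token only and has no request user agent.
AI_BOTS = ("GPTBot", "OAI-SearchBot", "PerplexityBot")

# Combined log format: "METHOD PATH PROTO" STATUS BYTES "REFERER" "USER-AGENT"
LINE_RE = re.compile(
    r'"\S+ \S+ [^"]*" (?P<status>\d{3}) \S+ "[^"]*" "(?P<ua>[^"]*)"'
)


def summarize(lines):
    """Count HTTP status codes per AI bot user agent."""
    summary = defaultdict(Counter)
    for line in lines:
        match = LINE_RE.search(line)
        if not match:
            continue  # skip lines that are not in the assumed format
        ua = match.group("ua").lower()
        for bot in AI_BOTS:
            if bot.lower() in ua:
                summary[bot][int(match.group("status"))] += 1
    return {bot: dict(counts) for bot, counts in summary.items()}


# Made-up sample lines for illustration only.
SAMPLE = [
    '203.0.113.5 - - [10/Jul/2025:10:00:00 +0000] "GET /pricing HTTP/1.1" '
    '200 5120 "-" "Mozilla/5.0 (compatible; OAI-SearchBot/1.0)"',
    '203.0.113.9 - - [10/Jul/2025:10:01:00 +0000] "GET / HTTP/1.1" '
    '403 310 "-" "Mozilla/5.0 (compatible; GPTBot/1.0)"',
]

if __name__ == "__main__":
    for bot, counts in summarize(SAMPLE).items():
        print(bot, counts)
```

A bot you opened that shows only 402 or 403 counts, or does not appear at all, is the signal to go back to the dashboard.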

This is the layer above robots.txt, and it's the one that can override every careful directive you wrote there. A default you never set is now part of whether AI engines can read you at all. If you want a clear read on which crawlers currently reach your pages and which are bouncing off the edge, that gap is what the AI Visibility Check is built to expose. And if you'd rather not work out the per-bot policy alone, that's the kind of call I run through with clients in a consultancy engagement.

