Essay · 05

Your CDN might be hiding you from AI search — and you'd never know

Q: How do I check whether my site blocks AI crawlers?

Open yourdomain.com/robots.txt and look for Disallow: / under an AI user-agent such as GPTBot or ClaudeBot, or a Content-Signal: ai-train=no line. The free tool at patrickrobinson.consulting/tools/ai-visibility-check will read it for you and tell you whether the block is your own rule or a Cloudflare-managed default.

17 June 2026 · 5 min read

Here's the deal. Before an AI engine can recommend your business, it has to be allowed to read your website. That permission lives in one boring file — robots.txt — that you almost certainly have never opened. And there's a decent chance something switched it against you without telling you.

I found this on my own site, which is the embarrassing part. I run an AI-search practice. My homepage was telling ChatGPT, Claude, Google's AI and the rest not to read it — because my CDN had quietly added the block. It renders fine in a browser. You only see the problem if you read the raw file. I'd been invisible to the engines I help other people get cited by.

What actually happened

Cloudflare sits in front of a large share of the web — 28.5% of all sites whose web server is known, per W3Techs (June 2026). So this is not a niche setting. The timeline, with receipts:

3 July 2024: Cloudflare shipped a one-click "Block AI bots" toggle, free on every plan. Off by default; you had to switch it on.
1 July 2025: "Content Independence Day." Cloudflare became the first big infrastructure provider to make AI-crawler blocking the default. For newly onboarded domains, you are now asked at sign-up whether to allow AI crawlers, and the starting position is to block. Covered the same day by MIT Technology Review and Nieman Lab.

The motivation was real: in June 2024, Cloudflare's own telemetry showed AI bots hitting roughly 39% of the top million sites it fronts, while only 2.98% of those sites blocked or challenged the requests. Publishers wanted a way to say no. Fair enough.

But "say no to AI" is exactly the wrong default if your business wants to be found in AI answers — which, if you sell anything a buyer researches, you do.

The nuance — because the honest version matters

I'll save you the over-claim you'll hear from people selling panic. It is not true that "every Cloudflare site now blocks AI." Two things temper it:

The default applies to new sign-ups, not a retroactive flip. Sites already on Cloudflare kept their existing setting and a one-click toggle. So whether your site is affected depends on when and how it was set up — you have to check, not assume.
The managed block targets the training crawlers. Cloudflare's managed robots.txt is layered onto your own file. It disallows GPTBot, ClaudeBot, Google-Extended, Common Crawl and friends, and adds a Content-Signal: ai-train=no line, while leaving alone the live-answer crawlers, the ones that fetch a page to answer a question right now. So the common state is not "totally invisible." It's "shut out of what the models learn, but still reachable for a live citation." Half-visible. Which is its own quiet tax.

That distinction — your own deliberate rule versus a default your CDN applied — changes everything about what you do next. One is a decision. The other is an accident you can undo with a dashboard toggle, no code change.
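
To make that distinction concrete, here is a rough Python sketch that splits a robots.txt into the part a CDN injected and the part you wrote. It assumes the managed section is wrapped in comment markers along the lines of "# BEGIN Cloudflare Managed content" and "# END Cloudflare Managed Content". That wording is my assumption, so check it against your own file before relying on it. The function name split_managed and the EXAMPLE text are hypothetical, made up for illustration.

```python
import re

# ASSUMPTION: the managed section is delimited by comment markers like these.
# The exact wording is illustrative; confirm it against your own robots.txt.
BEGIN_MARKER = re.compile(r"^#\s*BEGIN Cloudflare Managed", re.IGNORECASE)
END_MARKER = re.compile(r"^#\s*END Cloudflare Managed", re.IGNORECASE)


def split_managed(text):
    """Return (managed_part, own_part) of a robots.txt as two strings."""
    managed, own = [], []
    in_managed = False
    for line in text.splitlines():
        stripped = line.strip()
        if BEGIN_MARKER.match(stripped):
            in_managed = True
            continue  # drop the marker line itself
        if END_MARKER.match(stripped):
            in_managed = False
            continue
        (managed if in_managed else own).append(line)
    return "\n".join(managed), "\n".join(own)


# Hypothetical file: a managed block on top, the owner's welcoming rule below.
EXAMPLE = """\
# BEGIN Cloudflare Managed content
User-agent: GPTBot
Disallow: /
# END Cloudflare Managed Content

User-agent: GPTBot
Allow: /
"""

managed, own = split_managed(EXAMPLE)
print("Managed section:\n" + managed)
print("Your own rules:\n" + own.strip())
```

If the Disallow shows up only in the managed half, the block is the accident, not the decision, and the dashboard is where you undo it.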

The thirty-second check

Type your domain followed by /robots.txt into a browser — yourcompany.com/robots.txt. Look for either:

Disallow: / under User-agent: GPTBot (or ClaudeBot, Google-Extended, PerplexityBot), or
a line reading Content-Signal: ... ai-train=no.

If you see either one, AI crawlers are being told to stay out or not to train on your pages. If reading raw text isn't your idea of a good time, I built a free check that does it for you and tells you whether the block is yours or your CDN's: patrickrobinson.consulting/tools/ai-visibility-check.
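
If you'd rather script the check than eyeball it, here is a minimal Python sketch of the same two tests. It is not the code behind my tool, just an illustration built on the standard library's urllib.robotparser. The function name check_robots, the AI_AGENTS tuple and the SAMPLE text are invented for this example, and the sample file is only a guess at what a half-blocking robots.txt might look like.

```python
from urllib.robotparser import RobotFileParser

# User-agents named in this essay; extend the tuple as needed.
AI_AGENTS = ("GPTBot", "ClaudeBot", "Google-Extended", "PerplexityBot")


def check_robots(text, agents=AI_AGENTS):
    """Report which agents may not fetch "/" and any ai-train=no signal lines."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())

    # An agent counts as blocked if it is not allowed to fetch the site root.
    blocked = [agent for agent in agents if not parser.can_fetch(agent, "/")]

    # robotparser ignores Content-Signal, so scan for it by hand.
    signals = [
        line.strip()
        for line in text.splitlines()
        if line.strip().lower().startswith("content-signal:")
        and "ai-train=no" in line.lower().replace(" ", "")
    ]
    return {"blocked_agents": blocked, "ai_train_no": signals}


# Hypothetical robots.txt, for illustration only.
SAMPLE = """\
User-agent: *
Content-Signal: search=yes,ai-train=no
Allow: /

User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /
"""

print(check_robots(SAMPLE))
```

On the sample, this lists GPTBot and ClaudeBot as blocked and returns the Content-Signal line. To run it against a live site you would fetch the file first (for example with urllib.request.urlopen) and pass the decoded text in. One design note: can_fetch also catches a blanket Disallow: / under User-agent: *, so it is slightly broader than the manual check.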

Why I bother writing this down

Because the gap here was never about skill. My own robots.txt was, in fact, written to welcome the AI engines by name. A platform default was overriding it at the edge, invisibly, with no feedback loop to tell me. That is the whole shape of AI-search work right now: the failures are quiet, they hide in files nobody reads, and the fix is often one switch — once you know to look.

Being allowed in is only the first bar, mind you. It doesn't mean you'll be cited — that depends on whether your pages are written to be quoted, which is the harder, longer job. But you can't be cited if you're not allowed to be read. Start there. Check the file.

Common questions

Does Cloudflare block AI crawlers by default?

For domains onboarded since 1 July 2025, Cloudflare asks at sign-up whether to allow AI crawlers and starts from blocking, so new sites can be blocking AI crawlers by default. Sites already on Cloudflare before then kept their previous setting and a one-click toggle. The only way to know your own status is to read your robots.txt or run a check; it is not safe to assume either way.

Is it bad for my business if AI crawlers are blocked?

If you want to be found and cited when buyers ask ChatGPT, Perplexity, Claude or Google's AI for a recommendation, then yes — blocking those crawlers keeps you out of the answers. Some publishers block on purpose to protect content. For most businesses that sell something a customer researches, it is self-sabotage they did not choose.

Sources: Cloudflare blog "Declaring your AIndependence" (3 Jul 2024); Cloudflare "Content Independence Day" blog + press release (1 Jul 2025); MIT Technology Review and Nieman Lab (1 Jul 2025); Cloudflare managed-robots.txt developer docs; W3Techs Cloudflare usage report (Jun 2026).
