
Cloudflare's Default-Block Policy Just Erased Millions of Sites From AI Search. Here's How to Check If Yours Is One.

Cloudflare quietly switched on default AI-crawler blocking. Here's what changed, who got locked out of AI search, and the exact steps I use to audit a site's AI visibility.

By Jhonty Barreto

Founder of SEO Engico | May 1, 2026 | 12 min read


On 1 July 2025, Cloudflare flipped a switch that quietly cut a chunk of the web off from ChatGPT, Perplexity, Claude, and Google's AI Mode. They called it Content Independence Day. Most site owners I've spoken to since then had no idea it happened, and an even bigger group still doesn't realise their own site is sitting behind a default-block they never agreed to.

I run audits for a living, and over the last few months my agency has tested AI visibility on dozens of client and prospect sites. The pattern is depressing. Sites that rank well in Google. Sites with strong backlinks. Sites with content that should obviously be cited. Invisible in ChatGPT. Invisible in Perplexity. Invisible in AI Overviews. Why? Because Cloudflare answered "no" on their behalf months ago, and nobody told them.

This post breaks down what Cloudflare actually changed, what the data shows about the impact, and the exact process I use to check whether a site has been locked out. If you skip to the audit section, that's fine. Just please run the check.

What Cloudflare actually changed on 1 July 2025

Cloudflare became the first major infrastructure provider to flip the default. Before that date, AI crawlers like GPTBot, ClaudeBot, and PerplexityBot could fetch content from any Cloudflare-protected site unless the owner explicitly blocked them. After that date, every newly onboarded domain blocks AI crawlers by default unless the owner explicitly allows them.

The company also introduced two extra layers. First, a managed robots.txt feature that Cloudflare auto-updates as new AI bots appear. Second, a private beta of "pay per crawl," which lets publishers charge AI companies a per-request fee using a 402 Payment Required HTTP response. You can read the full announcement in Cloudflare's Content Independence Day post, and the technical details of the protocol in their pay-per-crawl announcement.
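To make this concrete, here is a rough sketch of a pay-per-crawl exchange. The crawler-price and crawler-max-price header names follow Cloudflare's announcement as I read it, but the feature is in private beta, so treat the exact names and values as illustrative.

# Crawler asks for a page without offering payment; the edge declines and quotes a price
curl -I https://example-publisher.com/article

HTTP/2 402
crawler-price: USD 0.01

# Crawler retries, declaring the maximum it is willing to pay per request
curl -I -H "crawler-max-price: USD 0.01" https://example-publisher.com/article

HTTP/2 200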

The reason this matters so much is scale. According to W3Techs usage statistics, Cloudflare sits in front of 22.4% of all websites on the public web as of May 2026. That is roughly one in five sites globally, and Cloudflare itself confirms over one million customers have enabled the AI blocking option since the rollout, per their official press release.

If you've ever wondered why your perfectly good content doesn't show up when someone asks ChatGPT a question your site clearly answers, this is one of the first things I'd check.

Why this is different from blocking Googlebot

I want to be clear about something, because this confuses a lot of people. Cloudflare's default block does not affect traditional Googlebot, Bingbot, or other classic search crawlers. Your blue-link Google rankings are not at risk from this change. What's at risk is your visibility in everything that runs on top of large language models.

That includes ChatGPT search, Perplexity, Claude with web access, Google's AI Overviews and AI Mode (which honour a separate robots.txt token called Google-Extended), Microsoft Copilot, and any other tool that fetches live web content to ground its answers. These systems use different bots from the ones that populate the regular search index, and Cloudflare's default rules block the AI ones while leaving the search ones alone.

If you've already started thinking about how AI search affects your traffic strategy, you've probably read our breakdown of how AI bots now make up 33% of search activity in 2026. The Cloudflare change is the inverse problem. It's not just that AI bots are crawling more, it's that the most popular hosting layer is now telling them to leave.

The data: how big is the actual blast radius

Let me share the verified numbers, because invented stats are everywhere on this topic and I'm not adding to that pile.

From Cloudflare Radar's analysis of crawler traffic between May 2024 and May 2025, here's the share of AI crawler traffic by bot:

  • GPTBot (OpenAI): 30% of AI crawler traffic, with a 305% increase in requests year over year
  • ClaudeBot (Anthropic): 21%
  • Meta-ExternalAgent: 19%
  • Amazonbot: 11%
  • Bytespider: 7.2%
  • ChatGPT-User (real-time fetches when ChatGPT users ask a question): up 2,825%
  • PerplexityBot: small overall share but up 157,490% in raw requests

Cloudflare also measures what they call the crawl-to-refer ratio, which is the number of times an AI bot crawls your site for every visitor it sends back. From their crawl-to-click gap analysis, the July 2025 ratios were:

  • Google: about 5 crawls per referred visitor
  • Perplexity: roughly 195 crawls per visitor
  • OpenAI: 1,091 crawls per visitor
  • Anthropic: 38,066 crawls per visitor

The Anthropic number is the one most publishers fixate on, and it's a real ratio. About 80% of all AI crawling is for training, only 18% supports search and citation, and just 2% is on-demand fetches triggered by users. That balance is exactly why Cloudflare made the move. The problem for the rest of us is that the same default also blocks the 18% that powers actual citations and the 2% that lets ChatGPT fetch your page when a user asks about your business.

If you want to see how AI citations translate into business impact, my colleague's piece on getting cited in ChatGPT and AI Overviews walks through what citation visibility actually looks like in practice.

Which bots get blocked, and which still get through

This is where it gets fiddly, because each AI company runs multiple crawlers with different jobs.

OpenAI runs three. According to the official OpenAI bots documentation:

  • GPTBot crawls for training data
  • OAI-SearchBot indexes pages so they can appear in ChatGPT Search results
  • ChatGPT-User fetches a page in real time when a user asks ChatGPT a question

Anthropic runs three as well, documented on Anthropic's crawler help page:

  • ClaudeBot for training
  • Claude-User for real-time queries
  • Claude-SearchBot for indexing

Google's AI products use Google-Extended, a robots.txt product token that is separate from the regular Googlebot. Blocking Google-Extended only stops AI training; it does not affect normal search rankings.

Cloudflare's default-block rule blocks the training bots first, but depending on the specific WAF rule set you have enabled, it can also catch the search and real-time bots. That's why I see so many sites where ChatGPT cannot even fetch the homepage when a user explicitly asks about the brand by name. The bot tries, the firewall says no, the model has nothing to work with, your brand doesn't appear in the answer.

How to check if your site is locked out: a 7-step audit

This is the part I want every reader to actually run. It takes about fifteen minutes per domain.

Step 1: Check your robots.txt

Visit yourdomain.com/robots.txt in a browser. Look for any of these lines:

User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: PerplexityBot
Disallow: /

User-agent: Google-Extended
Disallow: /

If they're there with Disallow: /, you are blocking those bots. Cloudflare's managed robots.txt feature adds these automatically when enabled. If you didn't write them, your hosting layer did, and you may not want them there.
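If you audit more than one domain, checking from the terminal is faster than a browser. A minimal sketch: fetch the file and print any AI-bot entries along with the directive that follows each.

# Show each AI-bot User-agent line plus the line after it
curl -s https://yourdomain.com/robots.txt | grep -i -A 1 -E "GPTBot|ClaudeBot|PerplexityBot|Google-Extended|CCBot"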

If you need a refresher on the file syntax, our robots.txt SEO guide covers the directives in plain language.

Step 2: Open Cloudflare's bot management settings

Inside your Cloudflare dashboard, go to Security, then Bots, then Configure Bot Fight Mode and AI Scrapers and Crawlers. There's a toggle labelled "Block AI bots" or similar. Check whether it's on. Then check whether you have any custom WAF rules that match user agents like GPTBot, OAI-SearchBot, ClaudeBot, or PerplexityBot.

A surprising number of clients I've audited had this enabled by their original developer two years ago and then forgot. It blocks training bots, sure, but on some configurations it also blocks the search and inference bots that actually drive citations.
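If you manage zones through the API rather than the dashboard, you can read the same configuration programmatically. The bot_management zone endpoint exists in Cloudflare's API; the ai_bots_protection field name is my recollection of how the AI toggle is exposed, so verify it against the current API docs before scripting around it.

# Read the zone's bot management settings (needs an API token with zone read access)
# The ai_bots_protection field is an assumption — confirm in Cloudflare's API docs
curl -s "https://api.cloudflare.com/client/v4/zones/$ZONE_ID/bot_management" \
  -H "Authorization: Bearer $CF_API_TOKEN" | python3 -m json.tool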

Step 3: Test with a real user-agent fetch

From a server you control, run:

curl -A "Mozilla/5.0 (compatible; GPTBot/1.3; +https://openai.com/gptbot)" -I https://yourdomain.com

A successful response is HTTP/2 200. A block usually shows as 403 Forbidden or a Cloudflare challenge page. Repeat with the OAI-SearchBot, ChatGPT-User, ClaudeBot, Claude-SearchBot, and PerplexityBot user agents. (Google-Extended is a robots.txt token, not a crawler that sends its own user agent, so there is nothing to curl for it.)

If any return a 403 or a challenge, that bot cannot read your site. Period.
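Rather than running that curl for each bot by hand, I use a small loop. The user-agent strings below are simplified placeholders (each vendor publishes its exact string in the docs linked above), but they're enough to trip a user-agent-based firewall rule.

#!/usr/bin/env bash
# Send one request per AI user agent and print the HTTP status each one receives
URL="https://yourdomain.com"
for BOT in GPTBot OAI-SearchBot ChatGPT-User ClaudeBot Claude-SearchBot PerplexityBot; do
  CODE=$(curl -s -o /dev/null -w "%{http_code}" \
    -A "Mozilla/5.0 (compatible; ${BOT}/1.0)" "$URL")
  printf "%-17s %s\n" "$BOT" "$CODE"
done

Anything other than 200 across the board means at least one door is closed.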

Step 4: Check what AI tools actually see

Open ChatGPT (with web search enabled), Perplexity, and Claude. Ask each of them:

  • "What does [your brand] do?"
  • "Summarise the homepage of [yourdomain.com]"
  • "What services does [your brand] offer?"

If the model says it cannot access the site, or it makes things up that aren't on your homepage, or it cites a competitor instead of you, that's your answer. The bot couldn't get in.

This is the same diagnostic process I describe in our AI search platform citation strategy guide, applied specifically to the Cloudflare problem.

Step 5: Review server logs

If you have access to raw access logs (or Cloudflare's analytics with bot identification turned on), filter for known AI user agents over the last 30 days. You're looking for two things. Are they hitting you at all? And are they getting 200 responses or 403s? A site that used to receive thousands of GPTBot requests per week and now receives zero is a strong signal something changed.
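Here is the kind of one-liner I run against raw logs. It assumes the standard nginx/Apache "combined" format, where the user agent is the last quoted field; adjust the path and pattern for your stack.

# Count HTTP status codes per AI user agent in a combined-format access log
awk -F'"' '$6 ~ /GPTBot|OAI-SearchBot|ChatGPT-User|ClaudeBot|Claude-SearchBot|PerplexityBot/ {
  split($3, p, " "); print p[1], $6
}' /var/log/nginx/access.log | sort | uniq -c | sort -rn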

Step 6: Check your hosting provider's defaults

Some platforms (Vercel, Netlify, certain WordPress hosts) have started shipping their own AI-blocking defaults that sit on top of Cloudflare. Read your hosting provider's recent changelogs for any AI bot or scraper rules added in the last 12 months. If they're there, you may need to override them.

Step 7: Decide your policy intentionally

There are three sensible positions to take, and each has trade-offs:

  1. Block everything. Fine if you're a paid-content publisher and your business model is paywalls.
  2. Allow search and inference, block training. This is what most businesses I work with want. You appear in ChatGPT, Perplexity, and AI Overviews, but your content doesn't go into the next training run.
  3. Allow everything. Best if you're an early-stage brand fighting for awareness and citation volume matters more than IP concerns.

What you should never do is end up at option 1 by accident, which is exactly what's happening to a lot of sites right now.

A robots.txt template that allows AI search but blocks AI training

This is the configuration I use for clients who want to be cited in AI tools without contributing free training data. It is not a magic bullet, and bots can ignore robots.txt, but the major ones do honour it.

# Block training crawlers
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: CCBot
Disallow: /

# Allow search and real-time (inference) crawlers
User-agent: OAI-SearchBot
Allow: /

User-agent: ChatGPT-User
Allow: /

User-agent: Claude-SearchBot
Allow: /

User-agent: Claude-User
Allow: /

User-agent: PerplexityBot
Allow: /

# Classic search engines stay untouched
User-agent: Googlebot
Allow: /

User-agent: Bingbot
Allow: /

You'll also need to either disable Cloudflare's "Block AI bots" toggle or replace it with a custom WAF rule that only blocks the training bots, not the search and inference ones. The Cloudflare toggle is a blunt instrument, and on most plans it does not let you allow OAI-SearchBot while blocking GPTBot.
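For reference, a training-bots-only block could be expressed as a custom rule like the one below, written in Cloudflare's Rules language with the action set to Block. The http.user_agent field is standard; which bots you list is your policy call. Note that Google-Extended never appears in a user agent string, so it can only be controlled through robots.txt, not through a WAF rule.

(http.user_agent contains "GPTBot") or (http.user_agent contains "ClaudeBot") or (http.user_agent contains "CCBot")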

For deeper LLM and AI-search work, I've written about brand-level optimisation in our piece on LLM optimisation: how to get your brand into AI answers, which covers the content side of the same problem.

What about Common Crawl

A quick note, because this comes up a lot. Common Crawl is a non-profit that has been crawling the web since 2007 and has built a corpus of over 300 billion pages cited in more than 10,000 research papers. Its crawler is called CCBot. Most foundation models, including older versions of GPT and Claude, were partly trained on Common Crawl data.

Cloudflare's default block also blocks CCBot on most configurations. If you want to stop your content being scraped into open research datasets, that's fine. If you don't care about training data but do care about being a recognised entity in future model knowledge, you might want to allow CCBot. There is no clean answer here, only trade-offs.

What I'd actually do this week

If I were auditing a single site right now, my checklist would be:

  1. Visit your robots.txt and check for AI bot Disallow rules you didn't write
  2. Log into Cloudflare and check the AI Scrapers toggle
  3. Run curl tests with the major AI bot user agents
  4. Ask ChatGPT, Perplexity, and Claude what they know about your brand
  5. Review server logs for AI bot 403 responses over the last 30 days
  6. Decide intentionally which bots to allow, which to block, and which to charge if you're testing pay-per-crawl
  7. Update robots.txt and Cloudflare WAF settings to match your decision
  8. Re-run the AI tool tests in a week to confirm citations come back

This is not a one-time job. Cloudflare's managed rules update automatically, new bots appear monthly, and AI companies rename their crawlers more often than you'd think. Put it on the quarterly technical SEO calendar.

The honest truth is that AI search visibility is not yet stable enough to optimise like classic SEO. Citation patterns shift week to week. But the one thing you can absolutely control is whether the bots can reach your pages at all. If they can't, nothing else you do matters. Get the door open first, then worry about the content.

If you want a second pair of eyes on your AI visibility, my team at SEO Engico runs this audit as part of our standard technical engagement. Either way, run the seven steps above. The fix usually takes thirty minutes. The cost of leaving it broken is six months of being invisible to a third of all search activity.
