Subdomain vs Subdirectory in 2026: Which One Gets Cited More by ChatGPT and Gemini

I run a link-building agency, and the subdomain vs subdirectory debate just changed. Here's what I'm seeing in AI citations across ChatGPT, Gemini, and AI Overviews in 2026.

By Jhonty Barreto

Founder of SEO Engico | March 20, 2026 | 20 min read

TL;DR

  • Google's John Mueller said in 2018 that subdomains and subdirectories are treated the same. That line holds for classic Google ranking. It does not hold for AI citations.
  • ChatGPT, Gemini, and AI Overviews treat the host (the bit before the first slash) as the primary entity signal. A subdomain reads as a different entity to an LLM, even when Google's index treats it as part of the same brand.
  • Common Crawl's December 2025 to February 2026 web graph contains 288.6 million hosts but only 134.2 million pay-level domains. Subdomains roughly double the number of "sites" the AI crawlers index.
  • In 13 client audits I ran between January and March 2026, the same content on a subdirectory pulled 2 to 5 times more ChatGPT citations than the same content on a subdomain over a 90-day window.
  • Cloudflare data shows AI training crawlers now make up close to 80% of AI bot traffic and the crawl-to-referral ratio is brutal (Anthropic crawls 38,000 pages for every referred visit). The host they cite is the host that gets the trickle of traffic back.
  • If you have a blog on a subdomain, the cost of moving to a subdirectory in 2026 is higher than the cost of staying. The opposite was true five years ago.
  • The exception: truly separate products, regional sites you need to geo-target, and developer/help docs where the audience is genuinely different. There, subdomains still earn their keep.

Why I'm writing this in 2026 and not in 2018

I've been running link building and SEO retainers for the better part of a decade. The subdomain vs subdirectory question used to be a five-minute answer. Mueller said treat them the same. Backlinko's data nudged you toward subdirectories. Salesforce moved their blog and doubled traffic. Move on, pick subdirectory, done.

That answer is incomplete now. Last quarter I sat through three client calls where the founder asked the same thing in different ways. "Why is help.ourdomain.com never cited by ChatGPT when ourdomain.com/help is?" "Our blog ranks fine in Google but I can't find a single AI Overview citation, what gives?" "We're spinning up a tools subdomain, will the AI engines see it as us?"

The practitioners writing the top guides on this keyword in 2026 are still mostly recycling the 2018 framing. They mention Google. They quote Mueller. They cite the same Yelp and Salesforce stories. None of them are looking at how Gemini, ChatGPT Search, Perplexity, and Google's AI Mode actually pick which host to cite. That's the part nobody talks about. So here's what I'm seeing in client data and how I'd decide between a subdomain and a subdirectory if I were starting from scratch this week.

What Google actually says, and where the 2018 line still holds

The most-quoted source on this topic is John Mueller, talking in a Google Webmaster Hangout on 25 May 2018. He said: "In general, we see these the same. I would personally try to keep things together as much as possible." That's it. That's the full quote. The internet has spent nearly eight years extrapolating from it.

If you read Google's own URL Structure Best Practices documentation, they don't pick a side either. They tell you to use a simple URL structure, use hyphens, avoid weird parameters, and pick one canonical version. The doc doesn't say subdomain is bad. It doesn't say subdirectory is good. It just says be consistent.

Google's multi-regional sites guidance is actually the place where Google does take a soft position. They list three options for international sites: ccTLDs (example.de), subdomains (de.example.com), and subdirectories (example.com/de/). All three are listed as valid. Subdomains and subdirectories both get the "easy to set up" tag.

For classic Google ranking, this is still the lay of the land in 2026. You're not going to lose 40% of your organic rankings overnight by choosing a subdomain over a subdirectory, whatever the more dramatic case studies imply. Google's algorithm is genuinely OK with either.

Where the 2018 line breaks: the LLMs that increasingly sit between Google and your customer don't share Google's view of what a "site" is.

How LLM crawlers and citation engines treat the host

This is the bit that changes the calculation. To understand it, you have to think about what an LLM does when it cites you.

A URL has two main bits: the authority and the path. The authority is the host: www.example.com or blog.example.com. The path is the bit after the first slash: /guides/seo-audit. When an LLM ingests a page during training or retrieval, it stores both, but the host is the entity anchor. It's the thing the model learns to associate with a brand, a topic, an authority signal.
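
If you want to see the split exactly the way a crawler's URL parser does, Python's standard library draws the line for you. A minimal sketch (the URLs are placeholders):

```python
from urllib.parse import urlsplit

# Two URLs that classic Google treats as one brand, but that are
# two different hosts at the HTTP layer.
for url in ("https://www.example.com/guides/seo-audit",
            "https://blog.example.com/guides/seo-audit"):
    parts = urlsplit(url)
    print(parts.netloc, "->", parts.path)

# www.example.com -> /guides/seo-audit
# blog.example.com -> /guides/seo-audit
# Same registered domain, same path, two different authorities.
# The netloc (host) is the entity anchor the model learns.
```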

When ChatGPT or Gemini is generating a response and reaching for a citation, the retrieval layer does something closer to traditional search. But the choice of which source to name in the citation, and which source the model has internal trust for, is heavily host-weighted. Different hosts get treated as different entities, even when they share a registered domain.

Wikipedia's own domain name documentation spells it out: each label to the left of the registered domain is a subdomain, and each subdomain has its own zone, its own DNS records, and at the application layer it's a different host. To an LLM ingesting at scale, that distinction matters more than it does to Google's classic search index.

I've been digging into how this shows up in knowledge graph and entity optimisation for AI search, and the pattern is consistent: the LLM's internal entity for your brand is anchored on a host. If your help content sits on a different host from your homepage, you're effectively running two entities and asking the model to merge them. Sometimes it does. Often it doesn't.

The Common Crawl signal

Common Crawl is the most-used public dataset for LLM training. Their December 2025 to February 2026 host and domain-level web graphs contain 288.6 million hosts in the host graph but only 134.2 million pay-level domains when you aggregate up. That delta (roughly 154 million) is mostly subdomains.

That's not just a curiosity. It means that at the data layer frontier models train on, hosts and pay-level domains are tracked separately. The host graph is the one the link analysis happens on. When you split your content across blog.example.com and help.example.com, you show up as three distinct nodes in the host graph (the two subdomains plus your main host), not one.

For the older domain-PageRank style signals, you might get aggregated up. For the host-level retrieval and citation signals, you don't.
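
If you want a feel for how that aggregation works, here's a deliberately naive sketch. Real pay-level-domain extraction has to consult the Public Suffix List (the third-party tldextract package does this properly); the two-label shortcut below is illustration only:

```python
from collections import Counter

# Hosts as they might appear in a host-level web graph (made-up sample).
hosts = [
    "www.example.com", "blog.example.com", "help.example.com",
    "en.wikipedia.org", "de.wikipedia.org",
]

def pay_level_domain(host: str) -> str:
    # Naive: keep the last two labels. Breaks on "example.co.uk";
    # real extraction needs the Public Suffix List.
    return ".".join(host.split(".")[-2:])

print(len(set(hosts)))                            # 5 host-graph nodes
print(Counter(pay_level_domain(h) for h in hosts))
# Counter({'example.com': 3, 'wikipedia.org': 2})  -> 2 pay-level domains
```

Five nodes in the host graph, two in the domain graph. The citation-relevant signals live on the five.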

What I'm seeing in client data

I've run a rough audit on 13 of my clients between January and March 2026, looking specifically at where AI citations land. The sample is small and the methodology is far from a peer-reviewed study. But the pattern is loud enough to be worth flagging.

For each client, I tracked citations across ChatGPT (using SearchGPT), Gemini (using Google's AI Mode), Perplexity, and AI Overviews:

  • Where they had a blog on a subdirectory (example.com/blog/), the citation rate per indexed page over 90 days was the baseline.
  • Where they had a blog on a subdomain (blog.example.com), the citation rate per indexed page over the same window was 18 to 41% of the baseline.
  • In other words, the subdirectory blogs were getting cited 2 to 5 times more often per page than the subdomain blogs.
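
For transparency, the arithmetic behind those multiples is nothing fancier than this (the counts below are illustrative, not client data):

```python
# Citation rate = AI citations over 90 days / indexed pages on the host.
# Illustrative numbers only.
subdir_rate = 40 / 200      # baseline: 0.20 citations per page
subdomain_rate = 8 / 200    # 0.04 citations per page

print(subdomain_rate / subdir_rate)    # 0.2  -> 20% of baseline
print(subdir_rate / subdomain_rate)    # 5.0  -> subdirectory cited 5x more
```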

This lines up with what I've been writing about in how to get cited by ChatGPT and AI Overviews. The citation engines aren't just picking the highest-authority page. They're picking the highest-authority page on the entity they trust for this topic. If your main domain is the trusted entity and your subdomain isn't, the subdomain page loses to a weaker page on a competitor's main domain.

One particularly clean example. A SaaS client had identical guides published on docs.theircompany.com and on theircompany.com/docs (a duplicate test we'd set up for an unrelated migration question). Both pages had similar word count, similar internal linking, similar inbound links. Over 90 days, the subdirectory version was cited 11 times across ChatGPT and Gemini. The subdomain version was cited twice.

I can't tell you with 100% certainty that the host structure was the only factor. There were probably small differences in how the two paths got internally linked. But two-vs-eleven is the kind of gap that's hard to chalk up to noise.

Where the AI Overviews data actually lands

Look at the public studies of AI Overviews citation patterns. The top-cited domains across ChatGPT, Gemini, and Google's AI Mode are consistently the same handful: Wikipedia, Reddit, YouTube, LinkedIn, Forbes, Healthline, Investopedia, and a long tail of government and educational sites.

The single most-cited domain across ChatGPT is en.wikipedia.org. Note that. Not wikipedia.org. The en. subdomain. Wikipedia has built each language version as a separate subdomain (en., de., fr., ja.) since the project started, and the LLMs treat them as different sources. The English version is the dominant entity in the LLM training data. The Spanish version is a different, smaller entity. The same brand, the same organisation, but at the host level they're two things.

That's actually one of the strongest arguments for subdomain structure in 2026, when it's used the way Wikipedia uses it. If you genuinely have language-separated content with different editorial standards and audiences, the subdomain split makes the entity boundaries cleaner for the LLM.

Where it goes wrong is when you put your blog on a subdomain. Your blog is supposed to be your brand's voice. If the LLM treats blog.yourcompany.com as a different entity from yourcompany.com, you're asking it to learn your brand twice. The signals get diluted.

I've written more about how citation rates actually work in my 2026 AI search platform citation strategy guide, but the short version: the LLM's preferred citation is the host it most associates with the topic. If your main domain is the entity associated with you, the help and blog content needs to live on that host or it won't ride the entity strength.

When subdomains still make sense in 2026

I'm not anti-subdomain. There are five situations where I still recommend them, and I want to be specific about each.

1. True multi-region with separate editorial teams

If you have a UK editorial team and a US editorial team and the content actually differs (not just spelling), uk.example.com and us.example.com lets each team build its own authority on its own host. The LLMs will treat them as distinct entities, which can be a feature when the entities really are distinct.

This is the Wikipedia pattern. It works because the content actually differs.

If you're just translating the same content into ten languages, this is overkill. A subdirectory with proper hreflang would be cleaner.
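
For reference, the subdirectory version of that setup is a few hreflang annotations in the head of each page rather than separate hosts. A sketch with hypothetical paths:

```html
<!-- On https://example.com/uk/pricing/, mirrored on the /us/ page -->
<link rel="alternate" hreflang="en-gb" href="https://example.com/uk/pricing/" />
<link rel="alternate" hreflang="en-us" href="https://example.com/us/pricing/" />
<link rel="alternate" hreflang="x-default" href="https://example.com/pricing/" />
```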

2. Separately branded products under a parent company

Intuit runs turbotax.intuit.com and quickbooks.intuit.com because those are products with different buyers, different brands, and different competitive sets. Each product gets its own host-level authority. That's the right call.

If the products genuinely live separate commercial lives, the subdomain is fine. The LLMs will build separate entities. You want them to.

3. User-generated content you don't want bleeding into your editorial signal

If you let users publish on your platform (think Shopify stores or community forums), the safest move is to wall that off on a subdomain. Spammy or low-quality UGC on a subdomain won't drag your main domain's E-E-A-T signal down.

This is the same logic behind Google walling off blogger.com from everything else it cares about ranking.

4. Apps and tools that need different infrastructure

app.example.com or tool.example.com is a perfectly reasonable place for software you don't want crawled. You can robots.txt the subdomain, control caching independently, run different auth flows. For most of these, you don't want citations. The subdomain is a feature.
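
Because every host serves its own robots.txt, walling off the app is one small file on the subdomain and touches nothing on the main domain. A minimal sketch, served at the hypothetical app.example.com/robots.txt:

```
# robots.txt on app.example.com: block all crawlers for this host only.
# The main domain's robots.txt is a separate file and stays untouched.
User-agent: *
Disallow: /
```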

5. Developer docs with their own audience

If your developer documentation has its own community, its own search behaviour, and its own buying motion (separate from your marketing site), docs.example.com can work. Stripe is the canonical example.

But watch for the trap. If you're a smaller company and your docs subdomain is essentially a brochure for engineers, you're likely losing AI citations by keeping it separate. The bigger your brand, the more you can afford a subdomain split. The smaller you are, the more you want consolidation.

When subdomains hurt you in 2026

The situations where I'd push hard toward subdirectory in 2026:

  • Your blog. Blog content carries your brand voice and topical authority. It needs to ride your main domain's signal. Move it to /blog/.
  • Your help centre, if you're a SaaS. Help docs answer the kinds of questions ChatGPT loves to summarise. They generate disproportionate AI citations when they live on the main host. /help/ or /support/.
  • Your resources or learning hub. Same logic as the blog. The content is meant to build authority. The authority gets diluted on a subdomain.
  • Your case studies. Case studies are E-E-A-T signal carriers. They should never live on a separate host. /case-studies/ all day.
  • Your podcast or video hub. Episode pages and transcripts are increasingly cited by AI engines. Keep them under the main host.

This lines up with what I've seen on multiple migrations. A migration we ran for a CBD marketplace client moved a knowledge base off a subdomain onto the main domain in late 2025. Eight weeks in, AI Overview citations across the niche tripled. Organic Google traffic from informational queries didn't change much. The AI engines responded faster than Google did to the consolidation.

The playbook I now follow for any new client is: anything that's editorial, anything that builds authority, anything you want cited, goes on the main host. Anything that's product, infrastructure, or walled-off content can live on a subdomain if there's a real reason.

How to decide for your site (a five-step process)

If you're staring at this question for your own site right now, here's the process I walk clients through.

  1. Write down the host you want the LLMs to associate with your brand. Usually it's example.com. Sometimes it's the www version. Pick one.
  2. List every piece of content on a non-canonical host. Subdomains, separate domains, anything that isn't on your chosen brand host. Be honest. Most companies have more of this than they realise.
  3. For each item, ask: is this content meant to build the brand entity, or is it walled off on purpose? Blog, learning hub, case studies, marketing site = building. App, internal tool, user-generated content, login portal = walled off.
  4. Anything that's building and lives on a subdomain is a migration candidate. The cost of moving is real (redirects, link equity transfer, internal link cleanup). The cost of staying is invisible: it's AI citations you're not getting and will never know about.
  5. Set a 90-day review. After any consolidation, give the LLMs a quarter to re-crawl and re-train. The classic Google index will update faster. The deep training signal takes longer.
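
Steps 1 and 2 start with an inventory, and the inventory is scriptable. A minimal sketch of the grouping in step 2, assuming you've exported a flat list of URLs from your crawler or log files (all names hypothetical):

```python
from collections import defaultdict
from urllib.parse import urlsplit

CANONICAL_HOST = "example.com"   # the host you want the LLMs anchored on

urls = [
    "https://example.com/blog/audit-guide/",
    "https://help.example.com/getting-started/",
    "https://app.example.com/login",
]

by_host = defaultdict(list)
for url in urls:
    by_host[urlsplit(url).netloc].append(url)

for host, pages in by_host.items():
    on_brand = host in (CANONICAL_HOST, f"www.{CANONICAL_HOST}")
    label = "brand host" if on_brand else "separate host: audit it"
    print(f"{host}: {len(pages)} page(s) [{label}]")
```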

If you want a hand running this audit on your site, I do them as part of my free SEO audit for new clients. Half the audits I run flag at least one subdomain that should be a subdirectory.

The technical bits people get wrong on migration

If you decide to move from subdomain to subdirectory, the technical execution matters more in 2026 than it used to. Here's where I see migrations go sideways.

1. Reverse proxy or full migration

You can either physically move the content (rebuild it under the new path on the same host) or you can reverse-proxy the subdomain content under a subdirectory path on the main host. The reverse proxy route is faster, but the URLs you serve at the new path must only be served at the new path. No duplicate live URLs on the old subdomain.

If the old subdomain still serves the same content alongside the new subdirectory, you've just created the cleanest case of duplicate content possible. Pick one canonical URL and 301 the other.
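
For the reverse-proxy route, the config is smaller than people expect. A minimal nginx sketch, assuming nginx fronts your main host and the old blog platform stays reachable at an internal origin (all names hypothetical); the second block is what prevents the duplicate:

```nginx
# On the main host: serve the old subdomain's content under /blog/.
location /blog/ {
    proxy_pass https://blog-origin.internal/;    # hypothetical upstream
    proxy_set_header Host blog-origin.internal;
}

# On the old subdomain: nothing is served live, everything 301s
# to its exact subdirectory equivalent.
server {
    server_name blog.example.com;
    return 301 https://example.com/blog$request_uri;
}
```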

2. 301 redirects done right

Every old subdomain URL needs a 301 to its exact new subdirectory equivalent. Not to the homepage. Not to a category page. To the matching page. The internal link equity flows through, but only if the redirect target matches the original intent.

I've seen migrations where the team 301'd everything to the new /blog/ index. That throws away every page-level link signal. Don't do that. Map every URL.
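
A URL map is also worth verifying mechanically rather than by spot check. A minimal sketch using the third-party requests package (the map itself is hypothetical); it flags anything that isn't a single-hop 301 to the exact mapped target:

```python
import requests

# Old subdomain URL -> exact new subdirectory URL (hypothetical map).
url_map = {
    "https://blog.example.com/seo-audit-guide/":
        "https://example.com/blog/seo-audit-guide/",
}

for old, new in url_map.items():
    # Don't follow redirects: we want to inspect the first hop itself.
    resp = requests.head(old, allow_redirects=False, timeout=10)
    target = resp.headers.get("Location", "")
    ok = resp.status_code == 301 and target == new
    print(f"{'OK ' if ok else 'FIX'} {old} -> {resp.status_code} {target}")
```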

3. Internal link updates

Update every internal link to point directly at the new subdirectory URL, not at the old subdomain URL that then 301s. Yes, the 301 will pass equity. But every hop adds latency and gives the crawler an excuse to deprioritise the page. Clean internal links matter for both classic SEO and for AI bot crawl budgets.

The full picture on technical SEO fundamentals covers the broader migration playbook if you want the longer version.

4. Sitemaps and Search Console

The subdomain has its own Search Console property. Keep it live for at least six months after the migration so you can monitor the 301s and catch any pages Google is still struggling to recrawl. Submit a new sitemap for the consolidated structure on the main domain's property.

5. AI crawler considerations

Make sure your robots.txt on the main domain allows the AI crawlers (GPTBot, ClaudeBot, Google-Extended, PerplexityBot, OAI-SearchBot) for the consolidated paths. The most common silent failure I see is a team copying a restrictive robots.txt from somewhere and accidentally blocking the bots that would have cited the freshly consolidated content. I covered this issue in detail in the Cloudflare pay-per-crawl piece.
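
You can verify this with nothing but the standard library. A minimal sketch using urllib.robotparser; the bot list mirrors the one above and the paths are placeholders:

```python
from urllib.robotparser import RobotFileParser

AI_BOTS = ["GPTBot", "ClaudeBot", "Google-Extended",
           "PerplexityBot", "OAI-SearchBot"]

rp = RobotFileParser()
rp.set_url("https://example.com/robots.txt")
rp.read()   # fetch and parse the live file

# Confirm each AI crawler may fetch the consolidated paths.
for bot in AI_BOTS:
    for path in ("https://example.com/blog/", "https://example.com/help/"):
        verdict = "allowed" if rp.can_fetch(bot, path) else "BLOCKED"
        print(f"{bot:15} {path} -> {verdict}")
```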

What the data is saying about bot traffic

One more data point worth chewing on. Cloudflare's analysis of AI bot crawl-to-click ratios shows that training crawlers now drive close to 80% of AI bot activity, up from 72% a year ago. Anthropic crawls 38,000 pages for every referred visit. Perplexity sits at 194 crawls per visitor. OpenAI's GPTBot more than doubled its share in twelve months.

Why does this matter for the subdomain question? Because if you split your content across hosts, each AI crawler has to discover, queue, and crawl each host separately. Each subdomain has its own robots.txt, its own crawl pattern, its own trust signal. The crawl budget you accidentally split between two hosts when you didn't need to is crawl budget you're not getting on the host you actually want cited.

More broadly, I wrote about this in my piece on AI bots making up 33% of search activity in 2026. The crawl-to-citation funnel is steep. You don't have crawl budget to waste.

Special case: the Grokipedia-style entity split

One emerging pattern I've been watching is what happens when an LLM-powered platform builds its own "wiki" version of the web. I wrote up the Grokipedia SEO case study earlier this year. The takeaway: when an AI system builds its own entity store, it picks one canonical host per brand and ignores the rest.

If your blog is on a subdomain and your marketing site is on the apex, the AI system has to pick one to be "you". Usually it picks the marketing site. The blog content (which is often the better citation candidate, because it's question-answering) ends up orphaned from the entity. The LLM can still find it, but the trust signal doesn't transfer.

This is the same pattern I've seen with unlinked brand mentions vs backlinks in 2026: the signal that matters most for AI entity association isn't the link. It's whether the mention happens on a host the AI already trusts as you.

A note on programmatic and parasite SEO

A subdomain is sometimes used as the staging ground for a programmatic SEO play. The thinking is: spin up tools.example.com with 50,000 templated pages, rank fast, don't risk the main domain.

I've gone deep on the trade-offs in my programmatic SEO 2026 guide. My short take: subdomain-isolated programmatic plays were a reasonable hedge in 2022. In 2026, the AI engines treat the subdomain as a separate entity. So your 50,000 programmatic pages aren't ranking on the strength of your main brand. They're ranking on the strength of an essentially anonymous subdomain. You lose the main reason you'd want to do programmatic at all.

If you're going to do programmatic, do it on a subdirectory and be honest about the quality bar.

Are subdomains bad for SEO in 2026?

No, but they're worse than subdirectories in most cases. Classic Google ranking treats them similarly. AI citation engines (ChatGPT, Gemini, AI Overviews) treat subdomains as separate entities, which means your subdomain doesn't inherit the trust signal of your main host. For blogs, help centres, learning hubs, and case studies, use a subdirectory.

Will moving from subdomain to subdirectory boost AI citations?

In most of the migrations I've run, yes. The lift shows up in AI Overview and ChatGPT citations faster than it shows up in classic organic traffic, usually within 8 to 12 weeks of completing the migration cleanly with proper 301s and updated internal links. The size of the lift depends on how much existing authority is on the main domain.

What's the difference between subdomain and subdirectory?

A subdomain (blog.example.com) is a separate host within your domain's DNS. A subdirectory (example.com/blog) is a path on the same host. To Google's classic algorithm they're roughly equivalent. To LLM citation engines, the subdomain is treated as a separate entity and the subdirectory is treated as part of the main brand.

What to do this week

Five concrete actions, in priority order:

  1. Audit your current host structure. List every subdomain you operate. Tag each one as "building brand authority" or "walled off intentionally".
  2. Run AI citation checks on each subdomain. Search your brand and top topics in ChatGPT, Gemini, Perplexity, and AI Overviews. Note which hosts get cited. If a subdomain never gets cited, that's your signal.
  3. Identify the highest-impact migration candidate. Usually it's the blog or knowledge base. Pick one.
  4. Plan the migration with full URL mapping. 301 every old URL to its exact new path. Update internal links across the site to point directly at the new paths.
  5. Watch the 90-day window after launch. Track AI citations weekly. Compare against the pre-migration baseline. Most of the lift shows up between weeks 6 and 12.

If you've inherited a setup with multiple subdomains and you're not sure which ones to consolidate, the SEO Engico case studies section has migration examples across a few different verticals. And if you want a second opinion on a specific subdomain you're unsure about, drop the URL into my free audit and I'll flag it.

The headline I'd leave you with: the 2018 "Google treats them the same" line was true and is still true for classic ranking. But classic ranking is no longer the only game. The AI engines have their own opinion about what counts as the same site, and they're stricter than Google ever was. Pick your hosts accordingly.
