AI SEO · 20 March 2026 · 18 min read

Subdomain vs Subdirectory in 2026: Which One Gets Cited More by ChatGPT and Gemini

Priyanshu Bisht

SEO Executive

The subdomain vs subdirectory question used to be a five-minute answer. John Mueller said treat them the same, the case studies nudged you toward subdirectories, you picked the subdirectory and moved on. Done.

That answer is now half right, and the half it gets wrong is the half that pays your bills in 2026.

Here is the short version of what our team keeps finding: classic Google ranking still treats subdomains and subdirectories roughly the same. The AI citation engines do not. ChatGPT, Gemini, Perplexity and Google's AI Mode read the host (the part of the URL between https:// and the first slash of the path) as the entity. A subdomain reads as a different brand to a large language model, even when Google's index happily lumps it in with everything else you own. If your blog sits on a subdomain, you are quietly handing AI citations to your competitors.

We have spent the better part of a decade running SEO and link building campaigns, and this is the rare topic where the standard 2018 advice has aged badly. Let's get into why, with numbers we have checked ourselves.

What's the difference between a subdomain and a subdirectory?

A subdomain is a separate host inside your domain. blog.example.com is a subdomain of example.com. A subdirectory is a path on the same host. example.com/blog is a subdirectory. To Google's classic algorithm they are roughly equivalent. To an AI citation engine, the subdomain is treated as a separate entity and the subdirectory is treated as part of the main brand.

That distinction is not a marketing opinion; it is how the plumbing of the web works. Wikipedia's documentation on subdomains notes that a subdomain is configured through the parent domain's DNS and can be given its own zone file with a Start of Authority record. At the application layer, blog.example.com and example.com are two different hosts that just happen to share a registered name. Machines that ingest the web at scale notice that. Google chooses to look past it. The LLMs, mostly, do not.
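To make the host-versus-path split concrete, here is a minimal sketch using Python's standard URL parser. The two URLs and the helper name host_and_path are our own illustrations, not anything a crawler is known to run.

```python
from urllib.parse import urlsplit


def host_and_path(url: str) -> tuple[str, str]:
    """Split a URL into the host and the path on that host."""
    parts = urlsplit(url)
    # hostname is lower-cased and has any port stripped off
    return parts.hostname or "", parts.path or "/"


# Hypothetical URLs for the same article under the two structures
subdomain_url = "https://blog.example.com/seo-guide"
subdirectory_url = "https://example.com/blog/seo-guide"

print(host_and_path(subdomain_url))     # ('blog.example.com', '/seo-guide')
print(host_and_path(subdirectory_url))  # ('example.com', '/blog/seo-guide')
```

Same article, two different hosts. Anything that keys on the host sees two sources in the first case and one in the second.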

What Google actually says, and where the 2018 line still holds

The most-quoted source on this entire topic is one sentence from John Mueller in a Google Webmaster hangout on 25 May 2018. As reported by Search Engine Journal, he said: "In general, we see these the same. I would personally try to keep things together as much as possible." That is the entire quote. The industry has spent nearly eight years building a religion out of it.

Read Google's own URL structure best practices and you will notice they refuse to pick a side. They tell you to organise URLs "logically and in a manner that is most intelligible to humans", to use hyphens rather than underscores, and to keep things consistent. There is no line that says subdomains are bad. There is no line that says subdirectories win. It is studiously neutral.

The one place Google does take a soft position is its guidance on managing multi-regional sites. There Google lists ccTLDs (example.de), subdomains with a generic TLD (de.example.com), and subdirectories (example.com/de/) as three valid options, and explicitly marks URL parameters as "not recommended". Both subdomains and subdirectories get the "easy to set up" tag. Google genuinely does not mind which you choose for geo-targeting.

So for classic Google ranking in 2026, the lay of the land has not really moved. You are not going to lose 40% of your organic rankings overnight by choosing a subdomain, whatever the more dramatic case studies imply. Mueller's line was true and it is still true for blue-link search.

Here is where it breaks. The LLMs that increasingly sit between Google and your customer do not share Google's view of what a "site" is. And those LLMs are now deciding what a growing share of your audience reads instead of your page.

How AI crawlers and citation engines treat the host

To understand why this matters, picture what an LLM does when it cites you. It does not think in registered domains. It thinks in hosts and entities.

When a model ingests a page, during training or live retrieval, the host is the anchor it learns to associate with a brand, a topic and a trust signal. The path comes second. When ChatGPT or Gemini reaches for a citation, the choice of which source to name is heavily host-weighted. Different hosts get treated as different entities, even when they share a registered domain. If your help content lives on a different host from your homepage, you are effectively running two entities and asking the model to merge them. Sometimes it does. Often it does not.

We have been digging into this in our work on knowledge graphs and entity optimisation for AI search, and the pattern is consistent. The model's internal entity for your brand is pinned to a host. Split your authority-building content across hosts and you split the entity. The broader playbook for this sits in our guide to getting your brand into AI answers, but the host question is the one nobody else is talking about, so it is the one we want to hammer.

The Common Crawl signal nobody quotes

Common Crawl is the most widely used public dataset for training large language models, so what it tracks tells you a lot about what frontier models see. Their host and domain-level web graphs for December 2025 to February 2026 contain 288.6 million hosts in the host graph but only 134.2 million pay-level domains when you aggregate up. That gap, roughly 154 million nodes, is mostly subdomains.

This is not trivia. It means at the data layer the models train on, hosts and registered domains are tracked as separate things, with separate link graphs (12.4 billion edges at host level versus 5.4 billion at domain level). When you scatter content across blog.example.com, help.example.com and example.com, you are three distinct nodes in the host graph, not one. The host-level link analysis is where a lot of the trust gets computed. You do not get to borrow your apex domain's strength just because you share a name with it.
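A rough sketch of the difference between counting nodes at host level and at domain level. The registered_domain helper is our own simplification: it assumes a two-label registered domain such as example.com, which breaks on suffixes like co.uk. Real pipelines use the Public Suffix List, and we are not claiming this is how Common Crawl aggregates.

```python
def registered_domain(host: str) -> str:
    """Naive roll-up: keep the last two labels of the host.

    ASSUMPTION: the registered domain has exactly two labels. This is wrong
    for suffixes like co.uk and is only meant to illustrate the idea.
    """
    labels = host.lower().rstrip(".").split(".")
    return ".".join(labels[-2:])


# Hypothetical hosts belonging to one brand
hosts = ["blog.example.com", "help.example.com", "example.com"]

host_nodes = set(hosts)                                  # host-graph view
domain_nodes = {registered_domain(h) for h in hosts}     # domain-graph view

print(len(host_nodes), len(domain_nodes))  # 3 1
```

Three nodes in one view, one node in the other. Which view a given system leans on decides whether your content counts as one source or three.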

What we're seeing in client data

We ran a rough audit across 13 of our clients between January and March 2026, looking specifically at where AI citations landed. Small sample, far from a peer-reviewed study, and we will be the first to say so. But the pattern was loud enough that we are not going to keep it to ourselves.

For each client we tracked citations across ChatGPT (via its search mode), Gemini and Google's AI Mode, Perplexity, and AI Overviews:

  • Where the blog lived on a subdirectory (example.com/blog/), we set the citation rate per indexed page over 90 days as the baseline.
  • Where the blog lived on a subdomain (blog.example.com), the citation rate per indexed page over the same window was 18% to 41% of that baseline.
  • In plain English, the subdirectory blogs were getting cited roughly 2.4 to 5.6 times more often per page than the subdomain blogs.
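Turning a "share of baseline" figure into a "times more often" multiple is just a reciprocal. A quick sketch of the arithmetic, using the two ends of the range above (the function name is ours):

```python
def times_more_often(share_of_baseline: float) -> float:
    """If B is cited at `share_of_baseline` of A's rate, A is cited 1/share times as often."""
    if share_of_baseline <= 0:
        raise ValueError("share of baseline must be positive")
    return 1.0 / share_of_baseline


low = times_more_often(0.41)   # subdomain at 41% of baseline
high = times_more_often(0.18)  # subdomain at 18% of baseline

print(f"{low:.1f}x to {high:.1f}x")  # 2.4x to 5.6x
```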

This lines up neatly with what we cover in our guide to getting cited by ChatGPT and AI Overviews. The citation engines are not just picking the highest-authority page on the web. They are picking the highest-authority page on the entity they already trust for that topic. If your main domain is the trusted entity and your subdomain is a stranger, the subdomain page can lose to a weaker page on a competitor's main domain. That stings, and most teams never even see it happen.

The cleanest example we have was almost an accident. A SaaS client had near-identical guides published on docs.theircompany.com and on theircompany.com/docs, left over from a duplicate we had spun up for an unrelated migration test. Similar word count, similar internal linking, similar inbound links. Over 90 days the subdirectory version pulled 11 citations across ChatGPT and Gemini. The subdomain version pulled two. We cannot swear host structure was the only variable, but two against eleven is not the kind of gap you wave away as noise.

Where the AI Overviews data actually lands

Want a clue from data nobody can argue with? Look at which domains the AI engines cite most. Semrush ran a 13-week study of more than 230,000 prompts and over 100 million AI citations between July and October 2025. The two most-cited domains on ChatGPT were Reddit and Wikipedia, with Wikipedia appearing in roughly 55% of ChatGPT responses early in the window before settling lower.

Now look closely at how that Wikipedia citation is structured. The dominant Wikipedia entity in LLM training data is en.wikipedia.org. Not wikipedia.org. The en. subdomain. Wikipedia has run each language as a separate subdomain (en., de., es., fr., ja.) since its early days, and the models treat them as different sources. The English version is the giant entity. The Spanish version is a different, smaller one. Same organisation, two distinct things at the host level.

That is actually one of the strongest arguments for subdomains in 2026, when you use them the way Wikipedia does. If you genuinely have language-separated content with different editorial standards and different audiences, the subdomain split makes the entity boundaries cleaner. We dig into the Wikipedia pattern more in our piece on why Wikipedia dominates LLM citations, and the lesson holds: subdomains work brilliantly when the entities really are separate.

Where it goes wrong is when you put your blog on a subdomain. Your blog is meant to be your brand's voice and your topical authority. If the model treats blog.yourcompany.com as a different entity from yourcompany.com, you are asking it to learn your brand twice. The signal gets cut in half, and the better content (the blog) is the half that loses.

When subdomains still earn their keep in 2026

We are not anti-subdomain. There are five situations where we still recommend them, and we want to be specific, because "it depends" is a cop-out.

1. True multi-region with separate editorial teams

A UK editorial team and a US editorial team producing genuinely different content (not just colour versus color) can each build authority on uk.example.com and us.example.com. The LLMs treat them as distinct entities, which is a feature when the entities really are distinct. This is the Wikipedia pattern. It works because the content actually differs. Translating one article into ten languages does not count, and a subdirectory with proper hreflang would be cleaner.

2. Separately branded products under a parent company

Intuit runs turbotax.intuit.com and quickbooks.intuit.com because those are different products, different buyers, different competitive sets. Each gets its own host-level authority, and that is exactly right. If your products genuinely live separate commercial lives, let the LLMs build separate entities. You want them to.

3. User-generated content you don't want bleeding into your editorial signal

If users publish on your platform (think marketplace stores or community forums), walling that off on a subdomain is the safe move. Spammy or thin user content on a subdomain will not drag your main domain's experience and trust signals down. It is the same logic behind Google keeping Blogger's user blogs on blogspot.com, well away from anything it cares about ranking.

4. Apps and tools that need different infrastructure

app.example.com is a perfectly sensible home for software you do not want crawled at all. You can set a separate robots.txt, control caching independently, run different auth. For most of these you do not want citations, so the subdomain is doing you a favour.

5. Developer docs with their own audience

If your developer documentation has its own community, its own search behaviour and its own buying motion separate from your marketing site, docs.example.com can work. Stripe is the obvious example. But watch the trap: if you are a smaller company and your docs subdomain is really just a brochure for engineers, you are bleeding AI citations by keeping it separate. The bigger the brand, the more it can afford a subdomain split. The smaller, the more you want everything under one roof.

When subdomains quietly hurt you in 2026

The situations where we push hard toward a subdirectory:

  • Your blog. It carries your voice and your topical authority. It needs to ride the main domain's signal. Move it to /blog/.
  • Your help centre, if you're a SaaS. Help docs answer exactly the kind of questions ChatGPT loves to summarise. They earn disproportionate AI citations when they sit on the main host. Use /help/ or /support/.
  • Your resources or learning hub. Same story as the blog. The content is built to grow authority, and the authority leaks away on a subdomain.
  • Your case studies. These are pure experience-and-trust signals. They should never live on a separate host. /case-studies/ all day.
  • Your podcast or video hub. Episode pages and transcripts are increasingly cited by AI engines. Keep them under the main host.

This matches what we keep seeing on migrations. On one consolidation we ran in late 2025, a client moved a knowledge base off a subdomain onto the main domain. Eight weeks in, the client's AI Overview citations across its niche queries had tripled. Classic Google traffic from informational queries barely moved. The AI engines responded to the consolidation faster than Google did, which is the opposite of what most teams expect. We unpack the full sequence of one of these moves in our breakdown of site migrations and AI citations in 2026.

The rule we now follow for every new client: anything editorial, anything that builds authority, anything you want cited, goes on the main host. Anything that is product, infrastructure or deliberately walled off can live on a subdomain when there is a real reason.

How to decide for your own site, in five steps

If you are staring at this question for your own site, here is the exact process we walk clients through.

  1. Pick the one host you want the LLMs to associate with your brand. Usually example.com, sometimes the www version. Choose one and commit.
  2. List every piece of content on a non-canonical host. Subdomains, separate domains, anything not on your chosen brand host. Be honest. Most companies have more than they think.
  3. For each item, ask: is this meant to build the brand entity, or is it walled off on purpose? Blog, learning hub, case studies, marketing site are building. App, internal tool, user content, login portal are walled off.
  4. Anything that is "building" and lives on a subdomain is a migration candidate. The cost of moving is visible (redirects, link equity, internal-link cleanup). The cost of staying is invisible: it is the AI citations you never get and never know you missed.
  5. Set a 90-day review. After consolidation, give the LLMs a quarter to re-crawl and re-train. Classic Google updates faster. The deep training signal takes longer.
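Steps 2 to 4 boil down to a tagged inventory and one filter. A minimal sketch, assuming a hypothetical inventory and the two tags we use with clients ("building" and "walled_off"); the class and function names are ours:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Section:
    host: str      # where the content currently lives
    purpose: str   # "building" (brand authority) or "walled_off" (on purpose)


def migration_candidates(sections: list[Section], brand_host: str) -> list[Section]:
    """Anything that builds the brand entity but sits off the brand host."""
    return [s for s in sections if s.purpose == "building" and s.host != brand_host]


# Hypothetical inventory for illustration only
inventory = [
    Section("example.com", "building"),
    Section("blog.example.com", "building"),
    Section("app.example.com", "walled_off"),
    Section("help.example.com", "building"),
]

for section in migration_candidates(inventory, brand_host="example.com"):
    print(section.host)
# blog.example.com
# help.example.com
```

The tagging is the judgment call. The filter is trivial once the tags are honest.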

If you want a hand running this on your own site, this is the sort of thing our team flags inside an SEO audit and strategy engagement. Roughly half the audits we run turn up at least one subdomain that should have been a subdirectory.

The technical bits people get wrong on migration

Decide to move from subdomain to subdirectory and the execution matters more in 2026 than it used to. Here is where we watch migrations go sideways.

1. Reverse proxy or full migration

You can physically rebuild the content under the new path on the main host, or reverse-proxy the subdomain content under a subdirectory path. The reverse proxy is faster, but the content must be served only at the new path. If the old subdomain keeps serving the same pages alongside the new subdirectory, congratulations, you have built the cleanest possible case of duplicate content. Pick one canonical URL and 301 the other.

2. 301 redirects done properly

Every old subdomain URL needs a 301 to its exact new subdirectory equivalent. Not the homepage. Not a category page. The matching page. We have seen teams 301 everything to the new /blog/ index and throw away every page-level signal in the process. Map every URL. There is no shortcut here that does not cost you.
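A page-level redirect map can be generated rather than typed by hand. This is a sketch under simple assumptions: the old paths carry over unchanged beneath a new prefix, and everything moves to https. The hosts, the /blog prefix and the function name are hypothetical. If your slugs change during the move you need an explicit lookup table instead.

```python
from urllib.parse import urlsplit, urlunsplit


def new_location(old_url: str, old_host: str, new_host: str, prefix: str) -> str:
    """Map one old subdomain URL to its exact subdirectory equivalent."""
    parts = urlsplit(old_url)
    if parts.hostname != old_host:
        raise ValueError(f"{old_url} is not on {old_host}")
    # Keep the page path and query string; only the host and prefix change.
    path = prefix.rstrip("/") + (parts.path or "/")
    return urlunsplit(("https", new_host, path, parts.query, ""))


old_urls = [
    "https://blog.example.com/seo-guide/",
    "https://blog.example.com/2025/link-building?ref=newsletter",
]
for url in old_urls:
    print(url, "->", new_location(url, "blog.example.com", "example.com", "/blog"))
# https://blog.example.com/seo-guide/ -> https://example.com/blog/seo-guide/
# https://blog.example.com/2025/link-building?ref=newsletter -> https://example.com/blog/2025/link-building?ref=newsletter
```

Feed the output into whatever your server or CDN uses for redirect rules, then spot-check a sample by hand before launch.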

3. Internal links

Update every internal link to point directly at the new subdirectory URL, not at the old subdomain URL that then 301s. Yes, the redirect passes equity. But each hop adds latency and gives crawlers a reason to deprioritise the page, and clean internal links matter for both classic SEO and AI bot crawl budgets. Our wider notes on this sit in our guide to technical SEO strategies.
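For a first pass at finding and rewriting stale internal links, a regular expression over exported HTML is often enough. A rough sketch with hypothetical hosts: it only handles double-quoted absolute href attributes, so treat it as a triage tool rather than a full HTML rewriter.

```python
import re


def rewrite_internal_links(html: str, old_host: str, new_base: str) -> str:
    """Point href attributes at the new subdirectory URL instead of the old subdomain."""
    # Matches href="http(s)://old_host" followed by an optional path, then the closing quote.
    pattern = re.compile(r'href="https?://' + re.escape(old_host) + r'(/[^"]*)?"')

    def replace(match: re.Match) -> str:
        path = match.group(1) or "/"
        return f'href="{new_base.rstrip("/")}{path}"'

    return pattern.sub(replace, html)


page = (
    '<a href="https://blog.example.com/seo-guide">Guide</a> '
    '<a href="https://partner.example.org/tool">Partner</a>'
)
print(rewrite_internal_links(page, "blog.example.com", "https://example.com/blog"))
# <a href="https://example.com/blog/seo-guide">Guide</a> <a href="https://partner.example.org/tool">Partner</a>
```

Links to other sites are left alone, and so is anything the pattern does not recognise, which is why a crawl of the finished site is still worth running afterwards.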

4. Sitemaps and Search Console

The subdomain has its own Search Console property. Keep it live for at least six months after the migration so you can watch the 301s and catch anything Google is slow to recrawl. Submit a fresh sitemap for the consolidated structure on the main domain's property.

5. AI crawler access

Make sure the robots.txt on the main domain allows the AI crawlers (GPTBot, ClaudeBot, PerplexityBot, OAI-SearchBot) for the consolidated paths, and does not disallow Google-Extended, which is a robots.txt control token rather than a separate crawler. The most common silent failure we see is a team copying a restrictive robots.txt from somewhere and accidentally blocking the very bots that would have cited the freshly consolidated content. We went deep on this in our piece on Cloudflare pay-per-crawl and AI search blocking, and it is worth a read before you touch a single redirect.
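You can test a robots.txt against the AI user agents before it ships. A minimal sketch using Python's built-in robots.txt parser; the sample file and URL are hypothetical, and Python's matching rules are an approximation of how any individual crawler interprets the file.

```python
from urllib.robotparser import RobotFileParser

AI_AGENTS = ["GPTBot", "ClaudeBot", "Google-Extended", "PerplexityBot", "OAI-SearchBot"]


def blocked_agents(robots_txt: str, url: str) -> list[str]:
    """Return the AI user agents that this robots.txt would stop from fetching url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [agent for agent in AI_AGENTS if not parser.can_fetch(agent, url)]


# Hypothetical file copied from another project: it blocks GPTBot outright
robots_txt = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""

print(blocked_agents(robots_txt, "https://example.com/blog/seo-guide"))  # ['GPTBot']
```

An empty list is what you want for any path you are hoping to get cited.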

What the bot traffic data is screaming at you

One more number to chew on, because it reframes the whole question. Cloudflare's analysis of AI bot crawl-to-click ratios found that training now drives nearly 80% of AI bot activity, up from 72% a year earlier. Anthropic's crawler hit 38,000 pages crawled for every referred visit in July 2025. Perplexity's ratio climbed to 194 crawls per visitor. OpenAI's GPTBot more than doubled its share of AI crawling traffic, from 4.7% to 11.7%, in twelve months.

Why does that matter for the subdomain question? Because if you split content across hosts, each AI crawler has to discover, queue and crawl each host separately. Each subdomain has its own robots.txt, its own crawl pattern, its own trust signal to earn from scratch. The crawl budget you accidentally split across two hosts, when one would have done, is crawl budget you are not spending on the host you actually want cited. The funnel from crawl to citation is already brutal. You cannot afford to leak any of it.

Programmatic and "parasite" plays: the trade-off has flipped

Subdomains used to be the staging ground for a programmatic SEO play. Spin up tools.example.com with 50,000 templated pages, rank fast, keep the risk off the main domain. We cover the full set of trade-offs in our guide to programmatic SEO in 2026, but the short take is this: that hedge made sense in 2022 and it does not now.

In 2026 the AI engines treat the subdomain as a separate, essentially anonymous entity. So your 50,000 programmatic pages are not ranking or getting cited on the strength of your main brand. They are ranking on the strength of a host nobody has heard of. You lose the main reason you would do programmatic in the first place. If you are going to do it, do it on a subdirectory and hold the quality bar high.

Are subdomains bad for SEO in 2026?

No, but they are worse than subdirectories in most cases. Classic Google ranking treats them similarly. AI citation engines (ChatGPT, Gemini, AI Overviews) treat a subdomain as a separate entity, so it does not inherit the trust signal of your main host. For blogs, help centres, learning hubs and case studies, use a subdirectory.

Will moving from subdomain to subdirectory boost AI citations?

In most of the migrations we have run, yes. The lift shows up in AI Overview and ChatGPT citations faster than in classic organic traffic, usually within 8 to 12 weeks of a clean migration with proper 301s and updated internal links. The size of the lift depends on how much authority already sits on the main domain.

What is the difference between a subdomain and a subdirectory?

A subdomain (blog.example.com) is a separate host within your domain's DNS. A subdirectory (example.com/blog) is a path on the same host. To Google's classic algorithm they are roughly equivalent. To LLM citation engines, the subdomain is treated as a separate entity and the subdirectory as part of the main brand.

What to do this week

Five concrete actions, in priority order:

  1. Audit your host structure. List every subdomain you run. Tag each as "building brand authority" or "walled off on purpose".
  2. Run AI citation checks on each subdomain. Search your brand and top topics in ChatGPT, Gemini, Perplexity and AI Overviews. Note which hosts get cited. A subdomain that never gets cited is your signal.
  3. Pick the highest-impact migration candidate. Usually the blog or the knowledge base. Just pick one.
  4. Plan the migration with full URL mapping. 301 every old URL to its exact new path and update internal links to point directly at the new paths.
  5. Watch the 90-day window. Track AI citations weekly and compare against the pre-migration baseline. Most of the lift tends to land between weeks 6 and 12.

The line we will leave you with: "Google treats them the same" was true in 2018 and it is still true for blue-link ranking. But blue links are no longer the only game. The AI engines have their own opinion about what counts as the same site, and they are far stricter than Google ever was. Pick your hosts to suit the engines that actually answer your customers now.

If you have inherited a tangle of subdomains and you are not sure which ones to consolidate, that is squarely the sort of problem we sort out. Whether you want a second opinion on a single subdomain or a full plan for AI search visibility, send us the URL and the goal and tell us what you are trying to fix. We will tell you straight whether it is worth moving.
