AI Search · 1 May 2026 · 11 min read

llms.txt: Should You Add One in 2026? What the 300k-Domain Data Actually Shows.

Priyam Goyal

Co-Founder

We get asked about llms.txt almost every week now. A founder forwards a LinkedIn post, a dev drops it in Slack, somebody on the marketing team saw a thread claiming it's the new secret to getting cited by ChatGPT. The question is always the same. "Should we add an llms.txt file, and will it get us into AI answers?"

For a while our honest answer was a shrug. "Probably won't hurt." That's a rubbish answer to give a client who's paying for sharper thinking than that. So we did what mechanical engineers turned marketers tend to do: we went and found the data, read the spec, checked what the big platforms actually said, and lined it all up against what we see across our own campaigns.

Here's the short version, and it's less exciting than the hype. As of 2026, llms.txt does not appear to move AI citations for almost anyone. The longer version, with the numbers and the few genuine exceptions, is below.

What is llms.txt?

llms.txt is a markdown file you put at the root of your domain (yoursite.com/llms.txt) that hands large language models a tidy, curated map of your most important content. Think of it as a reading list you've written for the AI, rather than letting it guess from your sitemap and nav bar.

It was proposed by Jeremy Howard at Answer.AI on 3 September 2024. His reasoning was practical. Models have a finite context window, and most web pages are mostly chrome: navigation, scripts, cookie banners, ads. In the original proposal, Howard argued that language models "like to have information in a more concise form" and that it helps to have "a single place where all of the key information can be collated." Hand the model a clean index, the logic goes, and it doesn't have to wade through your footer to find your best page.

It's a sensible idea. Whether the AI platforms actually use it is the entire question, and we'll get there.

What does an llms.txt file look like?

The official spec at llmstxt.org is refreshingly short. The whole thing fits on one page. The format goes:

  • An H1 with the name of the site or project. This is the only required line.
  • An optional blockquote summarising what the site is.
  • Optional prose giving more context.
  • H2 sections, each containing a bullet list of links, with a short description after a colon.
  • A special "Optional" H2 that models are told they can skip when context gets tight.
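To make that format concrete, here is a minimal Python sketch that renders an llms.txt body from a small data structure. The class names, the function name and the example.com URLs are our own placeholders for illustration; they are not part of the spec.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Link:
    title: str
    url: str
    note: str = ""  # optional short description, written after a colon


@dataclass
class Section:
    heading: str
    links: List[Link] = field(default_factory=list)


def render_llms_txt(name: str, summary: str = "", details: str = "",
                    sections: Optional[List[Section]] = None) -> str:
    """Render H1, optional blockquote, optional prose, then H2 link lists."""
    parts = [f"# {name}"]  # the H1 is the only required line
    if summary:
        parts.append(f"> {summary}")
    if details:
        parts.append(details)
    for section in sections or []:
        lines = [f"## {section.heading}", ""]
        for link in section.links:
            item = f"- [{link.title}]({link.url})"
            if link.note:
                item += f": {link.note}"
            lines.append(item)
        parts.append("\n".join(lines))
    return "\n\n".join(parts) + "\n"


# Hypothetical site and URLs, for illustration only.
example = render_llms_txt(
    name="Example Docs",
    summary="Hypothetical API documentation for an imaginary product.",
    sections=[
        Section("Docs", [Link("Quickstart", "https://example.com/quickstart.md",
                              "Install and first request")]),
        Section("Optional", [Link("Changelog", "https://example.com/changelog.md")]),
    ],
)
print(example)
```

Anything placed under the "Optional" heading is the material a model may drop first, so keep your must-read pages in the sections above it.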

Howard also proposed a companion idea in the same announcement: serve a clean markdown copy of each page at the same URL with .md appended, so the model gets plain text instead of HTML soup. That part often gets forgotten, and it's arguably the more useful half.
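The .md convention is simple enough to sketch in a few lines of Python. The function name is ours, and the index.html.md fallback for URLs without a file name reflects our reading of the spec at llmstxt.org, so check it against the current version before relying on it.

```python
from urllib.parse import urlsplit, urlunsplit


def markdown_url(page_url: str) -> str:
    """Return the URL where a clean markdown copy of page_url would live."""
    parts = urlsplit(page_url)
    path = parts.path or "/"
    if path.endswith("/"):
        # No file name in the URL: the spec suggests index.html.md here.
        path += "index.html.md"
    else:
        path += ".md"
    # Dropping the query string and fragment is an assumption of this sketch.
    return urlunsplit((parts.scheme, parts.netloc, path, "", ""))
```

Under those assumptions, https://example.com/docs/intro.html maps to https://example.com/docs/intro.html.md, and https://example.com/docs/ maps to https://example.com/docs/index.html.md.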

How is llms.txt different from robots.txt?

This trips people up constantly, so let's be blunt. They are not the same standard, and they don't do the same job.

robots.txt is about permission. It tells crawlers what they're allowed to touch. It's a fence.

llms.txt is about recommendation. It tells models what's worth reading. It's a signpost.

Carolyn Shelby at Search Engine Land put it well, calling llms.txt a treasure map rather than a muzzle: it doesn't block crawlers or dictate indexing, it just points at your best content. If you want the older standard explained properly, our robots.txt optimisation guide covers what that file actually controls and where people get it wrong.
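The fence half of that contrast is something you can test mechanically, which llms.txt has no equivalent for. A small sketch using Python's standard-library robots.txt parser, with a made-up rule set and a hypothetical crawler name:

```python
from urllib.robotparser import RobotFileParser

# A made-up robots.txt for illustration.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(url: str, agent: str = "ExampleBot") -> bool:
    """Ask the parsed rules whether this agent may fetch this URL."""
    return parser.can_fetch(agent, url)


print(allowed("https://example.com/pricing"))
print(allowed("https://example.com/admin/settings"))
```

robots.txt answers a yes-or-no question per URL. llms.txt never answers that question; it only lists pages you think are worth reading.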

Who has actually adopted llms.txt?

On paper the adopter list looks impressive. Anthropic, Stripe, Cloudflare, Mintlify, Vercel and Perplexity all maintain one. Cloudflare went further than most: their developer docs publish a full llms.txt at developers.cloudflare.com with sub-files for individual products, indexing dozens of services in a single clean file.

So the LinkedIn story becomes "everyone's adopting it, get on board." The data tells a more boring story.

SE Ranking crawled close to 300,000 domains and found llms.txt on 10.13% of them. That sounds like a lot until you look at who those domains are. The adopters skew heavily towards devtools, SaaS documentation, and AI-native companies. The long tail of normal business sites (the local services, ecommerce stores and B2B firms most of our clients run) almost never has one.

That mismatch matters more than the headline percentage. The domains that AI tools cite most often, the Reddits and Wikipedias and news sites of the world, largely don't bother with llms.txt at all. If the biggest sources of AI citations aren't using it, the file clearly isn't doing the heavy lifting the hype implies.

What the largest study to date actually found

The most serious public analysis comes from SE Ranking. They took roughly 300,000 domains, segmented by whether they had llms.txt, and ran correlation tests plus an XGBoost machine learning model against AI citation frequency.

The conclusion, published in Search Engine Journal, was direct: "llms.txt doesn't seem to directly impact AI citation frequency. At least not yet." The detail that really lands is buried in the methodology. When the researchers removed llms.txt from their model entirely, prediction accuracy actually improved. The file added noise, not signal.

If that were a single study we'd be cautious about it. It isn't. Trakkr ran an independent analysis on 37,894 AI-cited domains and reached the same conclusion. Their numbers are almost comically flat: sites with llms.txt averaged 6.8 citations, sites without averaged 6.7, with a p-value of 0.81. In plain English, if llms.txt made no difference at all, you'd still expect a gap at least that big about four times out of five, so it is indistinguishable from random noise. Their verdict was that AI citations correlate with domain authority, content depth and training data exposure, not technical signals like llms.txt.

Two separate teams, two different datasets, the same answer. That's about as close to settled as this stuff gets in 2026.
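If you want a feel for what a p-value like Trakkr's 0.81 measures, a permutation test is one common way to get there. This is a generic sketch, not Trakkr's or SE Ranking's method, and the citation counts in it are made up purely for illustration.

```python
import random
from statistics import mean


def permutation_p_value(group_a, group_b, rounds=5000, seed=0):
    """Two-sided permutation test on the difference in group means."""
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    hits = 0
    for _ in range(rounds):
        # Shuffle the labels: if the file is irrelevant, any split is as good as any other.
        rng.shuffle(pooled)
        gap = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if gap >= observed:
            hits += 1
    # Share of random splits with a gap at least as large as the real one.
    return hits / rounds


# Made-up citation counts for illustration only; these are not Trakkr's data.
with_file = [7, 5, 9, 6, 8, 4, 7, 6]
without_file = [6, 7, 5, 8, 6, 9, 5, 7]
p = permutation_p_value(with_file, without_file)
print(f"p = {p:.2f}")
```

A value near 1 means random relabelling produces a gap that size most of the time. A value near 0 means it almost never does, which is when you'd start to believe the file is doing something.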

What the AI platforms themselves say

Here's the part that should end most of the debate. Google, at least, has told us directly.

Google's John Mueller compared llms.txt to the old keywords meta tag, the one search engines binned over a decade ago because site owners control it and therefore game it. His logic: a bot has already downloaded your real pages, so why trust a separate file you wrote about yourself? It would have to check the file against the actual content to make sure it isn't spam, which makes reading the file pointless. He added that server logs show bots aren't even requesting these files in any meaningful volume. Gary Illyes echoed the position at Google Search Central Live: Google doesn't support llms.txt and has no plans to.

To be fair, not everyone agrees. There's a reasonable counter-argument that llms.txt isn't the new meta keywords, because it points to real URLs where the content "has to exist and deliver when the model gets there," unlike the keywords tag which was pure unverifiable assertion. That's a fair distinction. But even that defence concedes the practical point: adoption is limited and the platforms aren't reading it yet.
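Mueller's point about server logs is easy to check on your own site. Here is a rough Python sketch that counts requests for llms.txt per crawler. It assumes your server writes the common "combined" access log format, and the bot list is a partial, illustrative set of user-agent substrings that you should extend for your own traffic.

```python
import re
from collections import Counter

# Pulls the request path and user agent out of a combined-format log line.
LOG_LINE = re.compile(
    r'"(?:GET|HEAD) (?P<path>\S+) [^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)

# Partial list of AI crawler user-agent substrings (an assumption, not exhaustive).
AI_BOTS = ("GPTBot", "ClaudeBot", "PerplexityBot")

TARGET_PATHS = ("/llms.txt", "/llms-full.txt")


def count_llms_txt_hits(lines):
    """Count requests for llms.txt files, grouped by known bot or 'other'."""
    hits = Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        if not match:
            continue  # not a GET/HEAD line in the expected format
        path = match.group("path").split("?", 1)[0]
        if path not in TARGET_PATHS:
            continue
        agent = match.group("agent")
        bot = next((name for name in AI_BOTS if name in agent), "other")
        hits[bot] += 1
    return hits
```

Pass it any iterable of log lines, for example an open file handle on your access log. If the counter comes back empty after a month of traffic, that tells you more than any LinkedIn thread will.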

So is llms.txt completely pointless?

No, and this is where the nuance lives. Across our portfolio there's one segment where we see anything at all, and it's exactly the use case llms.txt was built for: technical documentation.

SaaS docs, API references, developer tooling. The Stripe and Cloudflare shape of content. Perplexity in particular seems to pull from markdown-mapped doc pages more readily than the other engines, and Mintlify auto-generates llms.txt and llms-full.txt for every customer site, hosting them at the root and at /.well-known/ for good measure. The llms-full.txt variant crams the entire docs site into one file as direct context, which is a genuinely different and more useful thing than the index version.
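The difference between the index and the full variant is easiest to see in code. This is a hedged sketch of the idea only: the function name and the exact layout (a heading, a source line, then the page body) are our own choices, not Mintlify's implementation and not a formal standard.

```python
def build_llms_full(site_name, pages):
    """Concatenate whole pages into one document.

    pages is an iterable of (title, url, markdown_body) tuples.
    """
    chunks = [f"# {site_name}"]
    for title, url, body in pages:
        # Each page carries its own heading and source URL so a model can cite it.
        chunks.append(f"## {title}\n\nSource: {url}\n\n{body.strip()}")
    return "\n\n".join(chunks) + "\n"
```

Where llms.txt hands the model a list of links to follow, this hands it the content itself, which is why the file can get very large on a big docs site.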

For everyone else, the picture is flat. A local service business, an ecommerce store, a typical B2B site? Google's AI Mode is going to cite Google Business Profile data, Reddit threads and review sites long before it reads your llms.txt. Semrush analysed 230,000 prompts across ChatGPT, Google AI Mode and Perplexity over 13 weeks and found Reddit and Wikipedia dominating ChatGPT citations, with Reddit appearing in close to 60% of responses at one August peak. None of those citation magnets lean on llms.txt. They earn their place through authority and being everywhere.

If you want to understand the levers that genuinely move AI citations, our guide on how to get your brand into AI answers is the better use of an hour. And our study on where ChatGPT pulls quotes from in your content shows the answer isn't your llms.txt file. It's your first 500 words.

Should you implement llms.txt? Our actual recommendation

This is the bit that disappoints the people selling llms.txt courses. We're going to give you a straight yes-or-no based on what you run, not a vibe.

Add one if

  • You run technical documentation for a SaaS, API or developer tool. The Perplexity behaviour is real for this segment, and Anthropic, Stripe and Cloudflare aren't doing it for no reason.
  • Your platform generates it automatically. If you're on Mintlify, Vercel or a modern docs platform, it's already free and effortless. Leave it on.
  • You want cheap future-proofing. The platforms might start using it. The downside of having one is close to zero.
  • You have structured, evergreen reference content (knowledge bases, glossaries, big comprehensive guides) that you'd want an AI to find easily.

Don't bother if

  • You run a local service business or ecommerce store. The citation sources for your queries live off your domain entirely.
  • You're expecting a quick citation lift. Two independent studies and Google itself say there isn't one.
  • You'd be trading off real content work to do it. An hour improving a pillar page beats ten hours perfecting an llms.txt file, every time.
  • Your dev team has actual technical SEO debt sitting in the backlog. Fix that first. It's not close.

A 5-step checklist if you decide to ship it anyway

Say you've read all of the above, you run docs, and you want one done properly. Here's how to do it without wasting a sprint on it.

  1. List your top 20 highest-value pages. Service pages, pillar guides, glossary entries, the key reference docs. Anything you'd actually want an AI to cite.
  2. Generate clean markdown versions at the same URL with .md appended. Most CMSs handle this with a small function or plugin.
  3. Write the file using the format: H1 site name, a one-line blockquote summary, H2 sections with bullet-list links and short descriptions. Keep descriptions informational, no marketing fluff, most important content first.
  4. Deploy at /llms.txt on the root domain. Optionally mirror it at /.well-known/llms.txt for forward compatibility.
  5. Verify with curl. Running curl -I https://yoursite.com/llms.txt should return a 200 with a text content type.

For most sites that's a 30-minute job. If your dev is quoting you a fortnight, something is off.
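If you'd rather script steps 4 and 5 than run curl by hand, a minimal Python equivalent might look like the sketch below. The function names are ours. It sends a HEAD request to both locations and applies the same test as the curl check: a 200 status and a text content type.

```python
import urllib.request


def looks_healthy(status: int, content_type: str) -> bool:
    """True when the response is a 200 served as some text/* type."""
    media_type = content_type.split(";", 1)[0].strip().lower()
    return status == 200 and media_type.startswith("text/")


def check_llms_txt(base_url: str, paths=("/llms.txt", "/.well-known/llms.txt")) -> dict:
    """HEAD each candidate path and report whether it looks healthy."""
    results = {}
    for path in paths:
        request = urllib.request.Request(base_url.rstrip("/") + path, method="HEAD")
        try:
            with urllib.request.urlopen(request, timeout=10) as response:
                content_type = response.headers.get("Content-Type", "")
                results[path] = looks_healthy(response.status, content_type)
        except OSError:
            # Covers HTTP errors such as 404 as well as network failures.
            results[path] = False
    return results
```

Calling check_llms_txt("https://yoursite.com") returns a dictionary keyed by path. One caveat: a site that answers missing URLs with a friendly HTML page and a 200 status will pass this check, so open the file once in a browser as well.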

What we'd actually spend the time on instead

We track AI citations across our client portfolio, and the things that consistently move them aren't files. They're the unglamorous fundamentals.

  • Pillar content depth. AI tools tend to cite the single most comprehensive page on a topic. Coverage beats word count.
  • Structured early content. Question-format H2s with a tight answer in the first third of the page get pulled disproportionately. Our breakdown of how to get cited in ChatGPT and AI Overviews goes deep on this.
  • Brand mentions across the open web. Wikipedia, industry press, Reddit, niche forums. Models recognise brands by their web-wide footprint, which is exactly why we treat Wikipedia and brand entity work as a citation lever, not a vanity project.
  • Clean schema markup. Article, FAQ, HowTo, Product. Structured data helps every retrieval system, not only Google.
  • Original data and research. AI tools love a defensible statistic. Run a study, publish a number, give them something quotable. It's the approach behind our own original research on AI search visibility.
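On the schema point, here is a small sketch of what Article markup can look like when generated from Python. The function name is our own, and the properties shown are a minimal subset of schema.org's Article type. Real pages usually carry more fields, so treat this as a starting shape rather than a complete template.

```python
import json


def article_json_ld(headline: str, author: str, date_published: str) -> str:
    """Build a JSON-LD script tag describing an article."""
    data = {
        "@context": "https://schema.org",
        "@type": "Article",
        "headline": headline,
        "author": {"@type": "Person", "name": author},
        "datePublished": date_published,  # ISO 8601 date
    }
    # Escape "</" so a stray closing tag in a field cannot end the script element early.
    payload = json.dumps(data, indent=2).replace("</", "<\\/")
    return f'<script type="application/ld+json">{payload}</script>'


print(article_json_ld(
    "llms.txt: Should You Add One in 2026?",
    "Priyam Goyal",
    "2026-05-01",
))
```

Whatever generates the markup, run the result through a structured data validator before shipping it.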

llms.txt is fine. It's free to deploy, it's interesting as a standards experiment, and it might matter more in 18 months than it does today. But if you're spending real budget on it in 2026 expecting near-term citation gains, the public data isn't on your side and neither is ours. Build the better page first. Add the llms.txt file in the last 30 minutes of the project, not the first.

If you'd rather have a team that knows which of these actually moves the needle for your specific site, that's the whole point of our AI search visibility service. We diagnose the bottleneck before we touch anything. Tell us what you're working on and we'll tell you straight whether llms.txt is worth your time or a distraction from the work that is.
