TL;DR
- Penguin has been part of Google's core algorithm since September 23, 2016, and still penalises sites for over-optimised anchor text patterns in real time.
- In my agency data, exact-match anchor ratios above 8% on commercial keywords correlate with ranking drops inside one to three months.
- AI engines like ChatGPT, Perplexity and Google's AI Overviews do not weight anchor text the same way Google's core ranking does. They use semantic retrieval, so the words around your link matter more than the link's anchor.
- A 2023 Webis study built on Common Crawl pulled anchor text from billions of pages across six snapshots between 2016 and 2021, confirming that anchor text remains a core signal for retrieval systems, just not in the keyword-stuffed way SEOs used to chase.
- The split strategy I use in 2026: keep exact-match under 5% for Google safety, then build descriptive, sentence-length anchors that get embedded cleanly into vector databases for AI citation.
- Branded anchors are not dead. In my client portfolios, sites with 40 to 60% branded anchors get cited more often by ChatGPT and Perplexity than sites with keyword-heavy profiles.
Why I'm writing this in 2026
I run a small link-building agency. About eight months ago, I started noticing something weird in two of my client portfolios.
One client kept ranking on Google but stopped appearing in AI Overviews. Another started showing up everywhere in Perplexity but lost three keywords on Google in a single month. Same agency. Same outreach process. Different anchor text profiles.
That is what kicked off this post. Most of the "anchor text in 2026" content I read this year is still arguing about whether exact-match should be 3% or 8%. Nobody is asking the actual question, which is: what does an anchor text strategy look like when half your traffic comes from systems that read links semantically instead of counting them?
Here is what I have learned from running link-building campaigns across roughly 40 client accounts in 2025 and 2026, plus what Google's own documentation actually says.
The Penguin rules have not changed. They just stopped being announced.
A lot of newer SEOs forget that Penguin still exists. They think it was a 2012 thing that got retired.
It did not. Google Penguin launched on April 24, 2012 and affected about 3.1% of English search queries. On September 23, 2016, Google folded it into the core algorithm and made it update in real time. That means Penguin does not get "refreshed" any more. It is just always running, evaluating links and anchor text every time Google re-crawls your backlink profile.
What Google actually says about manipulative anchor text is published in plain English in its spam policies. The official Google Search spam policies list "links with optimized anchor text in articles, guest posts, or press releases distributed on other sites" as link spam. They also call out "keyword-rich, hidden, or low-quality links embedded in widgets" and "widely distributed links in the footers or templates of various sites."
Notice the specificity. Google is not banning guest posts. It is banning guest posts with optimised anchor text. The anchor is the trigger.
What I see trigger Penguin in 2026
From client audits I have run this year, these are the patterns that line up with traffic drops:
- Exact-match anchor ratios above 8% to a single commercial URL
- 15+ guest posts using slight variations of the same money keyword as anchor ("best widgets for X", "top widgets for X", "widgets for X reviewed")
- Footer links across multiple sites all pointing to the same anchor
- Multiple anchors from the same referring domain class (e.g. ten different .info directories) all using product keywords
The sites I have helped recover from this kind of pattern took between two and seven months. I wrote a longer breakdown on the link reclamation process I use after a Penguin hit, if you want the full workflow.
The anchor text ratios I actually use
I hate giving universal ratios because they get repeated as gospel. But people ask for numbers, so here is what I aim for in my own white hat link building work in 2026:
- Branded anchors: 40 to 60%. Brand name only, or brand plus generic descriptor ("Acme", "Acme team", "Acme's guide").
- Naked URLs and generic anchors: 15 to 25%. "acme.com", "this article", "here" (sparingly), "the full study".
- Partial-match and contextual anchors: 15 to 25%. Phrases that include the keyword inside a longer descriptive sentence, like "how Acme handles widget compliance".
- Exact-match: 2 to 5%. Maximum. And only on links from very high authority sources where the anchor reads naturally.
- Co-occurrence anchors: as many as you can earn. These are links where the anchor is a brand name but the surrounding paragraph contains the keyword. More on these below, because they are the unlock for AI engines.
The ratios I just listed are roughly what I see on the natural link profiles of sites that have never done outreach. Genuinely organic profiles tend to lean even more heavily branded, sometimes 70% or more.
This is also why I keep telling clients that unlinked brand mentions matter more than backlinks in some contexts now. A brand mention with no link reads as more natural to both Google and AI retrieval than a keyword-rich link.
What "descriptive anchor" actually means to Google
Google's official documentation on links is more useful than most third-party guides on this topic. Worth reading the original.
The key line is: "Try reading only the anchor text (out of context) and check if it's specific enough to make sense by itself." Google explicitly warns against using "click here", "read more", "website" and "article" as anchors. Not because they hurt rankings directly, but because they fail the standalone-meaning test.
What Google wants is descriptive. "List of cheese types" is the example Google itself uses. That phrase tells you what the destination page is about, without being a stuffed keyword like "best cheese types list buy cheese online".
This distinction matters more than ever because AI engines are reading anchor text the same way.
How AI engines actually use anchor text
This is the part nobody talks about properly. Most SEO advice treats AI search as if it works like Google's core algorithm with extra hallucination. It does not.
Most large AI engines, including ChatGPT, Perplexity and Google's AI Overviews, use some form of retrieval-augmented generation. RAG works by converting documents into numerical embeddings stored in vector databases, then retrieving the most semantically relevant chunks when a user asks a question.
The retrieval step does not look at anchor text ratios. It looks at meaning. So when a system is deciding which page to cite for a query like "how should small SaaS teams structure their pricing pages", it is comparing the semantic meaning of that query to the semantic meaning of paragraphs in its index.
Anchor text still matters in this system. Just differently. Here is how:
- Anchors function as semantic labels attached to your URL across the web
- Multiple descriptive anchors covering different angles increase the topical breadth your URL is associated with
- Short keyword anchors compress the topical meaning of your page in a way that hurts semantic match
- Sentence-length anchors with natural language often map to more user queries
In practical terms, the link "Acme's breakdown of how SaaS pricing pages convert better with annual-first ordering" is more useful to an AI engine than the link "SaaS pricing pages". The first one matches more queries semantically. The second one matches fewer, and it pattern-matches as commercial in Google's eyes.
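The query-coverage point can be sketched with a toy script. This is not how a real retrieval system scores anchors; production RAG uses dense embeddings, and the word-overlap function below is just a crude stand-in for semantic similarity. The anchors and queries are the hypothetical examples from above.

```python
import re

def terms(text: str) -> set:
    """Lowercase word set; a crude stand-in for an embedding."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def matched_queries(anchor: str, queries: list, min_overlap: int = 3) -> list:
    """Return the queries whose vocabulary overlaps the anchor enough to 'retrieve' it."""
    a = terms(anchor)
    return [q for q in queries if len(a & terms(q)) >= min_overlap]

queries = [
    "do saas pricing pages convert better with annual billing",
    "annual first vs monthly first pricing page ordering",
    "best saas pricing pages",
]

descriptive = "Acme's breakdown of how SaaS pricing pages convert better with annual-first ordering"
short = "SaaS pricing pages"

print(len(matched_queries(descriptive, queries)))  # -> 3
print(len(matched_queries(short, queries)))        # -> 2
```

Even with this blunt similarity measure, the sentence-length anchor covers more of the query space than the bare keyword. Real embeddings widen the gap further, because they match paraphrases, not just shared words.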
I cover this dynamic in more depth in my piece on how to get your brand cited in AI answers.
The Common Crawl evidence base
If you want to understand how AI models learned what anchor text means, you need to know about Common Crawl.
Common Crawl is an open repository of web crawl data containing petabytes of pages collected since 2008. Almost every major LLM you have heard of was trained, at least in part, on Common Crawl data.
A team of researchers from Webis built a dataset called MS MARCO Anchor Texts, which enriched 4.8 million documents with anchor text extracted from six Common Crawl snapshots between 2016 and 2021. Each snapshot covered between 1.7 and 3.4 billion documents. That is the closest thing we have to a public ground truth for how the open web actually uses anchor text.
The practical implication: every major AI model has been trained on a representation of the web in which anchor text is one of the labels attached to URLs. When ChatGPT or Claude or Gemini decides which sources to recall for a query, the anchor text history of those URLs is part of how the meaning got encoded.
This is why building contextual mention strategies through digital PR often does more for AI visibility than aggressive exact-match link campaigns. Press mentions tend to use descriptive sentence-length anchors. Those anchors get encoded across multiple Common Crawl snapshots over years. The brand-to-topic association gets baked into the underlying model.
The split: Google wants safety, AI wants signal
Here is the simplest way I think about this in 2026:
- Google rewards a natural-looking profile. It penalises patterns that look manipulated. Your anchor text strategy needs to pass an audit done by a junior at Google's spam team.
- AI engines reward semantic clarity. They want to know what your page is about and what queries it answers. Your anchor text needs to read like a human description, not a keyword.
The good news is that the two strategies do not conflict. They actually pull in the same direction, away from exact-match and toward descriptive, branded, contextual anchors.
The bad news is that the SEO industry spent 15 years training people to do the opposite. A lot of agencies still measure success by how many exact-match anchors they can place. I have lost count of how many client audits I have done where the previous agency's main deliverable was "50 contextual backlinks", and every single one had the same three keywords as anchor.
When I take those clients on, the first thing I do is run a link reclamation pass to either dilute or remove the worst patterns. Sometimes that means asking for anchor text changes. Sometimes it means new placements with branded anchors to push the ratio down.
How to audit your own anchor text profile in 2026
This is what I do for every new client. You can do most of it yourself in a few hours.
- Export every backlink with anchor text from Ahrefs, Semrush or Majestic. Use referring domain as the unit, not individual links, because one site sending 200 sitewide links should count once.
- Tag each anchor as branded, naked URL, generic, partial match or exact match. A spreadsheet works. You will get a feel for the borderline cases after about 50 rows.
- Calculate ratios per landing page, not just sitewide. The danger is concentration on commercial URLs. Your homepage can have 80% branded and your services page can still be in trouble.
- Flag any landing page where exact-match is above 8% or any single anchor variant repeats more than 5 times across different domains.
- Look for unnatural clusters. Multiple links from low-quality domains, all dated within the same month, all with similar anchors. That is the pattern Penguin was built to catch.
- Check semantic spread. For your top 5 commercial pages, are the anchors describing different aspects of what the page covers, or are they all rewording the same keyword? If they are all rewording the same keyword, AI engines are getting a thin signal.
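The tagging and per-page ratio steps above can be sketched in a few lines. Everything here is illustrative: "Acme" is a hypothetical brand, the money keyword and domains are made up, and the classification rules are deliberately simple. Your real borderline cases still need eyes on them.

```python
import re
from collections import defaultdict

BRAND = "acme"
MONEY_KEYWORDS = {"workflow automation software"}  # hypothetical target terms
GENERIC = {"click here", "here", "read more", "this article", "website"}

def classify(anchor: str) -> str:
    """Bucket an anchor: naked, branded, generic, exact, partial, or other."""
    a = anchor.lower().strip()
    if re.match(r"https?://|www\.", a) or a.endswith(".com"):
        return "naked"
    if BRAND in a:
        return "branded"
    if a in GENERIC:
        return "generic"
    if a in MONEY_KEYWORDS:
        return "exact"
    if any(kw in a for kw in MONEY_KEYWORDS):
        return "partial"
    return "other"

# One row per referring domain, as in the audit steps above (toy data).
rows = [
    ("blog-a.com", "Acme", "/"),
    ("blog-b.com", "workflow automation software", "/product"),
    ("blog-c.com", "workflow automation software", "/product"),
    ("blog-d.com", "Acme's guide to workflow automation software", "/product"),
    ("blog-e.com", "this article", "/product"),
]

per_page = defaultdict(lambda: defaultdict(int))
for domain, anchor, page in rows:
    per_page[page][classify(anchor)] += 1

for page, counts in per_page.items():
    total = sum(counts.values())
    exact_pct = 100 * counts["exact"] / total
    flag = "  <-- over 8%, investigate" if exact_pct > 8 else ""
    print(f"{page}: exact-match {exact_pct:.0f}%{flag}")
```

Note the ordering of the checks: a brand-plus-keyword anchor like "Acme's guide to workflow automation software" counts as branded, not partial, which matches how I tag them by hand.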
The free SEO audit tool I built will surface the major patterns, but the anchor analysis itself is best done manually, by eye. There are no shortcuts I trust here.
Real-world example from my agency data
One anonymised case from earlier this year. A B2B SaaS client came to me with strong Google rankings but almost no AI Overview citations. Their anchor profile looked like this:
- 18% exact-match on the term "workflow automation software"
- 22% partial match using slight variations
- 35% branded
- 25% other
The Google rankings were holding because the rest of their on-page and content was strong enough to mask the over-optimisation. But every AI engine I tested was citing competitors who had less authority but more topically descriptive anchors across their backlink profile.
We ran a 90-day campaign that did three things:
- Reached out to existing referring domains and asked for anchor text changes on the worst exact-match clusters
- Built 14 new placements through scaled guest posting with quality controls, using long descriptive anchors that mentioned the brand plus a specific use case
- Added 22 unlinked brand mentions via digital PR, each one written into a paragraph that semantically described the product category
At 90 days, ChatGPT cited the client for 9 of the 14 target queries we were tracking. On the Google side, exact-match dropped from 18% to 7% and the rankings held. Pipeline coming in from AI traffic was about 40% of total inbound by month four.
Not every client gets results that clean. Some take longer. Some take more outreach budget. But the pattern repeats. I have written up the framework I use for running these campaigns if you want to see the full process.
Co-occurrence: the anchor strategy that works for both Google and AI
This is the section I want most SEOs in 2026 to read.
A co-occurrence anchor is when the visible link text is branded or generic, but the keyword you want to associate with the URL appears in the surrounding sentence or paragraph. For example:
"Acme published a detailed breakdown of how SaaS pricing pages should be structured to convert annual-first."
The anchor is "Acme". That is a branded anchor. Google sees a clean, natural mention. Penguin has nothing to flag.
But the surrounding sentence contains "SaaS pricing pages" and "convert annual-first". When that paragraph gets embedded into a vector database by an AI engine, the URL gets associated with both of those concepts semantically. The retrieval system will surface that URL when a user asks about either topic.
This is the unlock. You get the safety of branded anchors with the semantic richness that AI engines reward. It is the closest thing to a free lunch I have found in modern link building.
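The mechanism is easy to show in miniature. The sketch below contrasts what an anchor-pattern check sees (just the brand) with what a RAG pipeline embeds (the whole chunk around the link). The Acme sentence is the example from above; the tokenizer is a toy stand-in for an embedding model.

```python
import re

def tokens(text: str) -> set:
    """Lowercase token set; hyphenated terms kept whole."""
    return set(re.findall(r"[a-z0-9-]+", text.lower()))

# A co-occurrence placement: branded anchor, topic in the surrounding sentence.
anchor = "Acme"
paragraph = ("Acme published a detailed breakdown of how SaaS pricing pages "
             "should be structured to convert annual-first.")

# What a Penguin-style anchor check sees: only the brand.
print(tokens(anchor))  # -> {'acme'}

# What a RAG index embeds: the whole chunk, so the URL inherits the topic terms.
chunk_terms = tokens(paragraph)
for concept in ("saas", "pricing", "pages", "annual-first"):
    print(concept, concept in chunk_terms)  # each prints True
```

Same placement, two different readings: clean branded anchor for the spam check, full topical vocabulary for the retrieval index.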
The only way to earn co-occurrence anchors at scale is to write things worth quoting. Original data, strong opinions, case studies with numbers. That is why I keep pushing clients toward AI search citation strategies that produce quotable content, not just toward link volume.
What about parasite SEO and dead-site anchors?
Quick note because I get this question every week.
There has been a wave of parasite SEO and dead-site reputation abuse in 2025 and 2026, where black hat operators take over expired domains and use their existing backlink profile to rank for new keywords. The anchor text inherited from the previous site does carry weight, but it also carries risk.
Google has been more aggressive about catching this. I wrote a full piece on the parasite SEO and dead-site abuse pattern in 2026 for anyone considering it. Short version: do not.
The anchor text on inherited links was earned for a different domain doing different things. When Google notices the topic shift, the link equity gets discounted or the site gets manually actioned. I have seen four cases in the last six months. None of them were saveable.
How knowledge graphs change the anchor equation
One more layer that almost nobody is talking about.
Google and the major AI engines maintain knowledge graphs that map entities to concepts and to each other. When your brand has a clearly defined entity in those graphs, anchor text plays a different role. It is not the primary signal of what your site is about. It is reinforcement.
For brands without a strong entity presence, anchor text is doing more of the work. That is when over-optimisation looks worst, because the anchor profile is the main story Google has about you.
I cover this in detail in my piece on knowledge graphs and entity optimisation for AI search. The short version: the more clearly defined your brand entity is across structured data, mentions and citations, the less weight any individual anchor carries, and the safer you are.
What to do this week
If you take one thing from this post, it is that the anchor text strategy that protects you from Google in 2026 is also the one that gets you cited by AI engines. The two are no longer in tension.
Here is what I would do if I were taking over a client account tomorrow:
- Pull every backlink and tag the anchors. Use a spreadsheet. Branded, naked, generic, partial, exact. One row per referring domain.
- Calculate exact-match percentage per landing page. Flag anything above 8% on commercial URLs.
- Look for repeat anchors across different domains. Three or more domains sending the same partial-match anchor is your first red flag.
- Audit how your brand is being described in mentions. Are the sentences around your links semantically rich? Or are they just keyword wrappers?
- Stop chasing exact-match anchors. Brief outreach partners with two or three descriptive anchor options that include the brand name.
- Build co-occurrence into every placement. The anchor is your brand. The surrounding paragraph describes the topic. Both Google and AI engines reward this.
- Run one digital PR campaign that produces a quotable asset. Original data works best. The anchors will follow.
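The repeat-anchor check in the list above is the easiest one to script. A minimal sketch, using made-up domains and a hypothetical "Acme" brand: group anchors by how many distinct referring domains send them, then flag any non-branded anchor repeated across three or more.

```python
from collections import defaultdict

# (referring_domain, anchor) pairs from your backlink export (toy data).
links = [
    ("site-a.com", "best workflow tools"),
    ("site-b.com", "best workflow tools"),
    ("site-c.com", "best workflow tools"),
    ("site-d.com", "Acme"),
    ("site-e.com", "Acme"),
]

domains_per_anchor = defaultdict(set)
for domain, anchor in links:
    domains_per_anchor[anchor.lower()].add(domain)

# Three or more domains repeating the same non-branded anchor is the red flag.
flags = [a for a, d in domains_per_anchor.items()
         if len(d) >= 3 and "acme" not in a]
print(flags)  # -> ['best workflow tools']
```

Branded repetition is expected and harmless, which is why the brand name is excluded from the flag.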
If you want to see how this plays out across real client accounts, my case studies cover several brands where this anchor text shift was the main driver of recovery or growth. And if you want a second opinion on what your anchor profile looks like right now, my team runs link audits and outreach campaigns as part of our SEO services.
The SEO industry spent a decade arguing about whether exact-match should be 3% or 5% or 8%. In 2026, the more important question is whether your anchors mean anything to a machine that does not count them. Mine do. I hope yours will too.


