AI & SEO · 27 March 2026 · 12 min read

I Tested 5 SEO AI Agents. Most of Them Are Just Wrappers Around ChatGPT.

Priyam Goyal

Co-Founder

Every year somebody tells us a tool is about to replace 80% of what an SEO does. First it was automated link building. Then AI content writers. Now it is "SEO AI agents" that supposedly run your whole workflow, from audit to execution, while you watch the rankings climb and sip something cold.

We wanted it to be true. Genuinely. Our founders are mechanical engineers who fell into marketing, and nothing makes us happier than a machine doing the boring bits. So we blocked out two weeks, signed up for five SEO AI agents, and pointed them at real client sites. Not demo data. Not a sandbox. Actual businesses with actual problems and actual revenue on the line.

Here is what we found, including the one that genuinely impressed us and the two that should be embarrassed to charge money.

What is an SEO AI agent, actually?

An AI agent is meant to be different from an AI tool. A tool takes an input and hands you an output. An agent takes a goal and works out the steps itself: it plans, acts, checks its own work, and adjusts. That is the pitch.

The reality is messier, and the marketing is doing a lot of heavy lifting. Most products sold as "SEO AI agents" right now fall into one of three buckets:

  1. ChatGPT wrappers. A pretty interface around the same API calls you could make yourself. They look agentic. They are not.
  2. Workflow automators. They chain existing tools together (crawlers, keyword APIs, content generators) into a sequence. Useful, but the "agency" is a script someone wrote, not the model reasoning.
  3. Genuine agents. They can browse your site, find issues, prioritise them, and suggest or even execute fixes with minimal hand-holding. These exist. They are rare, and they are early.

This distinction matters because the gap between the marketing and the engineering is wide right now. A February 2026 paper from researchers including Princeton's Sayash Kapoor, "Towards a Science of AI Agent Reliability," evaluated 14 agentic models and found that "many agents still continue to fail in practice," and that recent capability gains "have only yielded small improvements in reliability." Translation: the demos got better faster than the dependability did.

How we tested them (so you can trust the verdicts)

We are not going to name the five products, and that is deliberate. This space moves so fast that a tool that is rubbish today might ship a genuinely good update next month, and we would rather not torch a company over a snapshot. So we will describe what each category did instead.

Every agent got the same brief on three live client sites:

  • Run a technical crawl and surface the real issues, ranked by impact.
  • Find content gaps against the pages currently ranking, not just word count.
  • Produce something we could actually ship, not a generic checklist.

We graded on whether a competent SEO could take the output and use it without redoing the work from scratch. Low bar. Some of them tripped over it.

The wrapper agents (2 of 5): save your money

Two of the five were, functionally, GPT-class models with a system prompt that said "you are an SEO expert." They could answer SEO questions. They could draft meta descriptions and spit out keyword ideas. So can the free version of ChatGPT, which is the problem.

When we asked one to audit a client site, it asked us to paste in the HTML. It could not browse the site itself. When we asked for a technical audit, we got a generic checklist that would apply to literally any website on earth. No crawl. No specific issues. No data. Just a confident list of things one should generally do.

Our verdict: open ChatGPT, write your own prompts, keep the subscription fee. You are paying for branding and a logo.

The workflow automators (2 of 5): great hands, no brain

Two were more capable. They connected to Google Search Console, pulled real data, and ran a proper multi-step analysis. One produced a decent technical audit from genuine crawl data. The other focused on content gaps and keyword opportunities, and was reasonable at it.

These saved real time. A job that takes one of our team about two hours (pulling GSC data, cross-referencing keyword volumes, spotting gaps) took roughly 15 minutes with the agent doing the data wrangling. We will happily take that trade.
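To make that data wrangling concrete, here is a minimal sketch of the cross-referencing step. It assumes the GSC data has already been exported into rows with `query` and `position` fields and that search volumes come from a separate lookup. Those field names, the thresholds and the `find_gaps` helper are our illustration, not how any of the tested products work.

```python
def find_gaps(gsc_rows, volumes, min_volume=500, max_position=10.0):
    """Flag queries with real search demand where the site ranks off page one.

    gsc_rows: list of dicts with hypothetical keys "query" and "position".
    volumes:  dict mapping query -> monthly search volume (hypothetical source).
    """
    gaps = []
    for row in gsc_rows:
        volume = volumes.get(row["query"], 0)
        if volume >= min_volume and row["position"] > max_position:
            gaps.append(
                {"query": row["query"], "position": row["position"], "volume": volume}
            )
    # Biggest demand first, so a human starts with the queries that matter most.
    return sorted(gaps, key=lambda g: g["volume"], reverse=True)


# Illustrative data only.
gsc_rows = [
    {"query": "seo audit", "position": 14.2},
    {"query": "link building", "position": 3.1},
    {"query": "schema markup", "position": 22.0},
    {"query": "seo memes", "position": 35.0},
]
volumes = {"seo audit": 2400, "link building": 5000, "schema markup": 900, "seo memes": 50}

for gap in find_gaps(gsc_rows, volumes):
    print(gap["query"], gap["position"], gap["volume"])
```

This is the part an agent does well: mechanical joins and filters. Deciding what to do about each flagged query is still the human's job.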

But the recommendations were skin-deep. "Your page about X has thin content, consider expanding it." Right, expand it with what? What is the search intent? What are the ranking pages covering that this one is not? The agent could not answer without a lot of prompting, at which point we have done the thinking and it has done the typing. That is a fast intern, not a strategist. It is also where problems like keyword cannibalisation across AI Overviews sail straight past, because the tool sees pages in isolation, not as a system.

Our verdict: brilliant for data gathering and a first pass. Nowhere near ready to replace strategy.

The genuine agent (1 of 5): impressive, and a little dangerous

One tool actually made us sit up. It browsed a site, found specific technical issues (broken canonicals, orphaned pages, missing schema), ranked them by impact, and generated implementation-ready fixes. Not "consider adding schema." Actual code snippets and named file changes.
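For a sense of what "specific" means here, this is a rough sketch of two of those checks, run over crawl output held in plain dictionaries. The data shapes and the helper names (`find_orphans`, `find_broken_canonicals`) are assumptions for illustration. We have no visibility into how the tool itself implements them.

```python
def find_orphans(links, homepage):
    """Return known URLs that no other known page links to.

    links: dict mapping each known URL (e.g. from the sitemap) to the list
           of internal URLs it links out to. Hypothetical crawl output.
    """
    linked_to = set()
    for source, targets in links.items():
        for target in targets:
            if target != source:  # a self-link does not rescue a page
                linked_to.add(target)
    return sorted(url for url in links if url not in linked_to and url != homepage)


def find_broken_canonicals(canonicals, status):
    """Return pages whose canonical target was not crawled or did not return 200.

    canonicals: dict mapping page URL -> declared canonical URL.
    status:     dict mapping crawled URL -> HTTP status code.
    """
    return {
        page: target
        for page, target in canonicals.items()
        if status.get(target) != 200
    }


# Illustrative data only.
links = {"/": ["/a", "/b"], "/a": ["/"], "/b": ["/a"], "/old-offer": ["/"]}
canonicals = {"/a": "/a", "/b": "/b-old"}
status = {"/": 200, "/a": 200, "/b": 200, "/old-offer": 200}

print(find_orphans(links, "/"))                    # ['/old-offer']
print(find_broken_canonicals(canonicals, status))  # {'/b': '/b-old'}
```

Checks like these follow known patterns, which is exactly why an agent can run them well.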

It also compared pages against the top-ranking results and surfaced topical gaps, not just word-count deltas. The kind of analysis we lean on heavily when we plan technical SEO work for a client.

Was it perfect? No. It confidently recommended changes that would have broken the site's navigation. It missed context any human would clock in 30 seconds. And its prioritisation ignored business goals entirely, ranking issues purely by technical severity, which is not how you decide what to fix first when budget is finite.
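The prioritisation problem is easy to show with a toy example. The sketch below ranks the same three issues twice: once by technical severity alone, and once by a score that also weighs revenue at stake and effort. The `Issue` fields, the numbers and the scoring formula are all hypothetical. It is one plausible way to fold business context in, not a standard method.

```python
from dataclasses import dataclass


@dataclass
class Issue:
    name: str
    severity: int         # 1 (cosmetic) to 5 (critical), as a crawler might score it
    revenue_share: float  # hypothetical: share of revenue on affected pages, 0 to 1
    effort_days: float    # hypothetical: estimated time to fix


def by_severity(issues):
    """Rank purely on technical severity, which is what the agent did."""
    return sorted(issues, key=lambda i: i.severity, reverse=True)


def by_business_impact(issues):
    """Rank on severity weighted by revenue at stake, per day of effort."""
    return sorted(
        issues,
        key=lambda i: i.severity * i.revenue_share / i.effort_days,
        reverse=True,
    )


# Illustrative data only.
issues = [
    Issue("Missing schema on blog archive", 4, 0.02, 3.0),
    Issue("Broken canonical on top product page", 3, 0.40, 0.5),
    Issue("Orphaned legacy landing pages", 5, 0.01, 2.0),
]

print([i.name for i in by_severity(issues)][0])         # Orphaned legacy landing pages
print([i.name for i in by_business_impact(issues)][0])  # Broken canonical on top product page
```

Same issues, opposite order. The inputs that flip the ranking (revenue share, effort) are the ones the agent never had.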

Our verdict: a genuinely useful research assistant. Hand it the keys and let it execute unreviewed, and it will eventually drive into a wall with total confidence.

The capability is real. The reliability is not (yet).

Here is the bit that explains everything above. AI agents are getting more capable astonishingly fast, while staying stubbornly unreliable. Those are two different things, and people keep conflating them.

On the capability side, the trend is genuinely steep. METR's research found that the length of task a frontier agent can complete autonomously, measured against how long a human expert would take, has been doubling roughly every seven months for six years. Impressive. But read the small print: that is measured at a 50% success rate, and METR notes the best models "can only reliably complete tasks of up to a few minutes long." Half the time it nails an hour-long task. The other half, it doesn't. You would not run a client account on a coin flip.
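The arithmetic behind both halves of that paragraph is worth a quick sanity check. The first function compounds a seven-month doubling over six years. The second shows what happens when you chain several tasks that each succeed half the time, under the simplifying assumption (ours, not METR's) that the attempts are independent.

```python
def growth_factor(months, doubling_months=7):
    """How many times longer the achievable task gets after `months` of doubling."""
    return 2 ** (months / doubling_months)


def chained_success(p, steps):
    """Chance that `steps` tasks in a row all succeed, assuming independence."""
    return p ** steps


# Six years of doubling every seven months: roughly a 1,250-fold increase.
print(round(growth_factor(72)))

# Three coin-flip tasks in a row all succeeding: 0.125, or one run in eight.
print(chained_success(0.5, 3))
```

So the capability curve is real, and so is the coin flip. A workflow with several agent steps compounds the second number, not the first.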

The reliability gap is just as well documented. Stanford's 2026 AI Index, as reported by IEEE Spectrum, points to a benchmark called ClockBench, where even the best-performing model had "just 50-50 odds" of correctly reading an analog clock. The same models scoring superhuman on hard reasoning tasks can fumble something a six-year-old does over breakfast. That is the texture of working with agents today: superhuman in narrow lanes, weirdly unreliable the moment you step outside them.

So when an SEO agent crawls a site beautifully and then recommends a fix that breaks the nav, it is not a bug you can patch out. It is the current state of the technology.

What SEO AI agents are genuinely good at right now

We are not anti-AI. We use it every day. After two weeks of testing, here is our honest read on where these tools add real value today.

They earn their keep at:

  • Data collection and aggregation, pulling multiple sources into one view.
  • Pattern recognition across big datasets, finding the needle when there are a million strands of hay.
  • First drafts of repetitive deliverables: meta descriptions, alt text, basic audit reports.
  • Keyword work at scale, especially clustering and intent mapping.
  • Spotting technical issues that follow known patterns.

They fall over at:

  • Strategic prioritisation, as in what actually matters for this business this quarter.
  • Context, as in why a page exists, who it serves, and what the commercial goal is.
  • Creative problem solving when the answer is not somewhere in the training data.
  • Edge cases, and every real website is a museum of edge cases.
  • Link building, where no agent can replace a relationship, a pitch, and a human deciding your story is worth covering.

That last one bears repeating because it is the most over-promised. An agent can find link prospects all day. It cannot make an editor trust you. The graft of earning a placement is still a human job, which is exactly why our white-label link building service runs on outreach and relationships rather than a clever script.

The risk nobody selling these tools wants to mention

Here is what actually worries us, and it is not the robots taking our jobs. It is agencies handing a junior an AI agent and shipping the output to a client as if it were finished work. It is not finished. It is a confident first draft that sounds finished.

An agent that confidently recommends schema on pages where it would breach Google's structured data rules is worse than no recommendation at all. With no recommendation, nothing breaks. With a wrong one delivered at speed and scale, you break things at speed and scale. The reliability paper above lists exactly this pattern of failure: agents that act decisively and incorrectly.

Google has been consistent here for years, and nothing about agents changes it. Its guidance on AI-generated content warns that using AI "to generate many pages without adding value for users may violate Google's spam policy on scaled content abuse," and tells you to "focus on accuracy, quality, and relevance, especially when automatically generating the content." Method does not matter. Quality does. An agent producing low-quality recommendations at scale just gives you low-quality results at scale, faster. We dug into where that line actually sits in our piece on whether Google really cares about AI content detection.

It is worth pulling two things apart here, because two separate hype waves are colliding: agents that do SEO, and AI search surfaces you have to optimise for. People keep muddling them.

On the second one, Google could not be clearer. Its documentation on AI features in Search states plainly that "the best practices for SEO remain relevant for AI features in Google Search (such as AI Overviews and AI Mode)," and that "there are no additional requirements to appear in AI Overviews or AI Mode, nor other special optimizations necessary." No secret markup. No AI text file. No fee. The fundamentals still win.

That does not mean the playbook is identical. Getting cited by an LLM is its own discipline, which is why we treat AI search visibility as a distinct service and have written a full guide on getting your brand into AI answers and another on earning citations in ChatGPT and AI Overviews. But the foundation is the same SEO you already know. No agent shortcuts that.

How our team actually uses AI day to day

People expect us to either dismiss AI or worship it. We do neither. Our split is roughly 70% human, 30% AI-assisted, and very deliberately not the other way around.

Here is where AI sits in a typical week for us:

  1. Clustering hundreds of keywords into themes, so a human can spend the time on intent rather than spreadsheets.
  2. Drafting content briefs we then rewrite, because the first draft is scaffolding, not the building.
  3. Summarising competitor content for gap analysis at a speed no human can match.
  4. Knocking out first drafts of genuinely repetitive pages, like location and product templates.
  5. Crunching large GSC exports to spot trends worth a closer look.
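As an illustration of the first item, here is a deliberately simple keyword clustering sketch that groups keywords by shared words. It is a stand-in to show the shape of the task, not what we or any tool run in production. A real setup would more likely compare embeddings or SERP overlap. The stop-word list, the threshold and the function names are assumptions.

```python
STOP_WORDS = {"for", "in", "the", "a", "to", "of"}


def tokens(keyword):
    """Lowercase a keyword and split it into a set of meaningful words."""
    return {w for w in keyword.lower().split() if w not in STOP_WORDS}


def jaccard(a, b):
    """Overlap between two word sets: shared words divided by all words."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_keywords(keywords, threshold=0.5):
    """Greedy clustering: join the first cluster whose seed keyword is similar enough."""
    clusters = []  # each entry is (seed word set, list of member keywords)
    for keyword in keywords:
        words = tokens(keyword)
        for seed, members in clusters:
            if jaccard(words, seed) >= threshold:
                members.append(keyword)
                break
        else:
            clusters.append((words, [keyword]))
    return [members for _, members in clusters]


# Illustrative data only.
keywords = [
    "white label link building",
    "white label link building agency",
    "technical seo audit",
    "seo audit checklist",
    "link building services",
]

for group in cluster_keywords(keywords):
    print(group)
```

The output is three groups a human can then label by intent, which is the part worth spending time on.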

And here is where we keep AI well away from the steering wheel:

  • Client strategy, because every business has a different bottleneck and the agent cannot see it.
  • Outreach and link building, where a relationship beats a generated email every time.
  • Technical implementation, because of the edge-case museum we mentioned.
  • Final content review, where AI still misses tone, nuance, and brand voice.
  • Anything in a sensitive niche, like medical or financial advice, without expert human review.

If you are an experienced SEO, these tools are a force multiplier. If you are new, they will lead you confidently off a cliff and you will not know until the traffic drops. That asymmetry is the whole game right now.

Where this is heading

We think SEO AI agents become genuinely, reliably useful in roughly 12 to 18 months, and the capability curve from METR suggests that timeline is not wishful thinking. The agents that browse, analyse and suggest are getting better at context and edge cases with every release. The benchmark gains on autonomous computer use that Stanford and IEEE flagged are not fake.

But as it stands in 2026, most of what is marketed as "AI agents for SEO" is the marketing arriving before the engineering. The reliability research is unambiguous: capability is sprinting, dependability is jogging, and the gap is exactly where the expensive mistakes live.

The best use of your budget today is still a knowledgeable human who treats AI as one sharp tool in a full kit, rather than the kit itself. If you would rather have that human run the testing, the strategy and the boring-but-load-bearing execution for you, that is literally what our SEO team does. Tell us about your site and we will tell you, honestly, where AI helps and where it would quietly cost you.

We still want the robots to take the boring bits. They are just not ready to be left alone with your rankings yet. Give it a year.
