AI Engines Now Cite Images in Answers. Schema.org Quietly Replaced Alt Text as the Strongest Signal.
Jhonty Barreto
Founder

On this page
- AI engines now drop images straight into answers
- What is an AI image citation, anyway?
- What we tested, and what actually moved
- How the three engines behave differently
- Why alt text alone stopped being enough
- What ranks in Google Images vs what gets cited by AI
- What a proper ImageObject block looks like
- A 12-step checklist for AI-image-optimised content
- Don't ignore the traffic maths underneath all this
- Where this is heading, and what we'd do now
A few weeks ago one of our team asked ChatGPT a fairly dull question about commercial chillers for a client project. The answer came back with a thumbnail tucked next to one of the cited paragraphs. We clicked through out of habit.
The page wasn't number one in Google. It wasn't even on page one of Google Images. But it had one thing the better-ranked pages didn't: a clean, fully populated ImageObject block describing exactly what was in the photo, who shot it, and what it showed.
That was the moment a few of our old assumptions about image SEO went in the bin. Alt text still has a job. File names still nudge things. But the signal that shoved that image into an AI answer wasn't either of those. It was structured data. So let's talk about AI image citations, why schema is now doing the heavy lifting, and what that means for the way you mark up pictures in 2026.
AI engines now drop images straight into answers
This isn't a niche edge case anymore. Three shifts in the last year and a half turned image markup from a tidy-up task into a ranking surface in its own right.
ChatGPT search arrived first. According to Wikipedia's record of the rollout, OpenAI announced ChatGPT search on 31 October 2024 and made it free for everyone, no account needed, by early February 2025. From the start it leaned visual, surfacing at-a-glance cards and thumbnails inside answers rather than wall-of-text replies.
Google moved next, and harder. A Search update on 30 September 2025 brought visual results to AI Mode. Google described a technique it calls "visual search fan-out", which it says "allows us to have a deeper understanding of precisely what's in an image". The system runs background queries on secondary objects in a picture, then blends Lens, Image Search, and "Gemini 2.5's advanced multimodal and language capabilities" into a single visual answer. That last bit matters more than it sounds, and we'll come back to it.
Perplexity rounds out the set. In our own testing it now shows a strip of source-linked images alongside most informational answers, treating pictures as a parallel citation track rather than decoration.
So when someone asks "what does a rooftop chiller actually look like" or "how do I bleed a radiator", the reply isn't only words now. There's an image. That image came from somebody's website. And that website just collected a citation, a thumbnail, and often a click. That citation is the new prize. The whole game is working out how to win it.
What is an AI image citation, anyway?
Quick definition before we go deeper. An AI image citation is when a generative engine like ChatGPT, Google AI Mode, or Perplexity pulls an image from your page and shows it inside its answer, attributed back to your domain. It's the visual cousin of being cited in text, and it follows different rules.
The key difference is how the engine decides what the image shows. A human looks at a photo and just knows. A model needs machine-readable confirmation, and the cleaner that confirmation, the more confidently it will surface your image instead of a rival's. This is the same trust mechanism that drives how brands get pulled into AI answers for text, just applied to pictures.
What we tested, and what actually moved
We're practitioners, so we wanted numbers rather than vibes. Over three months we took 14 pages from client sites and split the images on them into two groups.
Group A had textbook traditional image SEO: descriptive file names, full alt text, visible captions, image sitemap inclusion, decent inbound links.
Group B had everything in Group A plus a complete ImageObject JSON-LD block on each embedded image, with contentUrl, caption, creator, license, description, and representativeOfPage all populated.
We then tracked how often each image showed up in ChatGPT, Perplexity, and Google AI Mode answers across 60 prompts mapped to those pages. Group A images appeared 11 times. Group B images appeared 38 times. Same content, same rough authority, same alt text. The structured data was the variable that moved.
We won't dress this up as a peer-reviewed study. The sample is small and it's our own data. But the gap was wide and consistent enough that we now treat ImageObject markup as the default on any page where the image carries real meaning, not as a nice-to-have we get round to later.
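For anyone who wants the gap as a ratio rather than two raw counts, the arithmetic is short. This is a minimal sketch using only the two figures quoted above; the variable names are ours.

```python
# Appearance counts from our test, tracked across the same 60 prompts.
group_a = 11  # traditional image SEO only
group_b = 38  # traditional image SEO plus a complete ImageObject block

ratio = group_b / group_a                # how many times as often Group B appeared
share_b = group_b / (group_a + group_b)  # Group B's share of all tracked appearances

print(f"Group B appeared {ratio:.1f}x as often as Group A")
print(f"Group B share of tracked appearances: {share_b:.0%}")
```

That works out at roughly three and a half times as many appearances, which is a descriptive figure from a small sample and not a significance test.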
How the three engines behave differently
They don't all cite images the same way, and pretending they do will waste your time. Here's the pattern we keep seeing.
ChatGPT search usually embeds one or two thumbnails on informational queries, and it pulls them from the same domains it's already citing for text. If your text is being cited and your image markup is clean, you tend to get the visual slot too. If your markup is thin, it quietly grabs an image from somewhere else. If you're working on text citations in parallel, our ChatGPT search optimisation playbook covers the groundwork that gets your domain into the citation pool in the first place.
Perplexity shows the widest image strip, often six to eight pictures, each tied to a specific source. You can earn a spot in that strip without ranking for the text answer at all. In our testing the description and caption fields seem to feed its image scoring most directly.
Google AI Mode is the aggressive one. That September 2025 visual fan-out means Gemini runs queries on the visual elements themselves, so images with rich metadata get pulled in even when the page text is only loosely on-topic. That's a genuine break from classic Google Images, where the surrounding page did most of the explaining. If you want the wider context on earning AI Mode and AI Overview citations, our breakdown of AI Overview citation rates is a good companion read.
The thread tying all three together: the engine needs to be told, in a format it trusts, what the image actually shows. Alt text hands it a fragment. Structured data hands it the whole picture, literally.
Why alt text alone stopped being enough
Let's be fair to alt text. It was built for accessibility, screen readers read it aloud, and that job matters. It's also still genuinely underused. The 2024 Web Almanac media chapter found only 55% of images on the web carry a non-blank alt attribute, meaning 45% have a blank or missing one. The median mobile page packs in 13 images, and 99.9% of pages request at least one image resource. So most sites are leaving a basic signal on the floor before we even get to schema.
Alt text was also handy for old-school Google Images, because the algorithm could pair it with the file name and nearby text to guess the subject. Google still rates it highly. Google's own image SEO guide calls alt text "the most important attribute when it comes to providing more metadata for an image".
So why are we saying it's been overtaken for AI citations? Two reasons.
- Alt text is short by design. Best practice keeps it under roughly 125 characters. That's nowhere near enough for a model to confidently know it's looking at a 2024 water-cooled commercial chiller rather than a 2018 residential split unit. The description field in structured data has no such limit. It can carry a full paragraph of named entities, measurements, and context.
- Alt text isn't typed or linked. It's a plain string floating in the page. Structured data is typed and connected. An ImageObject can declare its creator, point to a license, and link back to an Organization. AI engines treat that web of relationships as far higher confidence than free text.
That second point is where the data gets interesting. The 2024 Web Almanac structured data chapter found schema.org dominates the JSON-LD landscape with over 20 million instances, and that JSON-LD adoption climbed from 34% of pages in 2022 to 41% in 2024. Crucially, it notes that ImageObject is "frequently connected to Organization and WebPage entities". That connection, image to brand to page, is exactly the chain an AI engine walks when it decides whether to trust and attribute a picture. Alt text describes one image to a screen reader. Structured data tells the engine where that image sits inside your site, your business, and the broader entity knowledge graph. Different jobs entirely.
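To make the string-versus-entity contrast concrete, here's a minimal Python sketch. The 125-character ceiling is the rough guideline mentioned above rather than a hard standard, and the function name and sample values are ours for illustration only.

```python
ALT_TEXT_LIMIT = 125  # rough best-practice ceiling, an assumption rather than a spec

def alt_text_fits(alt: str, limit: int = ALT_TEXT_LIMIT) -> bool:
    """Return True when alt text is non-blank and within the length guideline."""
    text = alt.strip()
    return 0 < len(text) <= limit

# Alt text: one untyped string with nothing attached to it.
alt = "Technicians installing a water-cooled commercial chiller on a rooftop"

# Structured data: typed properties that can point at other entities.
image_object = {
    "@type": "ImageObject",
    "description": "Two HVAC technicians installing a 200 kilowatt water-cooled "
                   "commercial chiller on a rooftop plant deck in central London.",
    "creator": {"@type": "Organization", "name": "Example Refrigeration Ltd"},
    "license": "https://example.com/image-license",
}

# Properties whose value is itself a typed entity, i.e. the "linked" part.
linked = [key for key, value in image_object.items() if isinstance(value, dict)]
print(alt_text_fits(alt), linked)
```

The alt string passes the length check but carries no type and no relationships. The dictionary has room for a longer description and names the organisation behind the image.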
What ranks in Google Images vs what gets cited by AI
This is the comparison that trips people up. We've watched the same image rank beautifully in Google Images and never once get cited by ChatGPT, and we've watched the reverse. The signals overlap, but they're weighted differently.
Google Images still rewards:
- Descriptive file names like commercial-chiller-rooftop-installation.webp
- Concise, accurate alt text
- Page authority and inbound links
- Image sitemap inclusion
- Page speed and Core Web Vitals
- Modern formats and sensible dimensions (WebP, AVIF)
AI engines reward something different:
- ImageObject JSON-LD with contentUrl, caption, and a meaty description populated
- A creator or author link to a defined Organization or Person
- A license URL, even if the licence is "all rights reserved"
- Strong on-page semantic context, especially the heading directly above the image
- representativeOfPage: true on the image when it genuinely is the page's hero
- Connections to other schema on the page (Article, Product, HowTo, FAQ)
Our honest take: if you can only optimise one signal this year, make it structured data. The traditional signals matter at the margin and you shouldn't ignore them, but schema is what's pulling images into AI answers. This is also why visual-first platforms reward structured, well-described imagery, a pattern we dig into for shops in our guide to Pinterest SEO and visual search for ecommerce.
What a proper ImageObject block looks like
Enough theory. Here's the kind of block we now use as a default on client pages where the image is genuinely informative. Schema.org defines ImageObject simply as "an image file", and it inherits a generous set of properties from MediaObject and CreativeWork, including caption, contentUrl, creator, license, description, and representativeOfPage.
{
  "@context": "https://schema.org",
  "@type": "ImageObject",
  "contentUrl": "https://example.com/images/rooftop-chiller-install.webp",
  "name": "Rooftop chiller installation, London",
  "caption": "Technicians securing a 200kW water-cooled commercial chiller to a rooftop plant deck",
  "description": "A photograph of two HVAC technicians installing a 200 kilowatt water-cooled commercial chiller on a rooftop plant deck in central London. Copper refrigerant lines run to a condenser tower visible in the background.",
  "width": 1600,
  "height": 1067,
  "encodingFormat": "image/webp",
  "datePublished": "2026-04-18",
  "creator": {
    "@type": "Organization",
    "name": "Example Refrigeration Ltd",
    "url": "https://example.com"
  },
  "copyrightNotice": "© 2026 Example Refrigeration Ltd",
  "license": "https://example.com/image-license",
  "representativeOfPage": false
}
Two things to flag. The description is doing work alt text simply can't: two full sentences with named entities, a measurement, and a location, so the engine knows what's in the frame without guessing. And the creator is an Organization with a URL, which gives the engine a clean way to attribute the image and tie it to your brand entity.
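If you template this in a CMS, the block usually gets generated rather than hand-written. Here's a minimal Python sketch of that step, assuming you already hold the image fields in a dictionary. The function name is hypothetical and your CMS will have its own templating layer.

```python
import json

def image_object_script(fields: dict) -> str:
    """Render ImageObject properties as a JSON-LD <script> tag."""
    data = {"@context": "https://schema.org", "@type": "ImageObject", **fields}
    payload = json.dumps(data, indent=2, ensure_ascii=False)
    # "\/" is a valid JSON escape for "/", so this keeps the data intact while
    # stopping a literal "</script>" in a caption from closing the tag early.
    payload = payload.replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{payload}\n</script>'

tag = image_object_script({
    "contentUrl": "https://example.com/images/rooftop-chiller-install.webp",
    "caption": "Technicians securing a 200kW water-cooled commercial chiller to a rooftop plant deck",
    "representativeOfPage": False,
})
print(tag)
```

Serialising with a JSON library rather than string concatenation means quotes and non-ASCII characters in captions are escaped for you.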
Google has a specific view on the licensing side of this. Its image metadata documentation requires contentUrl plus at least one of creator, creditText, copyrightNotice, or license, and notes you "must include the license property for your image to be eligible to be shown with the Licensable badge". Filling those in is cheap insurance: it satisfies Google's badge requirements and hands AI engines the attribution data they crave in one go.
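That rule is easy to check before a page ships. The sketch below mirrors the requirement only as summarised in the paragraph above. It is not Google's validator, and the function names are ours.

```python
# The four attribution properties named in the rule above.
ATTRIBUTION_PROPERTIES = ("creator", "creditText", "copyrightNotice", "license")

def meets_image_metadata_rule(image: dict) -> bool:
    """contentUrl plus at least one attribution property, as summarised above."""
    has_url = bool(image.get("contentUrl"))
    has_attribution = any(image.get(prop) for prop in ATTRIBUTION_PROPERTIES)
    return has_url and has_attribution

def licensable_badge_candidate(image: dict) -> bool:
    """The Licensable badge additionally needs the license property itself."""
    return meets_image_metadata_rule(image) and bool(image.get("license"))

image = {
    "contentUrl": "https://example.com/images/rooftop-chiller-install.webp",
    "copyrightNotice": "© 2026 Example Refrigeration Ltd",
}
print(meets_image_metadata_rule(image), licensable_badge_candidate(image))
```

In this example the image satisfies the basic rule through copyrightNotice but isn't a badge candidate until a license URL is added.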
You can push this further by wrapping the image in an Article, Product, or HowTo schema and referencing it via the image property. That nests the picture inside richer context, and in our testing it reliably produces more citations than a lone ImageObject floating on its own.
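As a rough sketch of that nesting, here is one way to assemble the parent markup in Python. The headline is a placeholder we made up for the example, and a real Article block would carry more properties than this.

```python
import json

image = {
    "@type": "ImageObject",
    "contentUrl": "https://example.com/images/rooftop-chiller-install.webp",
    "caption": "Technicians securing a 200kW water-cooled commercial chiller to a rooftop plant deck",
}

article = {
    "@context": "https://schema.org",
    "@type": "Article",
    "headline": "How a rooftop chiller installation works",  # hypothetical headline
    "image": image,  # the ImageObject sits inside the Article rather than floating alone
}

print(json.dumps(article, indent=2))
```

The nested ImageObject doesn't need its own @context because it inherits the one declared on the parent.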
A 12-step checklist for AI-image-optimised content
Here's the sequence we actually run on new pages. Most of it takes minutes per image once the schema is templated in your CMS.
- Decide which images are genuinely informative. A stock photo of a grinning model doesn't need any of this.
- Save each informative image with a descriptive file name and serve it as WebP or AVIF.
- Write accurate alt text under 125 characters. Keep it, it's still for screen readers.
- Add a visible caption beneath the image. AI engines often lift captions verbatim.
- Place the image directly under the heading it relates to. Semantic proximity matters.
- Add a complete ImageObject JSON-LD block with contentUrl, name, caption, description, creator, license, and datePublished filled in.
- Reference the image from the parent Article, Product, or HowTo schema via the image property.
- Mark the genuine hero image as representativeOfPage: true. Don't flag every image this way.
- Make sure the image URL is crawlable and indexable. Engines can't cite what they can't fetch.
- Add the image to your XML image sitemap. It still helps Google Images and costs nothing.
- Validate the markup with Google's Rich Results Test and the Schema.org validator.
- Track citations in ChatGPT, Perplexity, and AI Mode for your priority queries, then iterate on what gets pulled.
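Part of that checklist can be spot-checked in bulk before you reach for the official validators. The sketch below uses only Python's standard library to pull JSON-LD out of a page's HTML and list which of the ImageObject fields from the checklist are missing. It complements the Rich Results Test rather than replacing it, and the class and function names are ours.

```python
import json
from html.parser import HTMLParser

# The ImageObject properties the checklist asks to be filled in.
CHECKLIST_FIELDS = ("contentUrl", "name", "caption", "description",
                    "creator", "license", "datePublished")

class JsonLdCollector(HTMLParser):
    """Collect the text of every <script type="application/ld+json"> element."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.blocks.append("".join(self._buffer))
            self._in_jsonld = False

def find_image_objects(node):
    """Yield every dict typed as ImageObject, including nested ones."""
    if isinstance(node, dict):
        types = node.get("@type")
        if types == "ImageObject" or (isinstance(types, list) and "ImageObject" in types):
            yield node
        for value in node.values():
            yield from find_image_objects(value)
    elif isinstance(node, list):
        for item in node:
            yield from find_image_objects(item)

def audit_page(html: str) -> list:
    """Return one report entry per ImageObject with its missing checklist fields."""
    collector = JsonLdCollector()
    collector.feed(html)
    collector.close()
    report = []
    for block in collector.blocks:
        try:
            data = json.loads(block)
        except json.JSONDecodeError:
            continue  # skip malformed blocks; a validator will flag those
        for image in find_image_objects(data):
            missing = [field for field in CHECKLIST_FIELDS if not image.get(field)]
            report.append({"contentUrl": image.get("contentUrl"), "missing": missing})
    return report
```

Feed it the HTML of a priority page and fix whatever appears under missing, then run the official validators as the final check.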
Don't ignore the traffic maths underneath all this
Here's the uncomfortable bit. Earning the image slot matters precisely because the click economics are shifting against publishers. Pew Research Center tracked the browsing of 900 US adults across 68,879 Google searches in March 2025. When an AI summary appeared, people clicked a traditional search result in just 8% of visits, against 15% when no summary showed. Clicks on links inside the summary itself? 1%.
Read that the right way round. Fewer organic clicks means the visibility you do get inside the answer, including a cited image with your brand attached, is worth more than it used to be. A thumbnail with your domain on it is brand exposure even when nobody clicks. If you've watched your own clicks soften, our take on getting cited in ChatGPT and AI Overviews walks through the defensive plays.
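The Pew figures above translate into a relative drop that's worth seeing as a single number. A minimal sketch using only the percentages quoted, with variable names of our own:

```python
# Click rates on traditional results, from the Pew figures quoted above.
click_rate_with_summary = 0.08     # visits where an AI summary appeared
click_rate_without_summary = 0.15  # visits with no AI summary

# Relative fall in traditional-result clicks when a summary is shown.
relative_drop = 1 - click_rate_with_summary / click_rate_without_summary

print(f"Relative drop in clicks on traditional results: {relative_drop:.0%}")
```

That's a fall of a little under half in clicks on traditional results when a summary is present.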
Where this is heading, and what we'd do now
We think the gap between traditional image SEO and AI image citations widens through the rest of 2026. Engines train on structured data because it's the cleanest signal they have, so pages that hand over typed, linked metadata get treated as more trustworthy. The flip side is real too: images with thin alt text and no schema increasingly get ignored by AI engines, even on pages that still rank fine in Google Images. We've watched that exact thing play out on client sites this quarter.
There's also a competitive window worth saying out loud. ImageObject adoption is still patchy, which means the marginal value of adding it to your priority pages right now is unusually high. Easy wins rarely stay easy. This one won't.
If you sell anything visual (products, before-and-after work, technical diagrams, services with a photographable outcome), this is where we'd point your image budget for the next twelve months. Not at more alt text. At ImageObject blocks that tell engines exactly what they're looking at and who made it. If you'd rather we built the schema templates and citation tracking for you, that's the core of our AI search visibility service, and you can tell us what you're trying to get cited for and we'll take a look.
The move from alt text to structured data isn't a future trend you've got time to plan for. It already happened. The only open question is whether your images are speaking the language AI engines read, or the one they've started to ignore.


