Last month I asked ChatGPT a question about commercial HVAC systems for a client project. The answer came back with a small thumbnail next to one of the cited paragraphs. I clicked through. It wasn't the highest-ranked page in Google for that query. It wasn't even in the top ten for Google Images.
But the page had one thing the others didn't. A clean, fully populated ImageObject schema block describing exactly what was in the picture, who took it, and what it depicted.
That moment forced me to throw out a lot of assumptions about image SEO. Alt text is still useful. File names still matter a bit. But the signal that pushed that image into a ChatGPT answer wasn't either of those. It was structured data.
This post is what I've found after testing the same content across ChatGPT search, Perplexity, and Google's AI Mode for six months. The patterns aren't subtle once you know what to look for.
AI engines now embed images directly in answers
Three rollouts in the last 18 months changed image SEO from a side project into a core ranking surface.
OpenAI launched ChatGPT search on 31 October 2024, and from day one it returned visual results inside answers. The launch announcement specifically called out "new visual designs for categories like weather, stocks, sports, news, and maps". By December 2024 it was open to all logged-in users.
Google followed in 2025. AI Mode added visual results on 30 September 2025. Google's blog described a new technique called "visual search fan-out" that runs multiple background queries to recognise subtle details and secondary objects within an image. The system combines Lens, Image Search, and Gemini 2.5's multimodal capabilities into a single visual answer surface.
Perplexity quietly added images to answers across its plans, with images appearing inline alongside source citations on web, iOS, and Android. The company's own help docs now reference visual results as part of the standard answer experience.
So what does this mean in practice? When someone asks "what does a roof junction box look like" or "how do I bleed a radiator", the answer isn't just text anymore. There's an image. That image came from a website. And the website that supplied it just got a citation, a thumbnail, and often a click.
That citation is the new prize. The question is how to win it.
What I tested and what I found
I ran a small set of comparisons across three months. I took 14 pages from client sites and split them into two groups.
Group A had strong traditional image SEO. Descriptive file names, full alt text, captions, image sitemaps, and high-authority backlinks.
Group B had everything in Group A plus a complete ImageObject JSON-LD block on every embedded image, with contentUrl, caption, creator, license, description, and representativeOfPage populated.
I tracked how often each image appeared in ChatGPT, Perplexity, and Google AI Mode answers across 60 prompts that mapped to those pages.
Group A images appeared in answers 11 times across 60 prompts. Group B images appeared 38 times. Same content, same rough authority, same alt text. The structured data was the variable that moved.
I'm not claiming this is a controlled study. The sample is small. But the direction is clear enough that I now treat ImageObject markup as a default for any page where the image carries meaning, not as an optional extra.
How visual citations differ across the three engines
The behaviour isn't identical. Here's what I've seen.
ChatGPT search tends to embed one or two thumbnails per answer when the query is informational. It pulls images from the same domains it's already citing for text. If your text is being cited and your images have clean structured data, you typically get the visual slot too. If your text is cited but your image markup is weak, ChatGPT often skips the image entirely or pulls one from a different source.
Perplexity shows a wider image strip, often six to eight images, with each tied to a specific source. It treats images more like a parallel citation track. You can rank for the image strip without ranking for the text answer. Schema.org metadata seems to feed directly into Perplexity's image scoring, especially the description and caption fields.
Google AI Mode is the most aggressive. The September 2025 update introduced visual fan-out, where Gemini runs background queries on the visual elements themselves. Images with rich structured data get pulled in even when the page text is only loosely related to the query. This is a meaningful shift from classic Google Images, where the surrounding page context did most of the work.
The common thread across all three is that the AI engines need machine-readable confirmation of what an image actually shows. Alt text gives them a fragment. Structured data gives them the full picture, literally.
Why alt text alone is no longer enough
Alt text was built for accessibility. Screen readers read it. That's its primary job, and it's a job worth doing well. According to the 2024 Web Almanac media chapter, only 55% of images on the web have non-blank alt attributes, an improvement of just one percentage point since 2022. The median mobile page contains 13 images, and 99.9% of pages request at least one image resource.
Alt text was also useful for Google Images, because the algorithm could pair it with the file name and surrounding text to guess the image's subject. Google's own image SEO guide still calls alt text "the most important attribute when it comes to providing more metadata for an image".
So why am I saying it's been overtaken?
Two reasons.
First, alt text is short by design. Best practice keeps it to roughly 125 characters or fewer. That's not enough for an AI engine to confidently know that the image shows a 2024 model commercial chiller versus a 2018 residential split system. Structured data has no such limit. The description field can carry a full paragraph of context.
Second, alt text isn't typed. It's plain text. Structured data is typed and linked. An ImageObject can declare its creator, point to a license, attach an associatedArticle, and connect to an Organization. AI engines treat that linked data as far higher confidence than a free-text string. The 2024 Web Almanac structured data chapter found more than 20 million instances of the schema.org JSON-LD context, with ImageObject frequently linked to Organization and WebPage entities. That web of relationships is what AI engines parse.
Alt text describes one image to a screen reader. Structured data tells an AI engine where that image fits inside your site, your business, and the broader knowledge graph. Those are different jobs.
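If it helps to see the two layers side by side, here's a simplified sketch of how they can sit together on a page. It reuses the chiller example from later in this post; the file paths, alt text, and organisation name are placeholders, and the JSON-LD is trimmed to the fields that show the typing and linking (a full block appears later in the post).

```html
<!-- Accessibility layer: short, plain-text alt attribute for screen readers -->
<figure>
  <img src="/images/rooftop-chiller-install.webp"
       alt="Two technicians installing a rooftop commercial chiller"
       width="1600" height="1067">
  <figcaption>Technicians installing a 200kW commercial chiller on a rooftop plant deck</figcaption>
</figure>

<!-- Machine-readable layer: typed, linked metadata for AI engines -->
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "ImageObject",
  "contentUrl": "https://example.com/images/rooftop-chiller-install.webp",
  "description": "A photograph showing two HVAC technicians securing a 200 kilowatt water-cooled commercial chiller to a rooftop plant deck in Sydney.",
  "creator": { "@type": "Organization", "name": "Be Cool Refrigeration" },
  "license": "https://example.com/image-license"
}
</script>
```

The alt attribute stays short and does its accessibility job; the script block carries the longer, typed description and the creator link that AI engines can follow.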
What ranks in Google Images vs what gets cited by AI
This is the comparison that matters. I've watched the same image rank well in Google Images and never get cited by ChatGPT, and the reverse.
Here's how I'd summarise the signal weights based on what I'm seeing.
Google Images still rewards:
- Descriptive file names like `commercial-chiller-rooftop-installation.jpg`
- Concise, accurate alt text
- Page authority and inbound links
- Image sitemap inclusion
- Page speed and Core Web Vitals
- Image dimensions and format (WebP, AVIF)
AI engines reward something different:
- `ImageObject` JSON-LD with `contentUrl`, `caption`, and `description` populated
- A `creator` or `author` link to a defined `Organization` or `Person`
- A `license` URL, even if the licence is "all rights reserved"
- Strong on-page semantic context, including the H2 directly above the image
- The image being marked `representativeOfPage: true` when it genuinely is
- Connection to other schema types on the page (Article, Product, HowTo, FAQ)
There's overlap. Both want the image to be relevant to the page. Both reward technical fundamentals. But the AI engines weight the linked, typed metadata far more heavily, while Google Images still leans on the older signals.
If you only have time to optimise one signal in 2026, structured data is the higher-leverage choice. The traditional signals matter at the margin. Schema is doing the heavy lifting for AI citations.
What an ImageObject block actually looks like
Here's a minimal but complete ImageObject block. This is the version I now use as a default on client sites where the image is genuinely informative.
```json
{
  "@context": "https://schema.org",
  "@type": "ImageObject",
  "contentUrl": "https://example.com/images/rooftop-chiller-install.webp",
  "name": "Rooftop chiller installation, Sydney CBD",
  "caption": "Technicians installing a 200kW commercial chiller on a rooftop plant deck",
  "description": "A photograph showing two HVAC technicians securing a 200 kilowatt water-cooled commercial chiller to a rooftop plant deck in Sydney. The chiller is connected to copper refrigerant lines and a condenser tower visible in the background.",
  "width": 1600,
  "height": 1067,
  "encodingFormat": "image/webp",
  "datePublished": "2026-03-12",
  "creator": {
    "@type": "Organization",
    "name": "Be Cool Refrigeration",
    "url": "https://www.becoolrefrigeration.com.au"
  },
  "copyrightNotice": "© 2026 Be Cool Refrigeration",
  "license": "https://www.becoolrefrigeration.com.au/image-license",
  "representativeOfPage": false
}
```
Two things to flag.
The description field is doing work that alt text can't. It's a full sentence, with named entities, a measurement, a location, and surrounding context. AI engines read this and know what's in the frame.
The creator is an Organization with a URL. That gives the AI engine a way to attribute the image and connect it back to your brand entity. This is the same mechanism that drives LLM citation patterns for text.
You can extend this further. Wrap the image in an Article or HowTo schema and reference it via the image property. That nests it inside a richer context and tends to produce more citations in my testing.
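Here's a rough sketch of that nesting, assuming the same chiller example. The headline is a placeholder and the ImageObject is trimmed for brevity.

```json
{
  "@context": "https://schema.org",
  "@type": "Article",
  "headline": "Commercial chiller installation: what to expect",
  "author": {
    "@type": "Organization",
    "name": "Be Cool Refrigeration",
    "url": "https://www.becoolrefrigeration.com.au"
  },
  "image": {
    "@type": "ImageObject",
    "contentUrl": "https://example.com/images/rooftop-chiller-install.webp",
    "caption": "Technicians installing a 200kW commercial chiller on a rooftop plant deck",
    "description": "Two HVAC technicians securing a 200 kilowatt water-cooled commercial chiller to a rooftop plant deck in Sydney.",
    "representativeOfPage": true
  }
}
```

The nested ImageObject can still carry creator, license, and datePublished exactly as in the standalone block above; wrapping it in the Article just makes the relationship between page, brand, and image explicit in the graph.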
For more on Google's official position on these properties, see the image metadata structured data documentation and the general structured data overview.
A checklist for AI-image-optimised content
If you want a practical sequence to follow, this is the one I use on new pages.
- Decide which images on the page are genuinely informative. Decorative images don't need this treatment. A hero photo of a smiling stock model doesn't either.
- For each informative image, save it with a descriptive file name and serve it as WebP or AVIF.
- Write alt text that's accurate and under 125 characters. Keep it for screen readers.
- Add a visible caption beneath the image. AI engines often grab captions verbatim.
- Place the image directly under the H2 or H3 it relates to. The semantic proximity matters.
- Add a complete `ImageObject` JSON-LD block with `contentUrl`, `name`, `caption`, `description`, `creator`, `license`, and `datePublished` filled in.
- Reference the image from the parent `Article`, `Product`, or `HowTo` schema using the `image` property, so the relationship is explicit.
- Mark the primary page image as `representativeOfPage: true`. Don't flag every image this way.
- Make sure the image URL is crawlable and indexable. AI engines can't cite what they can't fetch.
- Add the image to your XML image sitemap. This still helps Google Images and costs nothing (a minimal sitemap entry is sketched after this list).
- Validate the markup with Google's Rich Results Test and Schema.org's validator.
- Track citations in ChatGPT, Perplexity, and Google AI Mode for your priority queries. Iterate based on what gets pulled.
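For step 10, a minimal image sitemap entry looks something like this. The URLs are placeholders; the namespace is the standard image sitemap extension.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:image="http://www.google.com/schemas/sitemap-image/1.1">
  <url>
    <!-- The page that embeds the image -->
    <loc>https://example.com/commercial-chiller-installation/</loc>
    <!-- One image:image entry per informative image on that page -->
    <image:image>
      <image:loc>https://example.com/images/rooftop-chiller-install.webp</image:loc>
    </image:image>
  </url>
</urlset>
```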
That's it. Twelve steps, most of which take minutes per image once you've templated the schema in your CMS.
Where this is heading
I think the gap between traditional image SEO and AI image citations will widen through 2026. AI engines are training on structured data because it's the cleanest signal available to them. Pages that deliver typed, linked metadata get treated as more trustworthy, full stop.
The opposite is also true. Pages with thin alt text and no schema increasingly get ignored by AI engines, even when they rank well in Google Images. I've watched this happen on three client sites this quarter.
There's also a competitive dynamic worth flagging. Right now, ImageObject schema adoption is low. Most sites use it sparsely if at all. That means the marginal value of adding it to your priority pages is unusually high. The window for easy wins is open. It won't stay open forever.
If you sell anything visual (products, services with photographable outcomes, before-and-after work, technical diagrams), this is where I'd put your image SEO budget for the next twelve months. Not on more alt text. On ImageObject blocks that tell AI engines exactly what they're looking at and who made it.
The shift from alt text to structured data isn't a future trend. It's already happened. The only question is whether your images are speaking the language AI engines are reading, or the one they're starting to ignore.


