How We Built an AI Search Content Optimization Tool [Study]
![How We Built an AI Search Content Optimization Tool [Study]](https://static.semrush.com/blog/uploads/media/e2/2f/e22f51dd255b84e2a38e3e195b8150ad/e56eb1f7f1e194543547bb51a29886b7/how-we-built-a-content-optimization-tool-for-ai-search-study-sm.png)
AI search platforms like ChatGPT, Google AI Mode, and Perplexity are changing the way content is discovered. But what makes one piece of content get cited while another is ignored?
To answer this, we analyzed thousands of AI citations and compared the cited pages with similar pages ranking in Google. Our goal was to see which text-only attributes correlate most with AI citation behavior, and whether these patterns differ from traditional SEO signals.
Key takeaways:
Based on our research, we found five content attributes that showed a strong positive correlation with AI citations, and one that showed a negative correlation:
- Clarity and brevity: +32.83%
- EEAT signals: +30.64%
- Q&A format: +25.45%
- Section structure: +22.91%
- Structured data items: +21.60%
- Non-promotional tone: -26.19%
In other words: Content that leads with clear answers, shows expertise, and uses structured formatting gets cited more often.
Five Content Qualities That Improve AI Citations
These five attributes stood out because the difference between cited and uncited pages was especially large. Below is a breakdown of each criterion and how strongly it appeared on the pages selected by LLMs.
- Clarity and brevity (32.83%)
- EEAT signals (30.64%)
- Q&A format (25.45%)
- Section structure (22.91%)
- Structured data items (21.60%)
One interesting parameter here was non-promotional tone, which showed a negative correlation. This does not mean that LLMs prefer promotional language.
The most likely explanation is that professionally written articles tend to be well-organized, well-sourced, and well-developed, and they often use a sales or promotional tone.
As Roma Chereshnev, data scientist at Semrush, explains:
Articles written by professional copywriters are SEO-friendly, well-structured, and contain useful information. And since these articles are often written by professionals to attract traffic or offer services or products, it is likely that they often use a promotional tone.
So it is possible that the promotional tone itself is not what matters. Rather, when an article is good, it is usually because a professional wrote it.
Cecilia Meis, senior editor at Semrush, notes that these findings also align with the core principles of high-quality content:
Clarity and structure are not SEO shortcuts. They simply make information easier for both humans and AI systems to interpret. If the content is structured, specific, and supported by clear technology, models can reliably understand it.
How to Use This Research
With this data, consider experimenting with your content to improve your AI visibility. Look at the pages on your site that rank well in Google but perform poorly in AI search, and compare them against the criteria above.
- Add a short, concise summary at the top of the page that clearly states the key takeaway
- Strengthen EEAT signals by including author information and linking to reliable sources
- Use Q&A formatting in sections where readers benefit from direct answers
- Add a clear section structure, lists, tables, or charts to help LLMs categorize content
- Monitor performance in Semrush using Organic Research (for Google rankings) and the Visibility Overview report (for AI citations)
Content Attributes That Have Little Impact on AI Citations
Our study examined 13 content parameters in total. Five of them showed a strong positive correlation with AI citations, and one showed a negative correlation.
“Impact” here is the % difference between the scores of our positive and negative samples.
Here are the content attributes that showed little or no correlation with AI citation behavior:

This does not mean that these attributes are unimportant for content quality, explains Chereshnev. They appeared in both samples in similar amounts, so they did not meaningfully distinguish cited URLs from uncited ones in our dataset.
Things are changing fast in the AI search space, so these parameters could prove more impactful in the future (keep an eye out for an updated study).
How We Conducted the Study
Our goal was to understand how LLMs "think" when choosing URLs to cite. Therefore, when we created the criteria, we did not try to match the LLM’s response to a person’s opinion. Instead, we tried to look at the criteria from an LLM’s perspective.
In this study, we focused only on the text that appears on the page. We didn’t check metadata, HTML structure, schema markup, page layout, or any technical aspects of SEO. The goal was to understand how LLMs respond to content as text, without page code or keyword targeting.
To see which content attributes correlate with AI citations, we compared two groups of sample URLs:
- A “positive sample” of URLs cited by AI platforms for a set of prompts
- A “negative sample” of URLs that rank in Google’s top 20 for related keywords
We scored both sample groups on all 13 content parameters and measured the % difference in scores.
That way, we could see which content parameters appeared most often on pages cited by AI, even when those pages did not rank in Google for the related queries.
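The comparison described above can be sketched in a few lines of Python. This is a minimal illustration, not the actual pipeline used in the study: the parameter names, the score scale, and the sample values are hypothetical, and the formula (difference of sample means relative to the negative-sample mean) is an assumption about how a "% difference in scores" might be computed.

```python
from statistics import mean


def percent_difference(positive_scores, negative_scores):
    """Relative difference (in %) between the mean scores of two samples.

    Assumed formula: (mean_positive - mean_negative) / mean_negative * 100.
    The study does not state its exact formula.
    """
    pos_mean = mean(positive_scores)
    neg_mean = mean(negative_scores)
    if neg_mean == 0:
        raise ValueError("negative-sample mean is zero; % difference is undefined")
    return (pos_mean - neg_mean) * 100 / neg_mean


# Hypothetical per-URL scores on a 0-10 scale, grouped by content parameter.
scores = {
    "qa_format": {"positive": [6, 5, 7], "negative": [4, 5, 6]},
    "non_promotional_tone": {"positive": [3, 4, 2], "negative": [4, 4, 4]},
}

# Impact per parameter: positive means the attribute is more common on cited pages.
impact = {
    name: round(percent_difference(s["positive"], s["negative"]), 2)
    for name, s in scores.items()
}

# List parameters from the strongest positive impact to the strongest negative one.
for name, value in sorted(impact.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: {value:+.2f}%")
```

A positive value means the attribute scored higher on AI-cited pages than on Google-ranking pages, and a negative value means the opposite. Values near zero correspond to attributes that do not distinguish the two samples.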
Study period: July 15 – August 6, 2025
Sample size:
- 11,882 prompts (across ChatGPT Search, Google AI Mode, and Perplexity)
- 59,410 keywords (from Google search)
- 304,805 URLs cited by LLMs (positive sample)
- 921,614 URLs ranking in Google search (negative sample)
- 337,785 total unique URLs
What We Built
These findings directly informed the development of our Content Toolkit, which helps you optimize content for AI citations. The tool shows whether your content is consistent with these patterns and where it may need improvement.

AI search citations are dynamic, but not automatic.
Our data sheds light on what’s working, so check your content for clear structure, EEAT signals, and Q&A formatting to make it AI-search-ready.



