ChatGPT’s repeat performance reveals the brand’s visibility

We know that AI answers are probabilistic – if you ask an AI the same question 10 times, you'll get 10 different answers.
But how are the answers different?
That’s the question Rand Fishkin explored in some interesting research.
And it has big implications for how we should think about tracking AI visibility for brands.
In his research, he analyzed prompts asking for recommendations on all kinds of products and services – everything from chef's knives to cancer hospitals to Volvo dealerships in Los Angeles.
Basically, he found that:
- AIs rarely recommend the same list of products in the same order twice.
- For a given topic (e.g., running shoes), AIs recommend a few brands far more often than others.
In my research, as usual, I focus mainly on B2B use cases. I also build on Fishkin's work by answering these additional questions:
- Does prompt complexity affect the consistency of AI recommendations?
- Does category competitiveness affect the concentration of recommendations?
How I did it
To explore these questions, I first designed 12 prompts:
- Competitive vs. niche: Six of the prompts cover highly competitive B2B software categories (e.g., accounting software), and the other six cover niche categories (e.g., user and entity behavior analytics (UEBA) software). I identified the categories using the Contender database, which tracks how many brands ChatGPT associates with 1,775 different software categories.
- Simple vs. nuanced: In both the "competitive" and "niche" prompt sets, half the prompts are simple ("What is the best accounting software?") and the other half are nuanced prompts that include a persona and use case ("For a CFO focused on ensuring financial reporting accuracy and compliance, what is the best accounting software?").
I ran each of the 12 prompts 100 times, using the logged-out, free version of ChatGPT at chatgpt.com (i.e., not the API). I used a different IP address for each of the 1,200 interactions to simulate 1,200 different users starting new conversations.
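To make the tallying step concrete, here's a minimal sketch of how mention rates can be computed from repeated runs of one prompt. This is my illustration, not the author's actual pipeline (the study was run manually through chatgpt.com, not via code), and the brand lists below are hypothetical examples:

```python
from collections import Counter

def brand_mention_rates(responses: list[list[str]]) -> dict[str, float]:
    """Given one list of mentioned brands per response, return the
    fraction of responses in which each brand appears."""
    counts = Counter()
    for brands in responses:
        counts.update(set(brands))  # count each brand at most once per response
    n = len(responses)
    return {brand: count / n for brand, count in counts.items()}

# Hypothetical brand lists from running the same prompt 4 times:
responses = [
    ["QuickBooks", "Xero", "Wave"],
    ["QuickBooks", "Xero", "FreshBooks"],
    ["QuickBooks", "Sage"],
    ["Xero", "QuickBooks", "Zoho"],
]
rates = brand_mention_rates(responses)
# QuickBooks appears in 4/4 responses, Xero in 3/4, Wave in 1/4.
```

The same dictionary of rates also gives you the size of the "brand pool" for a prompt: it's simply `len(rates)`.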
Limitations: This research only includes responses from ChatGPT. But given the patterns in Fishkin's results and the similar probabilistic nature of LLMs, you can probably generalize the directional findings (not the absolute values) below to most/all AIs.
Findings
So what happens when 100 different people submit the same prompt to ChatGPT, asking for product recommendations?
How many "open slots" are there for brands in ChatGPT responses?
On average, ChatGPT mentions 44 unique products across every 100 responses. But one answer set included more than 95 products – it really depends on the category.


Competitive vs. niche categories
On that note, for prompts covering competitive categories, ChatGPT mentions almost twice as many brands per 100 responses as it does for prompts covering niche categories. (This is consistent with the criteria I used to select the categories in the first place.)
Simple vs. nuanced prompts
On average, ChatGPT mentioned fewer products in response to the more nuanced prompts. But this was not a consistent pattern – in any given software category, sometimes the nuanced query surfaced more brands, and sometimes the simple query did.
This was surprising, as I expected specific requests (e.g., "For a SOC analyst who needs to properly evaluate security alerts on endpoints, what is the best EDR software?") to consistently elicit a smaller set of suitable solutions from ChatGPT.
I suspect ChatGPT may not be great at assembling a list of solutions for a particular use case because it doesn't have a deep understanding of many products. (More on this data in a future note.)
Return of the ’10 blue links’
In each individual answer, ChatGPT mentions only about 10 products, on average.
There is quite a range, however – a minimum of 6 products per answer and a maximum of 15, when averaging across each answer set.


But any single answer typically names about 10 products, regardless of category or prompt type.
The main difference is in how much the product pool rotates across answers – competitive categories draw from a much deeper bench, even though each individual answer names about the same number of products.
Everything old (in SEO) is new again (in GEO/AEO). It reminds me of trying to get a site into one of Google's "10 blue links."
Dig deeper: How to measure your AI search visibility and demonstrate business impact
How consistent are ChatGPT's product recommendations?
If you ask ChatGPT for B2B software recommendations 100 different times, only ~5 brands, on average, will be mentioned 80%+ of the time.
To put that in perspective, that's just 11% of the ~44 brands ChatGPT mentions across those 100 responses.


So it's very competitive to become one of the brands ChatGPT keeps mentioning every time someone asks for recommendations in your category.
As you might expect, these “prominent” brands tend to be large, established brands with strong recognition. For example, the leading brands in the accounting software category are QuickBooks, Xero, Wave, FreshBooks, Zoho, and Sage.
If you are not a big brand, you're better off in a niche category:


If you compete in a niche category, you are not only up against fewer companies – there are also more "open slots" for you to claim as a top brand in ChatGPT's answers.
In niche categories, 21% of all brands ChatGPT mentions are prominent brands, mentioned 80%+ of the time.
Compare that to just 7% of all brands in competitive categories, where the majority of brands (72%) languish in the long tail, mentioned less than 20% of the time.


Nuanced prompts don't significantly change the long tail of low-visibility brands (those with <20% visibility), but they do change the "winner's circle." Adding persona context to a prompt makes it harder to reach the top tier – you can see the steeper "cliff" a brand must climb in the "nuanced prompts" graph above.
This makes intuitive sense: when someone asks for the "best accounting software for a Head of Finance," ChatGPT has a clearer answer in mind and gravitates toward a few top picks.
However, it's important to note that the overall pool isn't much smaller – ChatGPT mentions ~42 products across 100 responses to the nuanced prompts, just a few less than the ~46 mentioned in response to simple prompts. If nuanced prompts make the winner's circle more exclusive, why don't they also shrink the overall field?
In part, it's possible that the "nuanced" prompts I used were less narrow and specific than they could have been.
But, based on some data I'm seeing, I think this is partly about ChatGPT not knowing enough about many brands to be more selective. I'll share more on this in a future note.
Dig deeper: 7 hard facts about measuring AI visibility and GEO performance
What does this mean for B2B marketers?
If you are not a prominent brand, pick your battles – niche down
It has never been more important to differentiate: 21% of mentioned brands reach prominent status in niche categories vs. 7% in competitive ones.
Without a lot of time and money to market the product, an upstart tech company won't become a dominant brand in a broad, established category like accounting software.
But the field is far less competitive if you lean into your unique, differentiating strengths. ChatGPT is much more likely to treat you as a prominent brand if you work to make your product known as "the best accounting software for real estate companies in North America."
Most AI visibility tracking tools are very misleading
Given the inconsistency of ChatGPT's recommendations, a single check of any given prompt is almost meaningless. Unfortunately, checking each prompt a single time is exactly what most AI visibility tracking tools do.
If you want anything approaching a statistically significant visibility result for any given prompt, you need to run the prompt dozens – even 100+ – of times, depending on how precise you need the data to be.
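To see why single checks are so noisy, a standard binomial margin-of-error calculation (my illustration, not part of the original study) shows how the uncertainty around a measured mention rate shrinks as you repeat the prompt:

```python
import math

def margin_of_error(p: float, n: int, z: float = 1.96) -> float:
    """95% normal-approximation margin of error for a brand whose
    observed mention rate is p across n runs of the same prompt."""
    return z * math.sqrt(p * (1 - p) / n)

# A brand mentioned in half of the runs:
for n in (1, 5, 25, 100):
    print(n, round(margin_of_error(0.5, n), 2))
# With n=1 or n=5, the interval spans much of the 0-100% range, so a
# single check can't reliably tell a long-tail brand from a prominent
# one; at n=100 the margin tightens to roughly +/-10 points.
```

The normal approximation is rough at small n, but the direction holds: precision improves only with the square root of the number of runs.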
But obviously that's not feasible for most people, so my suggestion is: for the key, bottom-of-funnel prompts you're tracking, run each ~5 times whenever you pull data.
That will at least give you a reasonable idea of whether your product is mentioned most of the time, sometimes, or never.
Your goal should be to feel confident about whether your brand sits in the barely-mentioned long tail, the middle tier of visibility, or the dominant top tier for any given prompt. Whether you use my "<20%," "20–80%," and "80%+" buckets or your own thresholds, this approach follows the data and common sense.
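As a sketch, bucketing an observed mention rate into those tiers is a one-liner (the threshold values come from the article; the function name is mine):

```python
def visibility_tier(mention_rate: float) -> str:
    """Bucket an observed mention rate using the article's
    <20% / 20-80% / 80%+ visibility thresholds."""
    if mention_rate >= 0.8:
        return "prominent"
    if mention_rate >= 0.2:
        return "middle"
    return "long tail"

# Your brand appeared in 1 of 5 spot-check runs:
print(visibility_tier(1 / 5))  # prints "middle"
```

With only ~5 runs per prompt, these coarse buckets are about as fine-grained a conclusion as the data supports.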

What’s next?
In future newsletters and LinkedIn posts, I will build on these new research findings:
- How does ChatGPT talk about the brands it regularly recommends? Is it an indication of how much ChatGPT “knows” about the products?
- Do different prompts with the same search intent tend to produce the same set of recommendations?
- How consistent is the ranking within responses? Are prominent brands usually mentioned first?
This article was originally published in Visible on beehiiv (as "Most AI visibility tracking is misleading (here's my new data)") and republished with permission.
Contributing writers are invited to create content for Search Engine Land and are selected for their expertise and contribution to the search community. Our contributors work under the supervision of editorial staff, and contributions are reviewed for quality and relevance to our readers. Search Engine Land is owned by Semrush. Contributors are not asked to write, directly or indirectly, about Semrush. The opinions they express are their own.



