We analyzed 680M AI citations - which publications actually get cited most?
Community discussion on which publications AI engines cite most frequently. Real experiences from marketers analyzing citation patterns across ChatGPT, Perplexi...
I’ve been analyzing citation patterns across AI platforms. The bias is real and significant.
What the data shows:
Top 10 sources account for ~50% of citations across major AI platforms. Meanwhile, millions of quality websites share the remaining 50%.
Specific patterns:
| Platform | Top Source | % of Citations |
|---|---|---|
| ChatGPT | Wikipedia | 7.8% |
| Perplexity | (not stated) | 6.6% |
| Google AI | YouTube | 1.9% |
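A table like this can be built from a log of (platform, cited domain) pairs. The sketch below assumes such a log exists as a list of tuples. The function name `top_source_by_platform` and the sample data are hypothetical. They only illustrate the arithmetic and do not show how the original analysis was run.

```python
from collections import Counter, defaultdict

def top_source_by_platform(citations):
    """Return {platform: (top_domain, share of that platform's citations)}.

    `citations` is an iterable of (platform, domain) pairs, one per citation.
    """
    counts = defaultdict(Counter)
    for platform, domain in citations:
        counts[platform][domain] += 1
    result = {}
    for platform, counter in counts.items():
        # most_common(1) returns [(domain, count)] for the most-cited domain.
        domain, n = counter.most_common(1)[0]
        result[platform] = (domain, n / sum(counter.values()))
    return result

# Hypothetical sample log, not data from the study.
log = [
    ("ChatGPT", "wikipedia.org"), ("ChatGPT", "wikipedia.org"),
    ("ChatGPT", "example-blog.com"), ("ChatGPT", "news-site.com"),
    ("Google AI", "youtube.com"), ("Google AI", "youtube.com"),
    ("Google AI", "docs-site.com"),
]
print(top_source_by_platform(log))
```

Each share is computed within its platform, so the percentages are not comparable across platforms unless each platform's sample is similar in size and prompt mix.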
The bias in practice:
I tested two pieces of content:
The major publication gets cited 8x more often, despite the blog having better, more detailed content.
My questions:
What are you seeing?
The source selection bias is well-documented. Here’s why it happens.
Root causes:
1. Training data composition
2. Authority signal inheritance
3. Explicit source preferences
4. Format and structure bias
The implications:
This bias reinforces existing power structures. Major publishers get more AI visibility, which brings more traffic, which brings more authority, which brings more AI visibility…
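The feedback loop described above can be illustrated with a toy "rich-get-richer" model. This is an assumption for illustration only and does not describe how any AI platform actually ranks sources. Each round, one new citation goes to a source with probability proportional to (its current citations) raised to a hypothetical exponent `alpha`. With `alpha = 1` (a classic Pólya urn), an early head start tends to persist. With `alpha > 1`, the leader tends to pull further ahead. `simulate_feedback_loop`, `alpha`, and the starting counts are all hypothetical.

```python
import random

def simulate_feedback_loop(initial_counts, rounds, alpha=1.0, seed=0):
    """Toy generalized Polya urn: each round, one new citation goes to a
    source chosen with probability proportional to count ** alpha.

    All initial counts should be positive; a source at zero is never picked.
    """
    rng = random.Random(seed)
    counts = list(initial_counts)
    for _ in range(rounds):
        weights = [c ** alpha for c in counts]
        # random.choices draws one index, weighted by current "authority".
        winner = rng.choices(range(len(counts)), weights=weights, k=1)[0]
        counts[winner] += 1
    return counts

def shares(counts):
    """Convert raw citation counts into fractions of the total."""
    total = sum(counts)
    return [c / total for c in counts]

# Hypothetical starting point: one large publisher and four small sites.
start = [10, 2, 2, 2, 2]
linear = simulate_feedback_loop(start, rounds=2000, alpha=1.0)
superlinear = simulate_feedback_loop(start, rounds=2000, alpha=1.5)
print([round(s, 2) for s in shares(linear)])
print([round(s, 2) for s in shares(superlinear)])
```

Changing `seed` shows how much of the outcome is driven by early luck versus the starting advantage.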
Is it getting better?
Mixed. Some platforms are adding more sources, but concentration at the top persists.
Speaking as a small publisher: this is frustrating.
Our situation:
Our AI visibility: Almost zero.
Meanwhile, we see our research get picked up by major outlets, and THEIR version gets cited by AI, not ours.
What we’re trying:
The uncomfortable reality:
For now, the strategy is “get mentioned by sources AI trusts” rather than “become a source AI trusts.”
It’s a workaround, not a solution.
Let me share some quantitative analysis:
Citation distribution study (1,000 prompts):
| Source Tier | % of Citations | % of All Websites |
|---|---|---|
| Top 100 sites | 52% | 0.0001% |
| Top 1,000 sites | 78% | 0.001% |
| All other sites | 22% | 99.999% |
The Pareto effect is extreme.
Roughly 0.001% of websites (the top 1,000) receive 78% of AI citations.
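The tiers in the table are cumulative (the top 1,000 includes the top 100). Each row is therefore the share of all citations captured by the N highest-ranked domains. Below is a minimal sketch, assuming per-domain citation counts are available as a dict. `tier_shares` and the sample counts are hypothetical.

```python
def tier_shares(domain_counts, tiers=(100, 1000)):
    """Share of all citations captured by the top-N domains, for each N.

    `domain_counts` maps domain -> number of times it was cited.
    Tiers are cumulative: the top 1,000 includes the top 100.
    """
    ranked = sorted(domain_counts.values(), reverse=True)
    total = sum(ranked)
    return {n: sum(ranked[:n]) / total for n in tiers}

# Hypothetical counts for four domains, using tiny tiers for readability.
counts = {"a.com": 50, "b.com": 30, "c.com": 15, "d.com": 5}
print(tier_shares(counts, tiers=(1, 2)))  # {1: 0.5, 2: 0.8}
```

The "All other sites" row is then one minus the share of the largest tier.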
What predicts citation:
| Factor | Correlation with Being Cited |
|---|---|
| Domain age | 0.42 |
| Wikipedia presence | 0.61 |
| Major publication mentions | 0.58 |
| Backlink count | 0.45 |
| Content quality (human rated) | 0.23 |
The insight:
Of the factors measured, content quality has the LOWEST correlation with being cited. Authority signals matter more.
This is bias by definition.
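The post does not say which correlation coefficient was used. The sketch below assumes Pearson's r, computed across sites between one factor (for example a human quality rating) and citation count. The per-site numbers are hypothetical. On Python 3.10+, `statistics.correlation` computes the same quantity.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    # Raises ZeroDivisionError if either sequence is constant.
    return cov / sqrt(var_x * var_y)

# Hypothetical per-site data: a factor score and a citation count per site.
quality_score = [7, 9, 6, 8, 5]
citations = [120, 15, 300, 40, 90]
print(round(pearson(quality_score, citations), 2))
```

Several factors in the table (Wikipedia presence, backlinks, major publication mentions) probably move together. Comparing their individual correlations does not show which one the AI systems actually respond to.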
Working within the bias system:
Accept reality, then strategize.
You can’t change how AI systems work. But you can position your content to benefit from their biases.
The dual strategy:
1. Direct optimization (long game)
2. Indirect positioning (short game)
Our client results:
Client with no AI visibility:
6 months later: 400% increase in AI citations.
The meta-strategy:
Become a source the sources trust. The AI follows.
Brand perspective on source bias:
The competitive impact:
Our competitor (larger, older company) gets cited 5x more than us in AI responses, despite:
Why?
Our response:
Phase 1 (Immediate):
Phase 2 (Ongoing):
Phase 3 (Monitoring):
Timeline: Expecting 12-18 months to meaningfully shift the balance.
This is a marathon, not a sprint.
Academic perspective on AI source bias:
The research consensus:
Source selection bias in LLMs is well-documented and concerning:
What the papers show:
What might help:
The reality:
AI companies optimize for response quality, not source fairness. Bias reduction isn’t a priority unless users demand it.
Awareness is the first step.
Content creator’s frustration:
The cycle that kills us:
Real example:
We published original research on industry trends. A major business publication wrote a 500-word summary citing us briefly.
ChatGPT cites: The major publication
ChatGPT doesn’t cite: Our original research
What I’ve learned to do:
The harsh truth:
Being the original source doesn’t matter if AI systems don’t recognize you as authoritative.
Quality alone isn’t enough.
The niche opportunity in source bias:
Where small players CAN win:
The bias affects broad queries most. For specific, niche queries:
Our approach:
Instead of: “What is AI marketing?” (dominated by major publications)
Focus on: “How do B2B SaaS companies use AI for customer segmentation?” (niche)
Results:
| Query Type | Citation Rate (Major Sites) | Citation Rate (Niche Sites) |
|---|---|---|
| Broad | 85% | 15% |
| Medium | 60% | 40% |
| Niche | 30% | 70% |
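The thread does not say how queries were bucketed or which sites counted as "major". The sketch below assumes both labels are supplied by hand: each result is a (query type, cited domain) pair, and a hypothetical set `MAJOR` defines the major sites. Everything else is treated as niche. `share_by_query_type` and the sample data are illustrative only.

```python
from collections import defaultdict

def share_by_query_type(results, major_sites):
    """Split citations per query type into (major share, niche share).

    `results` is an iterable of (query_type, cited_domain) pairs;
    `major_sites` is a set of domains treated as major publications.
    Any domain not in `major_sites` counts as niche.
    """
    totals = defaultdict(int)
    major = defaultdict(int)
    for query_type, domain in results:
        totals[query_type] += 1
        if domain in major_sites:
            major[query_type] += 1
    return {
        q: (major[q] / totals[q], (totals[q] - major[q]) / totals[q])
        for q in totals
    }

# Hypothetical labels and results, not data from the thread.
MAJOR = {"bignews.com", "wikipedia.org"}
results = [
    ("Broad", "bignews.com"), ("Broad", "wikipedia.org"), ("Broad", "saas-niche.io"),
    ("Niche", "saas-niche.io"), ("Niche", "saas-niche.io"),
]
print(share_by_query_type(results, MAJOR))
```

Because each pair of shares sums to 1, the two columns in the table above are complementary views of the same number.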
The strategy:
You can’t beat major sites broadly. But you can dominate niches.
Excellent discussion. Here’s my synthesis on source selection bias:
The reality:
AI source selection bias is real, significant, and self-reinforcing. Top sources get cited more, which builds more authority, which gets them cited more.
The data:
Strategies within the system:
Short-term:
Long-term:
Measurement:
The uncomfortable truth:
The system is biased. Working within the bias is pragmatic. Building genuine authority eventually overcomes it, but takes time.
Quality content is necessary but not sufficient. Strategic positioning matters.
Thanks everyone for the valuable perspectives!