What content formats actually get cited by AI? Testing different approaches
Community discussion on which content formats perform best in AI search. Real testing results and strategies for AI-optimized content.
I’ve been thinking a lot about how we structure content for AI consumption, and I’m wondering if traditional content strategies are becoming obsolete.
The hypothesis:
With RAG (Retrieval Augmented Generation) becoming standard for AI systems, the way we organize and structure information matters more than ever. AI systems aren’t just reading our content - they’re querying it, chunking it, and retrieving specific pieces to cite.
What I’ve been testing:
Rebuilt our company’s knowledge base from the ground up with AI retrieval in mind:
Early results:
Our content is getting cited significantly more in Perplexity and Google AI Overviews. ChatGPT citations improved after their latest crawl.
The question I keep coming back to: are we at an inflection point where content architecture matters as much as content quality?
You’re onto something important here. I work on RAG implementations for enterprise clients, and the content side is often the bottleneck.
Why knowledge base structure matters for AI:
When AI systems retrieve content, they don’t read it like humans. They query it, chunk it into passages, embed those chunks, and pull back only the pieces most relevant to a question.
What this means for content creators: chunk size is the first lever.
The chunking sweet spot:
200-500 tokens is about right. Too small and you lose context; too large and you dilute relevance. I’ve seen optimal chunk sizes vary by content type, so treat that range as a starting point rather than a rule.
The structure you’re implementing is exactly what AI retrieval systems need to work effectively.
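The 200-500 token window described above can be sketched with a simple sliding-window chunker. This is a toy illustration: word count stands in for a real token count, and the window and overlap values are just examples, not recommendations from the thread.

```python
def chunk_text(text, max_tokens=400, overlap=50):
    """Split text into overlapping chunks of roughly max_tokens words.

    Words are a rough proxy for tokens here; a real pipeline would
    count tokens with the retriever's own tokenizer instead.
    """
    words = text.split()
    chunks = []
    step = max_tokens - overlap  # advance less than a full window so chunks overlap
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_tokens]))
        if start + max_tokens >= len(words):
            break  # last window already covers the tail
    return chunks

# A 1,000-word document yields three overlapping ~400-word chunks.
chunks = chunk_text(("word " * 1000).strip())
```

The overlap matters: it keeps a sentence that straddles a boundary fully present in at least one chunk, so its meaning isn't split across two retrievals.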
The chunking insight is gold. We restructured our help documentation from long-form articles to modular, question-based chunks.
Each chunk now stands on its own and answers a single question. Our support content appears in AI responses way more than before - the AI can grab exactly the piece it needs instead of trying to parse through a 2,000-word article.
We’re doing something similar at enterprise scale. Here’s what’s working in our knowledge base architecture for AI:
The measurement piece:
We track AI citations using Am I Cited and compare to our knowledge base usage metrics. Content that gets cited more in AI also tends to be our best-structured content. There’s a strong correlation between structure quality and citation frequency.
What surprised us:
FAQ pages outperform comprehensive guides for AI citations. The question-answer format maps perfectly to how AI generates responses. Our best-cited pages are all structured as discrete Q&A pairs.
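One way to make those Q&A pairs explicit to crawlers is schema.org FAQPage markup. A minimal sketch of generating the JSON-LD (the questions and answers below are placeholders, not content from the thread):

```python
import json

# Hypothetical Q&A pairs; the dict layout follows schema.org's FAQPage type.
faqs = [
    ("What is chunking?", "Splitting content into small retrievable passages."),
    ("Why do FAQ pages get cited?", "The Q&A format maps onto how AI assembles answers."),
]

faq_schema = {
    "@context": "https://schema.org",
    "@type": "FAQPage",
    "mainEntity": [
        {
            "@type": "Question",
            "name": q,
            "acceptedAnswer": {"@type": "Answer", "text": a},
        }
        for q, a in faqs
    ],
}

# Embed this in a <script type="application/ld+json"> tag on the page.
print(json.dumps(faq_schema, indent=2))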
Technical documentation perspective here.
We’ve completely rethought how we write docs with AI retrieval in mind:
Old approach: long-form narrative guides that buried the specifics in context.
New approach: modular sections where each heading maps to a single task and the answer can be lifted out directly.
The result:
Our docs are now cited regularly when developers ask ChatGPT questions about our API. Before the restructure, we were invisible even for our own product questions.
The difference? AI can now extract specific, actionable information from our docs instead of having to parse through context and narrative.
Let me add some data on platform-specific behavior.
How different platforms use knowledge bases:
| Platform | Retrieval Method | Citation Style | Freshness Preference |
|---|---|---|---|
| ChatGPT | Training data + live browse | Implicit synthesis | Moderate |
| Perplexity | Real-time web search | Explicit with sources | High |
| Google AI | Search index + Knowledge Graph | Mixed | High |
| Claude | Training data + web search | Cautious citation | Moderate |
Implications:
A comprehensive knowledge base strategy needs to account for these differences. What works for one platform may not work for another.
We’re a SaaS startup that built our entire docs site with AI retrieval as the primary use case. Some practical learnings:
What worked in our technical implementation:
Our product documentation appears in ChatGPT responses for our niche. When users ask how to do something with our type of software, we get cited alongside much larger competitors.
What didn’t work:
Initially tried to be too clever with dynamic content generation. AI systems prefer stable, consistently structured content over dynamically assembled pages.
Question about the meta-layer: How are you all handling the relationship between your website content and your knowledge base?
Are you:
A) Treating them as the same thing (website IS the knowledge base)
B) Having a separate internal knowledge base that feeds the website
C) Building a parallel AI-optimized content layer
We’re debating this internally and not sure which approach scales best.
Great question. Here’s how we think about it:
Our approach is B with elements of A:
We maintain a structured internal knowledge base (our source of truth) that generates both:
In practice:
Same content, different presentations. The knowledge base has rich metadata and structure. The website version adds design and narrative flow. Both serve their audience.
I’d avoid option C (separate AI layer) - too much content to maintain and they’ll inevitably drift out of sync.
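The single-source idea above can be sketched as one structured entry rendered two ways. Everything here is hypothetical (field names, the rendering functions) - it just shows the shape of "one source of truth, two presentations":

```python
# A hypothetical structured knowledge base entry (the source of truth).
entry = {
    "id": "kb-042",
    "question": "How do I reset my API key?",
    "answer": "Go to Settings > API and click Regenerate.",
    "tags": ["api", "security"],
    "updated": "2024-05-01",
}

def to_kb_record(e):
    """Machine-facing view: keep all metadata for retrieval and tracking."""
    return dict(e)

def to_website_markdown(e):
    """Human-facing view: drop the metadata, add presentation."""
    return f"## {e['question']}\n\n{e['answer']}\n"

print(to_website_markdown(entry))
```

Because both views are generated from the same record, they can't drift out of sync - which is exactly the failure mode of maintaining a separate AI layer by hand.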
Adding an ML perspective to complement the content strategy discussion.
Why RAG prefers structured content:
Vector embeddings work better on semantically coherent text. When you write “What is X? X is…” the embedding captures that definition relationship clearly. When X is buried in paragraph 7 of a rambling article, the embedding becomes noisy.
The practical implication is an embedding quality correlation:
I’ve tested this - content that produces clean, semantically distinct embeddings gets retrieved more accurately. Sloppy structure = fuzzy embeddings = poor retrieval = fewer citations.
Structure isn’t just about human readability anymore.
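The "What is X? X is…" point can be illustrated with a toy retriever. This uses a bag-of-words cosine similarity as a stand-in for a real neural embedding (real systems use learned vectors, but the ranking mechanics are the same): the chunk that states a definition up front scores higher for a definitional query than the off-topic one.

```python
import math
from collections import Counter

def embed(text):
    # Toy stand-in for a neural embedding: a lowercase bag of words.
    return Counter(text.lower().translate(str.maketrans("", "", "?.!,")).split())

def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[w] * b[w] for w in set(a) & set(b))
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

chunks = [
    "What is chunking? Chunking is splitting content into retrievable passages.",
    "Our company history began in 2010 when the founders met at a conference.",
]
query = embed("what is chunking")
scores = [cosine(query, embed(c)) for c in chunks]
best = chunks[scores.index(max(scores))]  # the definition-style chunk wins
```

With real embeddings the effect is fuzzier but the direction holds: a chunk that states its topic explicitly produces a vector that sits closer to queries about that topic.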
Traditional publisher perspective. We’re grappling with this.
Decades of content created for print-first or web-browse experiences. Now we need it structured for AI retrieval?
The challenge is the sheer volume of legacy content to restructure. What we’re doing: starting with our highest-traffic explainer content.
Early wins:
Our restructured “explainer” content is getting cited significantly more than our traditional articles. The ROI on restructuring is becoming clear.
But the scale of retroactive work is daunting.
This thread is incredibly valuable. My main takeaway on knowledge base structure for AI citations is the paradigm shift:
Content strategy is evolving from “write for humans, optimize for search” to “structure for machines, present for humans.” The underlying content architecture becomes as important as the writing quality.
Anyone who ignores this is going to find their content increasingly invisible in AI-mediated discovery.
Perfect summary. To add one final thought:
This is the future of content strategy.
We’re moving from a world where content lives on pages that humans browse to a world where content lives in retrievable knowledge structures that AI systems query on behalf of humans.
The organizations that build robust knowledge architectures now will dominate AI-mediated discovery. Those that don’t will become invisible as AI becomes the primary content discovery interface.
It’s not hyperbole - it’s the logical endpoint of current trends.
Thanks everyone for the insights. Going to incorporate a lot of this into our knowledge base redesign.