I’ve been covering this beat extensively. Some context:
The recency bias is real:
Research shows 65% of AI citations come from content published within the last year. This means:
- Your archive has limited AI value
- Fresh content matters more
- Continuous publishing is required to maintain visibility
The Wikipedia exception:
Wikipedia gets cited in 47.9% of ChatGPT’s top sources because it’s freely licensed (CC BY-SA 3.0). The lesson: licensing terms matter for AI visibility.
The Reddit example:
Reddit’s $60M/year deal with Google shows the value of community content. Their WebText2 dataset gets 5x weighting in GPT training.
Takeaway:
If you can’t negotiate a major deal, focus on:
- Fresh, continuous content
- Community/discussion content
- Unique original research
- Consider RSL/marketplace models