Technical implementation notes.
Proper robots.txt configuration:
# Specific AI crawler rules
User-agent: GPTBot
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: anthropic-ai
Allow: /

# Default for other bots
User-agent: *
Disallow: /admin/
Disallow: /private/
Allow: /
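You can sanity-check a configuration like this with Python's stdlib `urllib.robotparser` before deploying it. A minimal sketch (using a trimmed version of the file above; note that robotparser applies Allow/Disallow lines in file order, first match wins, which is why the Disallow lines come before `Allow: /` in the wildcard group):

```python
from urllib.robotparser import RobotFileParser

# Trimmed robots.txt mirroring the configuration above
ROBOTS_TXT = """\
User-agent: GPTBot
Allow: /

User-agent: *
Disallow: /admin/
Disallow: /private/
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# GPTBot matches its own group, so everything is allowed for it
print(parser.can_fetch("GPTBot", "/some-article"))        # True
# Other bots fall through to the wildcard group's Disallow rules
print(parser.can_fetch("SomeOtherBot", "/admin/secret"))  # False
```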
Common mistakes:
- Order matters - put specific User-agent groups before the wildcard group, and remember some parsers apply Allow/Disallow lines in file order rather than by longest match
- Typos kill you - it's GPTBot, not GPT-Bot
- Testing is essential - use the robots.txt report in Google Search Console (the old standalone robots.txt Tester has been retired)
Rate limiting consideration:
Some sites aggressively rate limit bots. AI crawlers are impatient. If you return 429 errors, they move on and cite competitors.
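One way to avoid 429ing AI crawlers is to exempt them from your rate limiter by User-Agent token. A rough sketch (the function names and the window/limit parameters are illustrative, not from any particular framework; note that User-Agent strings can be spoofed, so verifying against the vendors' published IP ranges is more robust):

```python
# Lowercased User-Agent tokens for the AI crawlers allowed above
AI_CRAWLERS = ("gptbot", "claudebot", "perplexitybot", "anthropic-ai")

def is_ai_crawler(user_agent: str) -> bool:
    """Substring match against known AI crawler User-Agent tokens."""
    ua = user_agent.lower()
    return any(token in ua for token in AI_CRAWLERS)

def should_throttle(user_agent: str, requests_in_window: int, limit: int = 60) -> bool:
    """Return True if this client should get a 429.
    Known AI crawlers are exempted so they keep receiving 200s."""
    if is_ai_crawler(user_agent):
        return False
    return requests_in_window > limit
```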
Check your server logs for AI crawler activity. Make sure they’re getting 200 responses.
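A quick way to do that check is to tally status codes per crawler from your access logs. A sketch assuming the common "combined" log format (the regex below pulls out the status code and User-Agent fields and will need adjusting for custom log formats):

```python
import re
from collections import Counter

AI_CRAWLERS = ("GPTBot", "ClaudeBot", "PerplexityBot", "anthropic-ai")

# Matches the status, bytes, referer, and User-Agent fields of a
# combined-format access log line
LOG_RE = re.compile(r'" (\d{3}) \d+ "[^"]*" "([^"]*)"')

def crawler_status_counts(log_lines):
    """Count HTTP status codes per AI crawler seen in the log."""
    counts = Counter()
    for line in log_lines:
        m = LOG_RE.search(line)
        if not m:
            continue
        status, agent = m.groups()
        for bot in AI_CRAWLERS:
            if bot.lower() in agent.lower():
                counts[(bot, status)] += 1
    return counts
```

If the result shows mostly `("GPTBot", "429")` entries rather than `("GPTBot", "200")`, your rate limiter is turning crawlers away.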
The Cloudflare consideration:
If you use Cloudflare with “Bot Fight Mode” enabled, AI crawlers might be blocked at the network level, regardless of robots.txt.
Check Cloudflare settings if you’re allowing in robots.txt but not seeing citations.
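A simple diagnostic is to fetch your own page twice, once with a browser User-Agent and once with a crawler one, and compare status codes. A sketch using only the stdlib (the `fetch` parameter exists so the comparison logic can be exercised without a live site; 403/503 as the "edge block" signal is an assumption, since Cloudflare challenges commonly surface as those codes):

```python
from urllib.request import Request, urlopen
from urllib.error import HTTPError

def status_for(url: str, user_agent: str) -> int:
    """Fetch url with the given User-Agent and return the HTTP status."""
    req = Request(url, headers={"User-Agent": user_agent})
    try:
        with urlopen(req, timeout=10) as resp:
            return resp.status
    except HTTPError as exc:
        return exc.code

def edge_block_suspected(url, bot_ua, browser_ua, fetch=status_for):
    """Browser UA gets a 200 but the bot UA gets 403/503: robots.txt is
    not the problem; suspect a network-level block (e.g. Bot Fight Mode)."""
    return fetch(url, browser_ua) == 200 and fetch(url, bot_ua) in (403, 503)
```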