
Which AI crawlers should I allow in robots.txt? GPTBot, PerplexityBot, etc.
Just audited a client’s site and discovered something interesting.
The discovery:
Their robots.txt has been blocking AI crawlers for 2+ years:
User-agent: *
Disallow: /private/
# This was added by security plugin in 2023
User-agent: GPTBot
Disallow: /
User-agent: ChatGPT-User
Disallow: /
Impact:
Now I’m questioning:
Questions for the community:
Looking for practical configurations, not just theory.
This is more common than people realize. Let me break down the crawlers:
AI Crawler Types:
| Crawler | Company | Purpose | Recommendation |
|---|---|---|---|
| GPTBot | OpenAI | Model training | Your choice |
| ChatGPT-User | OpenAI | Real-time search | Allow |
| ClaudeBot | Anthropic | Real-time citations | Allow |
| Claude-Web | Anthropic | Web browsing | Allow |
| PerplexityBot | Perplexity | Search index | Allow |
| Perplexity-User | Perplexity | User requests | Allow |
| Google-Extended | Google | Gemini/AI features | Allow |
The key distinction: training crawlers vs. real-time search crawlers.
Most companies allow the search crawlers (you want citations) and make a business decision on the training crawlers.
Recommended robots.txt:
# Allow AI search crawlers
User-agent: ChatGPT-User
User-agent: ClaudeBot
User-agent: Claude-Web
User-agent: PerplexityBot
User-agent: Perplexity-User
User-agent: Google-Extended
Allow: /
# Block training if desired (optional)
User-agent: GPTBot
Disallow: /
Sitemap: https://yoursite.com/sitemap.xml
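Once this is deployed, it's worth confirming the live file actually contains these rules. A quick check (assuming robots.txt is served at the standard path on your domain):
curl -s https://yoursite.com/robots.txt | grep -E "GPTBot|ChatGPT-User|ClaudeBot|PerplexityBot|Google-Extended"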
Important addition: verify whether the crawlers are actually being blocked, or simply not visiting.
How to check:
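A quick way to tell the difference is the status codes the crawlers receive. A sketch, assuming a standard combined-format access.log (adjust the path for your server):
# Status code distribution for AI crawler requests.
# Lots of 403/429/503 responses = blocked somewhere; no lines at all = not visiting.
grep -E "GPTBot|ChatGPT-User|ClaudeBot|PerplexityBot|Google-Extended" access.log | awk '{print $9}' | sort | uniq -c | sort -rn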
What we found at one client:
robots.txt allowed GPTBot, but Cloudflare’s security rules were blocking it as “suspicious bot.”
Firewall configuration for AI bots:
If using Cloudflare:
robots.txt is necessary but not sufficient.
Check all layers of your stack.
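One way to test the layers above robots.txt is to send a request that identifies itself as an AI crawler and see what comes back. A rough check only: the real user-agent strings are longer (check each vendor's docs), and a WAF that verifies source IPs may treat a genuine crawler differently.
# Fetch the homepage while identifying as GPTBot; a 403/503 here
# suggests a firewall or CDN rule is blocking on user agent.
curl -s -o /dev/null -w "%{http_code}\n" -A "GPTBot" https://yoursite.com/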
Let me explain llms.txt since you asked:
What is llms.txt:
A new standard (proposed 2024) that gives AI systems a structured overview of your site. Think of it as a table of contents specifically for language models.
Location: yoursite.com/llms.txt
Basic structure:
# Your Company Name
> Brief description of your company
## Core Pages
- [Home](https://yoursite.com/): Main entry point
- [Products](https://yoursite.com/products): Product catalog
- [Pricing](https://yoursite.com/pricing): Pricing information
## Resources
- [Blog](https://yoursite.com/blog): Industry insights
- [Documentation](https://yoursite.com/docs): Technical docs
- [FAQ](https://yoursite.com/faq): Common questions
## Support
- [Contact](https://yoursite.com/contact): Get in touch
Why it helps:
AI systems have limited context windows. They can’t crawl your entire site and understand it. llms.txt gives them a curated map.
Our results after implementation:
The training vs search distinction deserves more attention.
The philosophical question:
Do you want your content training AI models?
Arguments for allowing training:
Arguments against:
What publishers are doing:
| Publisher Type | Training | Search |
|---|---|---|
| News sites | Block | Allow |
| SaaS companies | Allow | Allow |
| E-commerce | Varies | Allow |
| Agencies | Allow | Allow |
My recommendation:
Most B2B companies should allow both. The citation benefit outweighs the training concern.
If you’re a content publisher with licensing value, consider blocking training while allowing search.
Let me share actual results from unblocking AI crawlers:
Client A (SaaS):
Before: GPTBot blocked, 0 AI citations
After: GPTBot + all crawlers allowed
| Metric | Before | 30 days | 90 days |
|---|---|---|---|
| AI citations | 0 | 12 | 47 |
| AI-referred traffic | 0 | 0.8% | 2.3% |
| Brand searches | baseline | +8% | +22% |
Client B (E-commerce):
Before: All AI blocked
After: Search crawlers allowed, training blocked
| Metric | Before | 30 days | 90 days |
|---|---|---|---|
| Product citations | 0 | 34 | 89 |
| AI-referred traffic | 0 | 1.2% | 3.1% |
| Product searches | baseline | +15% | +28% |
The timeline:
Key insight:
Unblocking doesn’t produce instant results; it takes 4-8 weeks to see meaningful impact.
Security perspective on AI crawlers:
Legitimate concerns:
How to mitigate:
Verify crawler identity:
Rate limiting (per crawler):
GPTBot: 100 requests/minute
ClaudeBot: 100 requests/minute
PerplexityBot: 100 requests/minute
Monitor for anomalies:
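For spotting spikes, a per-minute request count from the access log is usually enough. A sketch assuming a combined-format access.log, where the fourth field is the timestamp:
# Requests per minute from GPTBot, busiest minutes first.
grep "GPTBot" access.log | awk '{print substr($4, 2, 17)}' | sort | uniq -c | sort -rn | head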
Official IP ranges:
Each AI company publishes their crawler IPs:
Verify against these before whitelisting.
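To do that, first see which source IPs are actually claiming to be each crawler, then compare them against the ranges the vendor publishes. A sketch assuming a combined-format access.log:
# Source IPs of requests claiming to be GPTBot, most frequent first.
grep "GPTBot" access.log | awk '{print $1}' | sort | uniq -c | sort -rn | head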
For WordPress users - common blockers I’ve seen:
Security plugins that block AI:
How to check:
WordPress robots.txt:
WordPress generates robots.txt dynamically. To customize:
Option 1: Use Yoast SEO → Tools → File editor
Option 2: Create a physical robots.txt in the site root (overrides the dynamic one)
Option 3: Use a plugin like “Robots.txt Editor”
Our standard WordPress configuration:
User-agent: GPTBot
Allow: /
User-agent: ChatGPT-User
Allow: /
User-agent: ClaudeBot
Allow: /
User-agent: PerplexityBot
Allow: /
User-agent: Google-Extended
Allow: /
Sitemap: https://yoursite.com/sitemap.xml
Good WordPress coverage. Adding: how to create llms.txt for WordPress.
Option 1: Static file
Create an llms.txt file and upload it to your web root (public_html/) so it’s served at yoursite.com/llms.txt.
Option 2: Plugin approach
Several plugins now support llms.txt generation:
Option 3: Code snippet
// In functions.php
add_action('init', function () {
    // Strip any query string, then serve /llms.txt as plain text.
    $path = strtok($_SERVER['REQUEST_URI'], '?');
    if ($path === '/llms.txt') {
        header('Content-Type: text/plain; charset=utf-8');
        echo "# Your Company Name\n";
        echo "> Brief description of your company\n";
        // ...output the rest of your llms.txt content here
        exit;
    }
});
Best practice:
Keep llms.txt updated when you:
Static file is simplest but requires manual updates.
After you unblock, here’s how to monitor AI crawler activity:
What to track:
| Metric | Where to Find | What It Tells You |
|---|---|---|
| Crawl frequency | Server logs | How often bots visit |
| Pages crawled | Server logs | What content they index |
| Crawl errors | Server logs | Blocking issues |
| AI citations | Am I Cited | Whether crawling converts to visibility |
Server log analysis:
Look for these user-agent patterns:
Simple grep command:
grep -E "GPTBot|ClaudeBot|PerplexityBot|Google-Extended" access.log
What healthy activity looks like:
Red flags:
This discussion gave me everything I needed. Here’s our implementation plan:
Updated robots.txt:
# Allow AI search crawlers (citations)
User-agent: ChatGPT-User
User-agent: ClaudeBot
User-agent: Claude-Web
User-agent: PerplexityBot
User-agent: Perplexity-User
User-agent: Google-Extended
Allow: /
# Training crawler - allowing for now
User-agent: GPTBot
Allow: /
# Standard rules
User-agent: *
Disallow: /private/
Disallow: /admin/
Sitemap: https://clientsite.com/sitemap.xml
llms.txt implementation:
Created structured overview of client site with:
Firewall updates:
Monitoring setup:
Timeline expectations:
Success metrics:
Thanks everyone for the technical details and real-world configurations.