Regex Pattern for AI Traffic: Capturing ChatGPT and Perplexity Referrals

Why AI Traffic Matters

Tracking AI traffic has become essential for modern websites, as artificial intelligence platforms now drive a significant portion of web referrals that standard analytics setups often miss. According to recent data, 63% of websites receive traffic from AI platforms, with ChatGPT alone accounting for approximately 50% of all AI-generated referrals. The challenge is twofold. Some AI platforms send no referrer information, so GA4 records those visits as direct traffic. The visits that do carry a referrer land in the generic Referral channel, because GA4’s default channel group has no dedicated AI channel. This hidden traffic creates a blind spot in your analytics and prevents you from seeing which content resonates with AI systems and their users. Without regex-based filtering, you lose visibility into one of the fastest-growing traffic sources and miss opportunities to optimize for AI-driven discovery.

GA4 dashboard showing hidden AI traffic in referral sources

Understanding AI Traffic Sources

Different AI platforms exhibit distinct referrer behaviors, so comprehensive tracking requires platform-specific approaches. Here’s how the major AI platforms behave in GA4:

Platform | Domain | Referrer behavior | Appears as | Limitations
ChatGPT | chatgpt.com (formerly chat.openai.com) | Passes referrer header | Referral traffic | May appear as direct in some configurations
Perplexity | perplexity.ai | Passes referrer header | Referral traffic | Inconsistent referrer patterns across versions
Claude | claude.ai | Strips referrer information | Direct traffic | Requires custom event tracking for attribution
Google Gemini | gemini.google.com | Passes referrer header | Referral traffic | Recently added referrer support
Copilot | copilot.microsoft.com | Strips referrer information | Direct traffic | Limited referrer data available
Bard | bard.google.com | Passes referrer header | Referral traffic | Merged into Gemini; legacy tracking still relevant
DeepSeek | deepseek.com | Passes referrer header | Referral traffic | Emerging platform with growing traffic volume
Mistral | chat.mistral.ai | Passes referrer header | Referral traffic | Newer platform with limited historical data

ChatGPT and Perplexity consistently pass referrer headers, making them easier to track through standard GA4 filters. Claude and Copilot present greater challenges by stripping referrer information entirely, requiring alternative tracking methods. Understanding these behavioral differences is crucial for building effective regex patterns that capture all AI traffic sources accurately.

The Regex Pattern Fundamentals

Regular expressions (regex) are pattern-matching rules that let you identify and filter traffic based on specific text patterns. In GA4, regex conditions can be applied to dimensions such as Session source in report filters, explorations, and custom channel groups, so a single condition can capture domain variations and multiple platforms at once. Rather than creating an individual filter for each AI platform, you write one pattern that matches several domains.

Here’s the basic regex syntax you’ll use in GA4:

^(chatgpt\.com|perplexity\.ai|claude\.ai)$

Key regex components for AI traffic tracking:

  • Pipe character (|): Acts as “OR” operator, allowing multiple domain matches
  • Caret (^) and dollar sign ($): Anchor the pattern to the beginning and end of the string, so a listed domain must match the whole value (www.perplexity.ai will not match perplexity\.ai unless you list it too)
  • Escaped dots (\.): Match literal dots in domain names (required because dots have special meaning in regex)
  • Parentheses (): Group multiple options together for cleaner patterns
  • Asterisk (*) and plus (+): Repeat the preceding character or group zero or more times, or one or more times, respectively

The escaped dot is critical because in regex, an unescaped dot matches any character, not just a literal period. This is why openai.com would incorrectly match openaiXcom, while openai\.com matches only the actual domain.
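To see how escaping and anchoring change what a pattern accepts, you can run the patterns through Python’s `re` module. Python’s regex flavor differs from the RE2 syntax GA4 uses in some advanced features, but for plain alternations, anchors, and escaped dots like these the behavior is the same. The sample values are made up for illustration.

```python
import re

# Unescaped dots: "." matches any single character, so look-alikes slip through.
loose = re.compile(r"^(openai.com|perplexity.ai)$")
# Escaped dots: "\." matches only a literal period.
strict = re.compile(r"^(openai\.com|perplexity\.ai)$")
# Escaped but unanchored: search() accepts the domain anywhere in a longer value.
unanchored = re.compile(r"(openai\.com|perplexity\.ai)")

samples = ["openai.com", "openaiXcom", "perplexity.ai", "notperplexity.ai.example.net"]
patterns = [("loose", loose), ("strict", strict), ("unanchored", unanchored)]

for value in samples:
    hits = [name for name, rx in patterns if rx.search(value)]
    print(f"{value:30} matched by: {', '.join(hits) or 'none'}")
```

Only the strict pattern rejects both `openaiXcom` and the longer look-alike hostname.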

Building Your First Regex Filter

GA4’s Admin > Data filters only support internal and developer traffic, so they cannot isolate AI referrals, and an include-type data filter would permanently discard every other session. Apply the pattern as a report filter instead, which takes only a few steps:

  1. Open the Traffic acquisition report (Reports > Acquisition) in your GA4 property
  2. Click “Add filter” next to the report title
  3. Select the dimension “Session source”
  4. Choose “matches regex” as the match type
  5. Enter the pattern: ^(chatgpt\.com|chat\.openai\.com|perplexity\.ai)$
  6. Click “Apply”

Report filters work on data GA4 has already processed, so you can validate the pattern immediately: the table should list only sessions from these sources. Start with just ChatGPT and Perplexity to confirm the pattern works before expanding to additional platforms. GA4 has no built-in regex preview, so check the pattern first in an online tester set to the RE2 flavor, the syntax GA4 uses.
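If you export the Traffic acquisition table as CSV, a short Python script can confirm which source values a pattern like this captures and how many sessions they account for. This is a minimal sketch: the column names `Session source` and `Sessions` and the sample rows are assumptions, so adjust them to match your actual export. The pattern lists chatgpt.com and the legacy chat.openai.com as ChatGPT’s hostnames.

```python
import csv
import io
import re

# ChatGPT's current and legacy hostnames plus Perplexity.
AI_SOURCE = re.compile(r"^(chatgpt\.com|chat\.openai\.com|perplexity\.ai)$")

# Hypothetical export; column names and numbers are illustrative only.
EXPORT = """Session source,Sessions
google,1200
chatgpt.com,85
perplexity.ai,40
(direct),600
"""


def ai_sessions(csv_text: str) -> dict[str, int]:
    """Sum sessions for each source value that fully matches AI_SOURCE."""
    totals: dict[str, int] = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        source = row["Session source"].strip()
        if AI_SOURCE.fullmatch(source):
            totals[source] = totals.get(source, 0) + int(row["Sessions"])
    return totals


print(ai_sessions(EXPORT))  # {'chatgpt.com': 85, 'perplexity.ai': 40}
```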

Advanced Regex Patterns for Comprehensive AI Tracking

For broader AI traffic visibility, use this pattern, which covers the major AI platforms listed above plus two AI model hubs:

^(chatgpt\.com|chat\.openai\.com|perplexity\.ai|claude\.ai|gemini\.google\.com|copilot\.microsoft\.com|bard\.google\.com|deepseek\.com|chat\.mistral\.ai|huggingface\.co|replicate\.com)$

This master pattern captures:

  • ChatGPT traffic via chatgpt\.com and the legacy chat\.openai\.com - the largest AI referral source
  • Perplexity traffic via perplexity\.ai - rapidly growing AI search engine
  • Claude traffic via claude\.ai - Anthropic’s AI assistant (though often appears as direct)
  • Google Gemini via gemini\.google\.com - Google’s unified AI platform
  • Microsoft Copilot via copilot\.microsoft\.com - integrated into Microsoft products
  • Google Bard via bard\.google\.com - legacy pattern for historical data
  • DeepSeek via deepseek\.com - emerging Chinese AI platform
  • Mistral via chat\.mistral\.ai - European open-source AI platform
  • HuggingFace via huggingface\.co - AI model hub and community platform
  • Replicate via replicate\.com - AI model API platform
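Rather than typing backslashes by hand, you can generate the master pattern from a plain list of hostnames. This sketch uses Python’s `re.escape`, which escapes each dot for you; the list mirrors the platforms above, using ChatGPT’s chatgpt.com and chat.openai.com hostnames. The `build_pattern` helper is hypothetical, not part of any GA4 tooling.

```python
import re

AI_DOMAINS = [
    "chatgpt.com",
    "chat.openai.com",
    "perplexity.ai",
    "claude.ai",
    "gemini.google.com",
    "copilot.microsoft.com",
    "bard.google.com",
    "deepseek.com",
    "chat.mistral.ai",
    "huggingface.co",
    "replicate.com",
]


def build_pattern(domains: list[str]) -> str:
    """Join escaped hostnames into one anchored alternation."""
    return "^(" + "|".join(re.escape(d) for d in domains) + ")$"


pattern = build_pattern(AI_DOMAINS)
print(pattern)  # ^(chatgpt\.com|chat\.openai\.com|perplexity\.ai|...)$
```

When you add a platform during a periodic review, edit the list and regenerate, so escaping stays consistent.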

For more granular tracking, create separate filters for different AI categories:

# Search-focused AI platforms
^(perplexity\.ai|deepseek\.com)$

# General-purpose AI assistants
^(chatgpt\.com|chat\.openai\.com|claude\.ai|gemini\.google\.com)$

# Enterprise AI platforms
^(copilot\.microsoft\.com|bard\.google\.com)$

This segmentation allows you to analyze traffic patterns by AI platform category and identify which types of AI systems drive the most valuable traffic to your content.
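To check that the category filters never overlap (which would double-count sessions), you can run a set of source values through all three patterns and confirm that each matches at most one. This is a sketch with illustrative source values; the general-purpose pattern here uses ChatGPT’s chatgpt.com and chat.openai.com hostnames.

```python
import re

CATEGORIES = {
    "Search-focused": re.compile(r"^(perplexity\.ai|deepseek\.com)$"),
    "General-purpose": re.compile(
        r"^(chatgpt\.com|chat\.openai\.com|claude\.ai|gemini\.google\.com)$"
    ),
    "Enterprise": re.compile(r"^(copilot\.microsoft\.com|bard\.google\.com)$"),
}


def categorize(source: str) -> list[str]:
    """Return every category whose pattern fully matches the source."""
    return [name for name, rx in CATEGORIES.items() if rx.fullmatch(source)]


for source in ["perplexity.ai", "claude.ai", "copilot.microsoft.com", "bing.com"]:
    hits = categorize(source)
    # More than one hit would mean overlapping filters and double counting.
    assert len(hits) <= 1, f"{source} matches {hits}"
    print(source, "->", hits[0] if hits else "not AI")
```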

Regex pattern syntax showing AI domain matching and pattern logic

Creating Custom Channel Groups with Regex

Custom channel groups provide a cleaner way to organize AI traffic alongside your existing channels:

  1. Go to Admin > Data display > Channel groups in your GA4 property
  2. Click “Create new channel group” and name it “AI Traffic Channels”
  3. Click “Add new channel” and name the channel “AI Assistants”
  4. Set the condition: Source matches regex ^(chatgpt\.com|chat\.openai\.com|perplexity\.ai|claude\.ai|gemini\.google\.com|copilot\.microsoft\.com|bard\.google\.com|deepseek\.com|chat\.mistral\.ai)$
  5. Save the channel, then click “Reorder” and move “AI Assistants” above the Referral channel
  6. Save the group and, optionally, set it as your primary channel group

Channel ordering is critical: GA4 assigns each session to the first channel whose conditions match, so the AI channel must sit above broader channels such as Referral, or AI sessions will be counted as Referral before the AI rule is evaluated. Platforms that strip the referrer arrive as (direct) / (none) with no source to match, so a channel group cannot separate them from other direct traffic. To test the channel group, open the Traffic acquisition report, switch the primary dimension to “AI Traffic Channels”, and confirm AI sessions appear under “AI Assistants”.
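The first-match rule can be illustrated with a small loop over ordered rules. The channel names and the simplified Referral and Direct rules below are hypothetical stand-ins, not GA4’s actual default channel definitions; the point is only that the same session lands in a different channel when the order changes. Note that the literal value (direct) is written as \(direct\), because unescaped parentheses form a group.

```python
import re

Rule = tuple[str, re.Pattern, re.Pattern]  # (channel, source regex, medium regex)

# Simplified, hypothetical rules for illustration only.
AI_RULE: Rule = (
    "AI Assistants",
    re.compile(r"^(chatgpt\.com|chat\.openai\.com|perplexity\.ai|claude\.ai)$"),
    re.compile(r".*"),
)
REFERRAL_RULE: Rule = ("Referral", re.compile(r".*"), re.compile(r"^referral$"))
DIRECT_RULE: Rule = ("Direct", re.compile(r"^\(direct\)$"), re.compile(r"^\(none\)$"))


def assign_channel(source: str, medium: str, rules: list[Rule]) -> str:
    """Return the first channel whose source and medium patterns both match."""
    for name, source_rx, medium_rx in rules:
        if source_rx.fullmatch(source) and medium_rx.fullmatch(medium):
            return name
    return "Unassigned"


good_order = [AI_RULE, DIRECT_RULE, REFERRAL_RULE]
bad_order = [REFERRAL_RULE, DIRECT_RULE, AI_RULE]
print(assign_channel("chatgpt.com", "referral", good_order))  # AI Assistants
print(assign_channel("chatgpt.com", "referral", bad_order))   # Referral
```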

Exploration Reports and Regex Filtering

Create custom exploration reports to deeply analyze AI traffic patterns:

  1. Navigate to Explore in your GA4 property
  2. Select “Blank Exploration” as your starting template
  3. Add Dimensions: Session source / medium, Page title, Device category, Country
  4. Add Metrics: Users, Sessions, Engagement rate, Session key event rate (formerly conversion rate)
  5. Apply Filter: Click “Add Filter” and set “Session source” matches regex ^(chatgpt\.com|chat\.openai\.com|perplexity\.ai|claude\.ai)$
  6. Create Visualization: Choose “Table” or “Scatter plot” to analyze relationships between AI platforms and user behavior
  7. Save the exploration as “AI Traffic Deep Dive” for recurring analysis

Recommended metrics for AI traffic analysis include bounce rate, average session duration, and conversion rate to understand how AI-referred users engage differently from other traffic sources. Use the Funnel Exploration template to track how AI users progress through your conversion funnel compared to organic or paid traffic. This reveals whether AI-referred traffic has higher or lower quality than your other channels.
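When comparing AI-referred sessions against other traffic, compute each metric over pooled totals rather than averaging per-source rates, so a small source does not skew the comparison. This sketch assumes exported rows of (session source, sessions, engaged sessions) with made-up numbers, and uses GA4’s definition of engagement rate as engaged sessions divided by sessions.

```python
import re
from collections import defaultdict

AI_SOURCE = re.compile(r"^(chatgpt\.com|chat\.openai\.com|perplexity\.ai|claude\.ai)$")

# Hypothetical exported rows: (session source, sessions, engaged sessions).
ROWS = [
    ("chatgpt.com", 80, 56),
    ("perplexity.ai", 20, 15),
    ("google", 1000, 600),
    ("(direct)", 500, 250),
]


def engagement_by_group(rows):
    """Engagement rate (engaged sessions / sessions) for AI vs. all other sources."""
    sessions = defaultdict(int)
    engaged = defaultdict(int)
    for source, s, e in rows:
        group = "AI" if AI_SOURCE.fullmatch(source) else "Other"
        sessions[group] += s
        engaged[group] += e
    return {group: engaged[group] / sessions[group] for group in sessions}


print(engagement_by_group(ROWS))  # AI: 71/100 = 0.71, Other: 850/1500
```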

Monitoring and Maintaining Your Regex Patterns

Effective AI traffic tracking requires ongoing maintenance and monitoring:

  • Weekly review: Check your Traffic Acquisition report to ensure regex filters are capturing expected traffic volumes
  • Monthly analysis: Compare AI traffic trends across platforms to identify emerging sources or declining referrers
  • Quarterly updates: Add new AI platforms and domains as they emerge (e.g., new assistants or regional AI platforms)
  • Validation checks: Periodically test your regex patterns in an online regex tester set to the RE2 flavor, the syntax GA4 uses, to ensure they still match intended domains
  • Alert setup: Create GA4 custom insights that notify you of unusual spikes or drops in AI traffic to catch configuration issues early

Common mistakes to avoid include forgetting to escape dots in domain names, using unanchored patterns that match unintended traffic, and failing to update patterns when AI platforms change their domain structures. Monitor for false positives by occasionally reviewing the actual referrer values in your raw data to ensure your regex isn’t capturing non-AI traffic. As new AI platforms launch or existing ones modify their referrer behavior, update your regex patterns to maintain comprehensive coverage.
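A periodic audit can combine both checks described above: list the source values the pattern matches, so you can review them for false positives, and flag unmatched values that look AI-related. This is a sketch; the keyword list in `AI_HINTS` and the observed source values are hypothetical and should be replaced with values from your own reports.

```python
import re

AI_SOURCE = re.compile(
    r"^(chatgpt\.com|chat\.openai\.com|perplexity\.ai|claude\.ai|gemini\.google\.com)$"
)
# Hypothetical keywords hinting that a source may be an AI platform the pattern misses.
AI_HINTS = re.compile(r"(gpt|openai|perplexity|claude|gemini|copilot|deepseek|mistral)",
                      re.IGNORECASE)


def audit(sources: list[str]) -> tuple[list[str], list[str]]:
    """Split sources into pattern matches and likely misses."""
    matched = sorted(s for s in sources if AI_SOURCE.fullmatch(s))
    missed = sorted(s for s in sources
                    if not AI_SOURCE.fullmatch(s) and AI_HINTS.search(s))
    return matched, missed


observed = ["chatgpt.com", "www.perplexity.ai", "google", "chat.deepseek.com", "claude.ai"]
matched, missed = audit(observed)
print("matched (review for false positives):", matched)
print("possible misses:", missed)
```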

Comparing AI Traffic Monitoring Solutions

While GA4 filters provide basic AI traffic tracking, specialized solutions offer deeper insights:

Solution | AI traffic detection | Real-time monitoring | Ease of setup | Automation
GA4 Regex Filters | Manual pattern creation | 24-48 hour delay | Moderate (requires regex knowledge) | Limited
AmICited.com | Automatic AI platform detection | Real-time dashboard | Very easy (no coding required) | Full automation
Semrush | Basic AI referral tracking | Daily updates | Easy (UI-based) | Partial
Ahrefs | Limited AI traffic data | Weekly reports | Moderate | Minimal
FlowHunt.io | AI content generation tracking | Real-time | Easy | Partial (content focus)

AmICited.com stands out as the purpose-built solution for AI traffic monitoring, automatically detecting ChatGPT, Perplexity, Claude, and emerging AI platforms without requiring regex configuration. The platform provides real-time dashboards showing which content attracts AI systems, how AI traffic converts, and detailed breakdowns by AI platform. For teams without regex expertise, AmICited.com eliminates the technical barrier while providing deeper AI-specific insights than GA4 alone. FlowHunt.io serves as an alternative if your primary focus is tracking AI-generated content and content generation platform usage rather than AI referral traffic.

Best Practices and Common Pitfalls

Implementing regex patterns correctly requires attention to detail and understanding common mistakes:

Common mistake | Impact | Solution
Forgetting to escape dots (. instead of \.) | Matches unintended domains (e.g., openaiXcom) | Always use \. for literal dots in domain names
Using unanchored patterns | Captures partial matches and false positives | Always use ^ at start and $ at end
Mixing regex and non-regex conditions incorrectly | Traffic misclassification | Test conditions separately before combining
Not updating patterns for new AI platforms | Missing emerging traffic sources | Review and update quarterly
Creating overlapping filters | Double-counting traffic | Ensure filters are mutually exclusive

Best practices for accuracy include testing regex patterns in a separate test property or a saved exploration before applying them to production reports (GA4 has no views), documenting your regex patterns with comments explaining each section, and maintaining a changelog of pattern updates. Validate your patterns by comparing GA4 filtered results against your server logs. After changing a pattern or channel group, spot-check incoming sources in GA4’s Realtime report to catch configuration issues before they affect your reporting.
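For the server-log comparison, note that a log’s Referer field holds a full URL rather than a bare hostname, so extract the hostname before applying an anchored pattern. This minimal sketch uses Python’s `urllib.parse.urlsplit`; the referrer values are hypothetical, and in practice you would read them from your access log.

```python
import re
from collections import Counter
from urllib.parse import urlsplit

AI_HOST = re.compile(r"^(chatgpt\.com|chat\.openai\.com|perplexity\.ai|claude\.ai)$")

# Hypothetical Referer values; "-" is how many log formats record a missing referrer.
REFERRERS = [
    "https://chatgpt.com/",
    "https://perplexity.ai/search?q=example",
    "https://www.google.com/",
    "-",
]


def count_ai_referrals(referrers: list[str]) -> Counter:
    """Count referrals per AI hostname, ignoring paths and query strings."""
    counts: Counter = Counter()
    for ref in referrers:
        host = (urlsplit(ref).hostname or "").lower()
        if AI_HOST.fullmatch(host):
            counts[host] += 1
    return counts


print(count_ai_referrals(REFERRERS))  # Counter({'chatgpt.com': 1, 'perplexity.ai': 1})
```

Expect some gap between these counts and GA4’s session counts for the same date range, since the two systems count differently.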
