
How do AI crawlers handle infinite scroll? Our content isn't getting indexed

FrontendDev_Marcus
Frontend Developer · December 19, 2025
78 upvotes · 10 comments

We built a modern React site with infinite scroll for our blog. Great user experience, but our content isn’t showing up in AI answers at all.

Google indexes it fine (after some work with SSR). But AI platforms seem to miss most of our content.

Our setup:

  • React SPA with infinite scroll
  • SSR for initial page load
  • Additional content loads via JavaScript on scroll
  • 500+ blog posts, only ~50 seem AI-accessible

Questions:

  • Do AI crawlers execute JavaScript at all?
  • Is infinite scroll fundamentally incompatible with AI visibility?
  • What’s the best technical approach for AI crawler accessibility?
  • Should we rebuild pagination entirely?

Any frontend devs dealt with this?

10 Comments

CrawlerTech_Expert · Expert · Technical SEO Consultant · December 19, 2025

Let me break down how different AI crawlers handle JavaScript:

AI Crawler JavaScript Support:

Crawler           JS Rendering            Scroll Simulation   Wait Time
GPTBot            Limited/None            No                  Minimal
Google-Extended   Good (like Googlebot)   No                  Standard
ClaudeBot         Limited                 No                  Minimal
PerplexityBot     Varies                  No                  Limited
Common Crawl      None                    No                  None

The core problem:

Infinite scroll requires:

  1. JavaScript execution
  2. Scroll event triggering
  3. Additional HTTP requests
  4. Rendering of new content

Most AI crawlers fail at step 1 or 2.
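The interaction dependency is easy to see in the trigger logic itself. A minimal sketch (the helper name is hypothetical, not any library's API) of the threshold check a typical infinite-scroll implementation runs on every scroll event — a crawler that never executes JavaScript, or never scrolls, never satisfies this condition, so the next page of content is never requested:

```javascript
// Sketch of the scroll-trigger check behind infinite scroll.
// Returns true when the viewport bottom is within `threshold`
// pixels of the bottom of the document.
function shouldLoadMore(scrollY, viewportHeight, documentHeight, threshold = 300) {
  return scrollY + viewportHeight >= documentHeight - threshold;
}

// In the browser this would gate the extra HTTP request:
// window.addEventListener('scroll', () => {
//   if (shouldLoadMore(window.scrollY, window.innerHeight,
//                      document.body.scrollHeight)) {
//     loadNextPage(); // extra fetch + client-side render
//   }
// });
```

A non-scrolling crawler effectively sits at `scrollY = 0` forever, so on any long page the condition stays false.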

Why SSR isn’t enough:

Your SSR serves the initial page, but infinite scroll content isn’t “initial”: it loads on interaction, and SSR doesn’t solve that interaction dependency.

The fundamental issue:

Infinite scroll is fundamentally incompatible with current AI crawler capabilities. You need an alternative approach.

FrontendDev_Marcus OP · December 19, 2025
Replying to CrawlerTech_Expert
So we basically need to rebuild? What’s the recommended approach?
CrawlerTech_Expert · Expert · December 19, 2025
Replying to FrontendDev_Marcus

Recommended approaches (in order of AI-friendliness):

Option 1: Traditional pagination (most AI-friendly)

/blog/page/1
/blog/page/2
/blog/page/3
  • Each page has its own URL
  • Content in initial HTML
  • Sitemap includes all pages
  • AI crawlers can access everything

Option 2: Hybrid approach

  • Infinite scroll for users
  • BUT also provide paginated URLs
  • Sitemap points to paginated versions
  • Use canonical to avoid duplicates
<!-- Infinite scroll page -->
<link rel="canonical" href="/blog/page/1" />

<!-- Pagination always available -->
<nav>
  <a href="/blog/page/1">1</a>
  <a href="/blog/page/2">2</a>
</nav>

Option 3: Prerender for AI crawlers

  • Detect AI user agents
  • Serve prerendered HTML
  • Full content in initial response

Each option has tradeoffs. Option 1 is simplest and most reliable for AI. Option 2 preserves your UX while adding AI accessibility.
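For Options 1 and 2, the full set of paginated URLs can be derived from the post count, so every post is reachable from a plain link without JavaScript. A minimal sketch (function name and page size are illustrative):

```javascript
// Generate the list of paginated blog URLs for sitemap and
// <a href> navigation — one URL per page of posts.
function buildPaginatedPaths(totalPosts, postsPerPage = 10, base = '/blog/page') {
  const totalPages = Math.ceil(totalPosts / postsPerPage);
  return Array.from({ length: totalPages }, (_, i) => `${base}/${i + 1}`);
}
```

With 500 posts at 10 per page this yields /blog/page/1 through /blog/page/50, all of which belong in the sitemap.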

ReactDev_Sarah · React Developer · December 19, 2025

We went through this exact problem. Here’s our solution:

The hybrid approach implementation:

// URL structure
/blog              // Infinite scroll (user default)
/blog/archive/1    // Paginated (crawler accessible)
/blog/archive/2

Key implementation details:

  1. Sitemap includes only paginated URLs

    • AI crawlers find /blog/archive/* pages
    • These render full content server-side
  2. Infinite scroll page loads same content

    • Uses pagination API under the hood
    • Better UX for humans
  3. Internal links point to individual articles

    • Not to infinite scroll position
    • Each article has its own URL
  4. robots.txt guidance:

# Let crawlers focus on individual articles
# Not the infinite scroll container
Sitemap: https://yoursite.com/sitemap.xml

Results:

  • Human UX unchanged (infinite scroll)
  • AI crawlers access all content via archive pages
  • Individual articles all indexed
  • Citation rate improved 4x after implementation
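Point 1 above — a sitemap that lists only the paginated archive URLs — can be sketched as a small generator. The URL shape follows this comment's /blog/archive/* example; the function name is hypothetical:

```javascript
// Build a sitemap containing only the crawler-accessible
// archive pages, not the infinite-scroll container URL.
function buildArchiveSitemap(origin, totalPages) {
  const urls = [];
  for (let page = 1; page <= totalPages; page++) {
    urls.push(`  <url><loc>${origin}/blog/archive/${page}</loc></url>`);
  }
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
    ...urls,
    '</urlset>'
  ].join('\n');
}
```

In practice you would also list each individual article URL, since those are the pages you want cited.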
NextJSDev_Kevin · December 18, 2025

Next.js specific approach:

Using getStaticPaths + getStaticProps:

// pages/blog/page/[page].js
export async function getStaticPaths() {
  const totalPages = await getTotalPages();
  const paths = Array.from({ length: totalPages }, (_, i) => ({
    params: { page: String(i + 1) }
  }));
  return { paths, fallback: false };
}

export async function getStaticProps({ params }) {
  const posts = await getPostsForPage(params.page);
  return { props: { posts, page: params.page } };
}

Benefits:

  • Static pages for each pagination
  • Full content in HTML at build time
  • AI crawlers get complete content
  • Fast loading (static)

Then add infinite scroll as enhancement:

  • Client-side infinite scroll uses same API
  • Progressive enhancement approach
  • Works without JS too

This gives you the best of both worlds.
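The getPostsForPage helper in the snippet above is assumed rather than shown; a minimal slice-based sketch of what it might do (taking the post list as an explicit argument to stay self-contained):

```javascript
// Hypothetical data helper backing getStaticProps: returns the
// posts for a given 1-indexed page. The static pages and the
// client-side infinite scroll can share this same function.
const POSTS_PER_PAGE = 10;

function getPostsForPage(allPosts, page) {
  const pageNum = Number(page); // route params arrive as strings
  const start = (pageNum - 1) * POSTS_PER_PAGE;
  return allPosts.slice(start, start + POSTS_PER_PAGE);
}
```

Note the `Number(page)` conversion: `params.page` from a dynamic route is always a string.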

Prerender_Specialist · Expert · December 18, 2025

Adding prerendering as an option:

Prerendering services for AI crawlers:

You can detect AI crawler user agents and serve prerendered content:

// middleware
if (isAICrawler(req.headers['user-agent'])) {
  return servePrerenderedVersion(req.url);
}

AI crawler detection:

const aiCrawlers = [
  'GPTBot',
  'ChatGPT-User',
  'Google-Extended',
  'ClaudeBot',
  'PerplexityBot',
  'anthropic-ai'
];

function isAICrawler(userAgent) {
  return aiCrawlers.some(crawler =>
    userAgent.includes(crawler)
  );
}

Prerendering options:

  • Prerender.io
  • Rendertron
  • Puppeteer-based custom solution
  • Build-time prerendering

Caution:

Not all AI crawlers identify themselves clearly. Some might be missed. This is a supplementary approach, not a replacement for proper pagination.

SEODevOps_Lisa · December 18, 2025

Testing methodology for AI crawler accessibility:

Manual tests:

  1. Disable JavaScript test:

    • Open your blog in browser
    • Disable JavaScript
    • What content is visible?
    • This approximates the non-JS crawler view
  2. View source test:

    • View page source (not inspect element)
    • Is your content in the HTML?
    • Or is it just JavaScript placeholders?
  3. curl test:

    curl -A "GPTBot/1.0" https://yoursite.com/blog/
    
    • Does the response contain actual content?

Automated tests:

  1. Google Search Console:

    • URL Inspection tool
    • “View Rendered Page” shows what Googlebot sees
    • (Not AI crawlers, but similar JS rendering)
  2. Lighthouse audit:

    • Check “SEO” category
    • Crawlability issues flagged

What you want to see:

  • Content in initial HTML response
  • Links to all pages discoverable
  • No JS required for content visibility
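The "view source" check can be automated: fetch the raw HTML without executing any JavaScript and verify that known content appears in it. A minimal sketch of the check itself, operating on an HTML string (the fetch step is omitted to keep it self-contained; the function name is made up):

```javascript
// Approximate the non-JS crawler view: does the raw HTML contain
// the actual content, or only an empty mount point that
// JavaScript would fill in later?
function contentInInitialHtml(html, expectedSnippets) {
  return expectedSnippets.every(snippet => html.includes(snippet));
}

// Example inputs:
// SSR page — content is present in the initial response.
const ssrHtml = '<article><h1>Infinite Scroll and SEO</h1></article>';
// SPA shell — only a JS mount point, no content.
const spaHtml = '<div id="root"></div>';
```

Run this against a response fetched with an AI crawler user agent (as in the curl test above) to confirm what those crawlers actually receive.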
EcommerceDevSEO · December 17, 2025

E-commerce perspective:

We have 10,000+ products with “load more” functionality. Here’s our solution:

Category page structure:

/category/shoes             # First 24 products + load more
/category/shoes?page=2      # Products 25-48
/category/shoes?page=3      # Products 49-72

Implementation:

  1. Initial page always has pagination links

    • Even with infinite scroll enabled
    • Footer contains page 1, 2, 3… links
  2. ?page= parameters are canonical

    • Each page is its own content
    • Not duplicate of main page
  3. Sitemap includes all paginated URLs

    • Not just the infinite scroll base URL
  4. Products have individual URLs

    • Category pagination is for discovery
    • Products are the real content

Result:

AI platforms cite our individual product pages, which they discover through the paginated category structure.
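The footer pagination described in point 1 can be generated from the product count. A sketch using the URL shape from this comment (function name and default page size are illustrative):

```javascript
// Build the crawlable footer links for a category: page 1 is the
// base URL, later pages use the canonical ?page= parameter.
function buildCategoryPageUrls(basePath, totalProducts, perPage = 24) {
  const totalPages = Math.ceil(totalProducts / perPage);
  return Array.from({ length: totalPages }, (_, i) =>
    i === 0 ? basePath : `${basePath}?page=${i + 1}`
  );
}
```

At 10,000 products and 24 per page this produces 417 URLs, every one reachable without JavaScript.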

FrontendDev_Marcus OP · Frontend Developer · December 17, 2025

This has been incredibly helpful. Here’s my implementation plan:

Approach: Hybrid pagination

Phase 1: Add paginated routes (Week 1-2)

  • Create /blog/archive/[page] routes
  • SSR for full content in HTML
  • Include pagination navigation
  • Update sitemap to include these

Phase 2: Update existing infinite scroll (Week 3)

  • Keep infinite scroll for /blog
  • Use archive pages as data source
  • Canonical from /blog to /blog/archive/1

Phase 3: Testing and validation (Week 4)

  • Test with JS disabled
  • curl tests for AI user agents
  • Monitor AI citation rates

Technical implementation:

/blog                 → Infinite scroll (humans, canonical to archive/1)
/blog/archive/1       → Paginated (crawlers, canonical to self)
/blog/archive/2       → Paginated (crawlers)
/blog/[slug]          → Individual articles (main content)

Key principles:

  • Content accessible without JavaScript
  • Every piece of content has a direct URL
  • Sitemap includes all content pages
  • Infinite scroll is enhancement, not requirement

Thanks everyone for the detailed technical guidance.

Frequently Asked Questions

Can AI crawlers handle infinite scroll content?
Most AI crawlers have limited JavaScript rendering capabilities. Content that requires user interaction (scrolling) to load is often invisible to AI systems. Server-side rendering or hybrid approaches are recommended.
What's the best pagination approach for AI crawlers?
Traditional pagination with distinct URLs for each page is most AI-friendly. Each page should be accessible via direct URL, included in sitemap, and not require JavaScript to display content.
Do AI crawlers render JavaScript?
AI crawler JavaScript rendering varies significantly. GPTBot has limited JS capabilities. Some crawlers see only initial HTML. For AI visibility, critical content should be in initial server response, not JavaScript-loaded.
How can I test if AI crawlers can access my content?
Disable JavaScript and view your page - this approximates what many AI crawlers see. Also check robots.txt to ensure AI crawlers aren’t blocked, and verify content appears in initial HTML source.
