
Google-Extended
Learn about Google-Extended, the user-agent token that lets publishers control whether their content is used for AI training in Gemini and Vertex AI. Understand...

Learn what Google-Extended is, how it works, and whether you should block it in your robots.txt. Understand the difference between AI training control and AI Overviews.
Google-Extended is a standalone product token announced by Google on September 28, 2023, that gives web publishers control over whether content crawled from their sites can be used to train and improve Google’s generative AI models. At launch the control covered Bard (since renamed Gemini) and the Vertex AI generative APIs. It marked a shift in how Google approaches AI transparency and publisher consent, letting website administrators decide what role their content plays in AI development. By setting Google-Extended in the robots.txt file, publishers choose whether to contribute to the current and future generations of models that power Google’s products. The announcement came in response to growing concern in the web publishing community that content was being used for AI training without an explicit consent mechanism.

Google-Extended is a machine-readable control expressed in the industry-standard robots.txt file, which makes it accessible to publishers of all technical skill levels. It is not a separate crawler and has no HTTP user-agent string of its own: Google’s existing crawlers fetch pages as usual, and the Google-Extended token in robots.txt tells Google which of that content it may not use for AI training. The syntax follows the same conventions publishers have used for decades to manage search engine crawlers. Here are the two primary implementation approaches:
# Option 1: full block of Google-Extended
User-agent: Google-Extended
Disallow: /

# Option 2: partial block, only specific directories
User-agent: Google-Extended
Disallow: /premium-content/
Disallow: /licensed-material/
The first example opts your entire site out of the uses that Google-Extended controls, while the second opts out only specific directories. This flexibility lets publishers take a nuanced approach, for example allowing AI training on general content while protecting sensitive or proprietary material.
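To see how the two rule sets above behave for individual URLs, the following is a minimal sketch that uses Python's standard `urllib.robotparser`. The `example.com` URLs and the `/blog/post` path are placeholders, and `is_allowed` is a hypothetical helper name. Python's parser only approximates how Google matches rules, so treat the result as a sanity check rather than a statement of Google's behavior.

```python
from urllib.robotparser import RobotFileParser

# The two rule sets from the text, as they would appear in robots.txt.
FULL_BLOCK = """\
User-agent: Google-Extended
Disallow: /
"""

PARTIAL_BLOCK = """\
User-agent: Google-Extended
Disallow: /premium-content/
Disallow: /licensed-material/
"""


def is_allowed(robots_txt: str, token: str, url: str) -> bool:
    """Return True if the rules do not disallow `url` for `token`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(token, url)


if __name__ == "__main__":
    # example.com is a placeholder domain used only for illustration.
    for name, rules in (("full", FULL_BLOCK), ("partial", PARTIAL_BLOCK)):
        for path in ("/blog/post", "/premium-content/report"):
            url = "https://example.com" + path
            print(name, path, is_allowed(rules, "Google-Extended", url))
```

Running the same check with the token `Googlebot` returns `True` for every URL under both rule sets, because neither group names Googlebot.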
Understanding the scope of what Google-Extended controls is crucial for making informed decisions about implementation. The token tells Google not to use content crawled from your site to train or improve Gemini (formerly Bard), the Vertex AI generative APIs, and future generative AI models. It does not stop Google from crawling your pages, and it does not govern every AI-related use of your content. Here’s a detailed comparison:
| Feature | Blocked by Google-Extended | NOT Blocked |
|---|---|---|
| Gemini (formerly Bard) training data collection | ✓ Yes | — |
| Vertex AI model improvement | ✓ Yes | — |
| Future Google AI models | ✓ Yes | — |
| Google Search indexing | — | ✓ Not affected |
| AI Overviews in Search results | — | ✓ Not affected |
| Google Search rankings | — | ✓ Not affected |
| Googlebot crawling | — | ✓ Not affected |
| Regular search visibility | — | ✓ Not affected |
This distinction is critical: blocking Google-Extended does not prevent your content from appearing in Google Search results or from being used in AI Overviews. It specifically targets only the training data collection for Google’s generative AI products, leaving your search visibility completely intact.

One of the most misunderstood aspects of Google-Extended is its relationship to AI Overviews, Google’s feature that displays AI-generated summaries at the top of search results. Many publishers mistakenly believe that blocking Google-Extended will prevent their content from appearing in AI Overviews, but this is fundamentally incorrect. AI Overviews are generated from content that appears in Google Search results, not from the separate AI training data collection that Google-Extended controls. This means that even if you block Google-Extended, your content can still be cited and summarized in AI Overviews if it ranks well in traditional search results. If your primary concern is preventing content from appearing in AI Overviews, Google offers an alternative approach: the nosnippet meta tag, which prevents Google from displaying snippets of your content in any search results, including AI Overviews. Understanding this distinction is essential for developing an effective content protection strategy that aligns with your business objectives.
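The nosnippet directive is set with a robots meta tag such as `<meta name="robots" content="nosnippet">`. As a rough way to check whether a page already carries it, here is a minimal sketch using Python's standard `html.parser`. The names `RobotsMetaParser` and `has_nosnippet` are hypothetical, and the sketch looks only at meta tags named `robots` or `googlebot`. It does not inspect `X-Robots-Tag` HTTP headers or `data-nosnippet` attributes.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots"> and <meta name="googlebot"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute values can be None (e.g. <meta name>), so normalize to "".
        attr_map = {key: (value or "") for key, value in attrs}
        if attr_map.get("name", "").lower() not in ("robots", "googlebot"):
            return
        # The content attribute holds a comma-separated list of directives.
        for directive in attr_map.get("content", "").split(","):
            directive = directive.strip().lower()
            if directive:
                self.directives.add(directive)


def has_nosnippet(html: str) -> bool:
    """Return True if the page declares nosnippet in a robots meta tag."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "nosnippet" in parser.directives


if __name__ == "__main__":
    page = '<html><head><meta name="robots" content="nosnippet"></head></html>'
    print(has_nosnippet(page))
```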
The decision to block Google-Extended should be based on a careful analysis of your content’s value and your business model. Certain types of publishers and content creators have particularly compelling reasons to implement this restriction:
Licensed Content Providers: Publishers who have licensed content from third parties with specific usage restrictions should block Google-Extended to ensure compliance with licensing agreements and avoid potential legal liability.
Premium and Subscription-Based Content: News organizations, research platforms, and educational institutions that monetize exclusive content through subscriptions benefit from preventing that content from being used to train competing AI systems.
Intellectual Property-Heavy Content: Companies producing original research, proprietary methodologies, or specialized knowledge should consider blocking to protect their competitive advantage and maintain the uniqueness of their offerings.
Legal and Compliance-Sensitive Industries: Financial services, healthcare, and legal firms may need to block Google-Extended to comply with industry regulations and maintain client confidentiality standards.
Creative Industries: Authors, photographers, musicians, and other creative professionals who depend on copyright protection and fair compensation for their work have legitimate reasons to restrict AI training access.
Real-world adoption of Google-Extended reveals interesting patterns about how different publishers view AI training access. Major news organizations have taken a protective stance: The New York Times, CNN, and the BBC have all implemented Google-Extended blocks, reflecting concerns about their premium journalism being used to train competing AI systems without compensation. These decisions align with broader industry discussions about fair compensation for content used in AI training. Conversely, other major publishers have chosen not to block Google-Extended, including Wikipedia, CNET, and Netflix, suggesting different strategic priorities or business models. According to data from Reuters and industry tracking, the adoption rate varies significantly by industry, with news publishers showing higher blocking rates than technology, entertainment, and reference sites. This divergence reflects the different economic models and content strategies across industries, with some publishers viewing AI training access as a potential benefit for discoverability while others see it as a threat to their core business.
Google has explicitly confirmed that blocking Google-Extended has no impact on your rankings or visibility in Google Search. Your site’s inclusion in the search index, its ranking positions for target keywords, and its organic search traffic are unaffected by the setting. The separation is intentional, but it is not a separation between two crawlers: Google-Extended has no crawler or user-agent string of its own. Googlebot and Google’s other existing crawlers keep fetching pages for Search as before, and the Google-Extended token only governs whether the crawled content may be used for AI training. Blocking it is therefore purely a content usage decision, one that publishers can make on content protection and business grounds without weighing negative SEO consequences.
Deciding whether to block Google-Extended ultimately comes down to a fundamental business question: Is your revenue model based on monetizing trust or monetizing content? Publishers must analyze whether allowing their content to improve Google’s AI products provides strategic value through increased visibility and traffic, or whether it represents a threat to their core revenue streams. For publishers whose business model depends on exclusive, premium content—such as subscription-based news organizations or research platforms—blocking Google-Extended protects their ability to charge for access to unique information. Conversely, publishers who rely on advertising revenue and organic traffic may benefit from allowing Google-Extended access, as improved AI models could drive more qualified traffic to their sites. The landscape is further complicated by the emergence of Google Assistant and Gemini, which represent the future of how Google will deliver information to users. As these AI interfaces become more sophisticated and prevalent, the question of whether your content should power them becomes increasingly strategic. Publishers must consider not just current revenue implications but also how their content strategy will evolve as AI-powered interfaces become the primary way users discover information.
The concept of grounding is central to understanding the future of AI-powered search and information discovery. Grounding refers to the practice of anchoring AI-generated responses to specific, cited sources from the web, so that AI outputs are traceable and more likely to be accurate. Google’s Deep Research feature and other advanced AI capabilities rely heavily on grounding to provide users with reliable, sourced information. Google-Extended is relevant here because, in addition to training, it governs whether your content can be used for grounding in Gemini apps. As AI assistants become more sophisticated, the ability to cite authoritative sources becomes increasingly valuable, both for users seeking trustworthy information and for publishers whose content serves as the foundation for these responses. Future AI interfaces will likely engage more directly with publisher content, potentially creating new opportunities for visibility and traffic. Publishers who prepare for this shift, whether through strategic blocking decisions or by optimizing their content for AI consumption, will be better positioned in an AI-driven information landscape.
Implementing Google-Extended controls is straightforward, but it is worth verifying that the directive is deployed correctly. Add the Google-Extended user-agent group to your robots.txt file and deploy it to your web server, then open the file in a browser (typically at yoursite.com/robots.txt) to confirm the directive is present and properly formatted. Because Google-Extended is a control token rather than a crawler, it has no user-agent string of its own, so you should not expect it to appear as a distinct crawler in server logs or in Search Console crawl reporting. To monitor the effects of blocking, establish baseline metrics before implementation: organic search traffic, rankings for target keywords, and how your content appears in search results and AI Overviews. After implementing the block, track these metrics over time to confirm that your search visibility is unaffected. Consider also setting up alerts for mentions of your brand or content in AI-generated responses to see how your content is used in AI contexts. Regular audits of your robots.txt file and periodic reviews of your blocking strategy keep your directives aligned with your business objectives and competitive landscape.
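The browser check described above can also be scripted as part of a regular audit. The following is a minimal sketch, not a full robots.txt parser: `google_extended_rules` and `fetch_robots_txt` are hypothetical helper names, the site origin passed in is a placeholder, and the group handling is a simplified reading of the robots.txt format that ignores fields other than `User-agent`, `Allow`, and `Disallow`.

```python
import urllib.request

TOKEN = "google-extended"


def google_extended_rules(robots_txt: str) -> list:
    """Return (directive, path) pairs from groups that name Google-Extended."""
    rules = []
    group_agents = []       # user-agent tokens of the group being read
    reading_agents = False  # True while consecutive User-agent lines are read
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if not reading_agents:
                group_agents = []  # a new group starts here
            group_agents.append(value.lower())
            reading_agents = True
        elif field in ("allow", "disallow"):
            reading_agents = False
            if TOKEN in group_agents:
                rules.append((field, value))
        # Any other field (for example Sitemap) is skipped.
    return rules


def fetch_robots_txt(site: str) -> str:
    """Download robots.txt from an origin such as https://example.com (placeholder)."""
    url = site.rstrip("/") + "/robots.txt"
    with urllib.request.urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


if __name__ == "__main__":
    sample = "User-agent: Google-Extended\nDisallow: /premium-content/\n"
    for directive, path in google_extended_rules(sample):
        print(directive, path)
```

An empty result means the file contains no group for Google-Extended. To audit a deployed file, pass the output of `fetch_robots_txt` for your own origin into `google_extended_rules`.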
What is Google-Extended?
Google-Extended is a robots.txt control mechanism announced in September 2023 that allows website owners to prevent Google from using their content to train Gemini models and for grounding in Gemini apps. It's not a separate crawler but a control token that uses existing Google user agents.
Does blocking Google-Extended remove my content from AI Overviews?
No. AI Overviews are part of Google Search and are not controlled by Google-Extended. To keep content out of AI Overviews, you can use the nosnippet meta tag, but this also removes regular search snippets, which can reduce visibility.
Does blocking Google-Extended affect my search rankings?
No. Google officially states that Google-Extended does not impact search inclusion or ranking. It only affects whether your content is used for Gemini training and grounding.
How do I block Google-Extended?
Add these lines to your robots.txt file: User-agent: Google-Extended followed by Disallow: / to block all content, or Disallow: /directory to block specific sections.
Should I block Google-Extended?
It depends on your business model. If you monetize trust and expertise, allowing it may increase visibility. If you monetize the content itself (for example, paywalled articles), blocking may protect your IP.
What is grounding?
Grounding is when Gemini pulls content from Google Search to fact-check or enrich its responses, then shows those sources as citations. Blocking Google-Extended prevents your site from appearing as a grounding source.
Which sites block Google-Extended?
Major news publishers such as The New York Times, CNN, and the BBC block it. However, many large sites, including Wikipedia, Netflix, LinkedIn, and WebMD, do not.
Does Google-Extended affect other Google Search features?
No. Google-Extended only affects Gemini training and grounding. It doesn't impact Google News, Google Images, or any other Google Search features.
