AI crawler robots.txt reference
A practical, source-linked reference for website owners reviewing AI crawler visibility. Last reviewed: June 26, 2026.
Safe guidance only
AI crawler and robots.txt controls to review
Crawler names and policies change. Use these as starting points, then verify each official source before publishing new rules.
OpenAI crawlers
OpenAI documents separate crawlers for ChatGPT search surfaces, model-training collection, and user-triggered requests.
OAI-SearchBot, GPTBot, ChatGPT-User
Review OpenAI's current documentation before changing robots.txt because blocking search and training crawlers can have different visibility tradeoffs.
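To make the search-versus-training tradeoff concrete, here is a minimal sketch that checks how a draft robots.txt would treat each OpenAI token before it is published. The robots.txt content, the `allowed_agents` helper, and the example.com URL are hypothetical illustrations, not OpenAI's recommended rules. The check uses Python's standard-library parser, whose matching may differ in detail from how any real crawler interprets the file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical draft: opts out of GPTBot, explicitly allows OAI-SearchBot,
# and says nothing about ChatGPT-User.
SAMPLE_ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: OAI-SearchBot
Allow: /
"""


def allowed_agents(robots_txt: str, agents: list[str], url: str) -> dict[str, bool]:
    """Return, for each agent token, whether the draft rules permit fetching url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}


result = allowed_agents(
    SAMPLE_ROBOTS_TXT,
    ["GPTBot", "OAI-SearchBot", "ChatGPT-User"],
    "https://example.com/page",
)
for agent, allowed in result.items():
    print(f"{agent}: {'allowed' if allowed else 'disallowed'}")
```

In this sketch a token with no matching group and no `*` group is treated as allowed, which is why an unlisted crawler is not blocked by default.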
Google AI and crawler controls
Google documents crawler tokens and notes that Google-Extended is a control token for certain Gemini and Vertex AI uses, not a separate HTTP user agent.
Google-Extended, Googlebot, GoogleOther, Google-CloudVertexBot
Do not assume Google-Extended blocks Google Search crawling. Check Google's crawler documentation before changing search-critical rules.
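A short sketch can illustrate why a Google-Extended rule should not be read as a Googlebot rule. The robots.txt content and the `is_allowed` helper below are hypothetical, and the result reflects only how Python's standard-library parser matches groups to tokens. Google's own documentation remains the authority on how its crawlers behave.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical draft containing a single group that names only Google-Extended.
SAMPLE_ROBOTS_TXT = """\
User-agent: Google-Extended
Disallow: /
"""


def is_allowed(robots_txt: str, token: str, url: str = "https://example.com/") -> bool:
    """Check whether the parsed rules permit the given token to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(token, url)


extended_allowed = is_allowed(SAMPLE_ROBOTS_TXT, "Google-Extended")
googlebot_allowed = is_allowed(SAMPLE_ROBOTS_TXT, "Googlebot")
print("Google-Extended allowed:", extended_allowed)
print("Googlebot allowed:", googlebot_allowed)
```

The group applies only to the token it names, so the Googlebot check is unaffected by the Google-Extended rule.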
Anthropic Claude crawlers
Anthropic documents separate robots for model-development crawling, user-directed retrieval, and search-result quality.
ClaudeBot, Claude-User, Claude-SearchBot
Blocking the user or search bots may reduce Claude's ability to retrieve or surface public pages in user workflows.
Common Crawl
Common Crawl documents CCBot as its crawler for building public web crawl datasets.
CCBot
Use the official CCBot reference and verification notes if you need to distinguish real Common Crawl requests from spoofed user agents.
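One common way to separate genuine requests from spoofed user agents is to check the request's source IP against ranges the operator publishes. The sketch below assumes such a list is available, which is an assumption to confirm against the official CCBot reference. The CIDR blocks shown are reserved documentation ranges used as placeholders, not real Common Crawl addresses, and `looks_genuine` is a hypothetical helper.

```python
import ipaddress

# Placeholder ranges (reserved documentation blocks), NOT real Common Crawl ranges.
# Replace them with whatever the official verification notes actually publish.
HYPOTHETICAL_PUBLISHED_RANGES = ["192.0.2.0/24", "2001:db8::/32"]


def looks_genuine(
    user_agent: str,
    remote_ip: str,
    token: str = "CCBot",
    ranges: list[str] = HYPOTHETICAL_PUBLISHED_RANGES,
) -> bool:
    """Return True only if the user agent names the token AND the IP is in a listed range."""
    if token.lower() not in user_agent.lower():
        return False
    address = ipaddress.ip_address(remote_ip)
    return any(address in ipaddress.ip_network(cidr) for cidr in ranges)


in_range = looks_genuine("CCBot/2.0", "192.0.2.10")
out_of_range = looks_genuine("CCBot/2.0", "198.51.100.7")
other_agent = looks_genuine("Mozilla/5.0", "192.0.2.10")
print(in_range, out_of_range, other_agent)
```

A user-agent string alone is trivially forged, so the sketch requires both conditions before treating a request as genuine.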
Perplexity crawlers
Perplexity documents PerplexityBot as a crawler for surfacing and linking websites in Perplexity search results.
PerplexityBot
Review the official crawler page and current IP guidance before making allow or disallow decisions.
Apple crawler controls
Apple documents Applebot for search-related crawling and Applebot-Extended as an additional control for how content may be used by Apple.
Applebot, Applebot-Extended
Use Apple's own support page for current details because Applebot and Applebot-Extended can affect different purposes.
Meta crawlers
Meta documents crawler behavior for link previews and web crawling use cases, including Meta-ExternalAgent.
facebookexternalhit, Meta-ExternalAgent
Check Meta's webmaster documentation before making crawler rules because preview, indexing, and AI-related use cases may differ.
Crawler control questions
Use official documentation and practical controls together; do not treat crawler policy as security.
Does robots.txt guarantee AI crawlers will stay away?
No. robots.txt is a public preference file for cooperative crawlers. It is useful, but private data should still be protected with authentication, access control, monitoring, and server-side controls.
Should I block every AI crawler?
Not automatically. Some crawler controls affect search visibility, user-triggered retrieval, model training, or product-specific features differently. Review each official source before changing rules.
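Before deciding what to block, it can help to see which crawler tokens a robots.txt already names explicitly. The following is a minimal sketch: the token list is taken from this reference and may go stale, while the sample robots.txt and the `explicitly_named_tokens` helper are hypothetical. It only reports which tokens have a `User-agent` line; it does not evaluate the rules themselves.

```python
# Tokens listed in this reference; verify each against its official source.
AI_TOKENS = [
    "OAI-SearchBot", "GPTBot", "ChatGPT-User",
    "Google-Extended", "ClaudeBot", "Claude-User", "Claude-SearchBot",
    "CCBot", "PerplexityBot", "Applebot-Extended", "Meta-ExternalAgent",
]

# Hypothetical robots.txt used only to demonstrate the helper.
SAMPLE_ROBOTS_TXT = """\
# training crawlers
User-agent: GPTBot
User-agent: CCBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def explicitly_named_tokens(robots_txt: str, tokens: list[str] = AI_TOKENS) -> dict[str, bool]:
    """Map each token to True if some User-agent line names it (case-insensitive)."""
    named = set()
    for raw_line in robots_txt.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments
        field, sep, value = line.partition(":")
        if sep and field.strip().lower() == "user-agent":
            named.add(value.strip().lower())
    return {token: token.lower() in named for token in tokens}


report = explicitly_named_tokens(SAMPLE_ROBOTS_TXT)
for token, named in report.items():
    print(f"{token}: {'has its own rule group' if named else 'falls back to * or default'}")
```

Tokens without their own group follow the `*` group if one exists, so an unnamed crawler is a prompt to review the fallback rules rather than evidence that it is blocked or allowed.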
Can DataCrawlPro review my AI crawler exposure?
Yes. DataCrawlPro can review public crawler visibility, robots.txt signals, llms.txt clarity, public structured data, and practical exposure controls.
Service pages connected to this resource
These pages explain how DataCrawlPro scopes public or authorized data extraction, Python scripts, scraping exposure audits, pricing, and contact review.
More public resources to cite or share
These resources are designed to be useful on their own: calculators, checklists, glossary entries, crawler references, and sample audit material.
Web Scraping Cost Calculator
Website Scraping Risk Checklist
AI Crawler robots.txt Reference
Public Data Exposure Glossary
Web Scraping Comparison Guides
Sample Website Scraping Risk Audit Report
Each is a public DataCrawlPro resource for planning, evaluation, responsible-use review, or website-owner education.
