Robots.txt for AI Search Engines
How to configure robots.txt for AI search engines like ChatGPT, Perplexity, and Claude
What is robots.txt for AI search engines?
Robots.txt is a text file at the root of your website that tells web crawlers which pages they can and cannot access. With the rise of AI search engines, robots.txt has become the primary way to control whether AI crawlers such as GPTBot (OpenAI), ClaudeBot (Anthropic), Google-Extended (a Google control token honored by Googlebot rather than a separate crawler), and PerplexityBot (Perplexity) can crawl your content. Each AI company uses its own crawler user-agent, so you need a specific directive block for each one you want to control.
Robots.txt for AI search engines is the practice of adding explicit User-agent directives for AI crawlers — GPTBot, ClaudeBot, Google-Extended, PerplexityBot, and others — to your robots.txt file, controlling whether each AI-powered search platform can crawl and use your content in AI-generated answers.
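Per-crawler control can be verified programmatically. A minimal sketch using Python's standard urllib.robotparser; the robots.txt content and URL are illustrative:

```python
from urllib.robotparser import RobotFileParser

# Illustrative robots.txt: block GPTBot, allow every other crawler
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""

rp = RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())

# A crawler matching its own User-agent group uses that group's rules;
# all others fall through to the wildcard (*) group.
print(rp.can_fetch("GPTBot", "https://example.com/article"))    # False: blocked
print(rp.can_fetch("ClaudeBot", "https://example.com/article")) # True: falls through to *
```

To check a live site instead of an inline string, call `rp.set_url("https://yourdomain.com/robots.txt")` followed by `rp.read()`.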
Why does robots.txt for AI search engines matter?
AI-powered search engines are rapidly becoming a primary way people find information. ChatGPT, Perplexity, and Google AI Overviews collectively handle billions of queries. If your robots.txt blocks these crawlers, whether explicitly or through a restrictive wildcard rule, your content won't appear in AI-generated answers. Many websites inadvertently block AI crawlers with overly broad wildcard rules, losing visibility in the fastest-growing search channel without realizing it.
Key statistics
Over 45% of the top 1,000 websites now block at least one AI crawler via robots.txt, up from less than 5% in 2023.
Source: Originality.AI
ChatGPT alone processes over 1 billion queries per week, making AI search a significant traffic source for websites that allow crawler access.
Source: SimilarWeb
How to fix it
1. Check your current robots.txt file by visiting yourdomain.com/robots.txt and look for any User-agent directives that mention AI crawlers (GPTBot, ChatGPT-User, ClaudeBot, Google-Extended, PerplexityBot, CCBot, anthropic-ai, FacebookBot).
2. If you have a wildcard "Disallow: /" rule for User-agent: *, add a separate User-agent block with "Allow: /" for each AI crawler you want to permit (a crawler that matches its own block ignores the wildcard rules).
3. Add separate User-agent blocks for each AI crawler with the appropriate Allow or Disallow directives based on your content strategy.
4. Test your changes using a robots.txt testing tool or the free eiSEO AI Crawler Audit at /tools/ai-crawler-audit to verify each crawler's access status.
5. Review your robots.txt quarterly as new AI crawlers emerge — the AI search landscape is changing rapidly.
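The checks in steps 1 and 4 can be scripted. A sketch using Python's standard urllib.robotparser; the crawler list mirrors the one in step 1, and the inline robots.txt is a stand-in for your own file:

```python
from urllib.robotparser import RobotFileParser

AI_CRAWLERS = [
    "GPTBot", "ChatGPT-User", "ClaudeBot", "Google-Extended",
    "PerplexityBot", "CCBot", "anthropic-ai", "FacebookBot",
]

def audit_ai_access(robots_txt: str, url: str = "https://example.com/") -> dict:
    """Map each known AI crawler to True (allowed) or False (blocked) for url."""
    rp = RobotFileParser()
    rp.parse(robots_txt.splitlines())
    return {agent: rp.can_fetch(agent, url) for agent in AI_CRAWLERS}

# The wildcard blocks everything, but an explicit GPTBot group overrides it
report = audit_ai_access(
    "User-agent: GPTBot\nAllow: /\n\nUser-agent: *\nDisallow: /\n"
)
for agent, allowed in report.items():
    print(f"{agent}: {'allowed' if allowed else 'blocked'}")
```

Running this against the example prints GPTBot as allowed and every other crawler as blocked, which is exactly the inadvertent-blocking pattern step 2 warns about.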
Code example
# Bad: No AI-specific directives
# AI crawlers fall through to the wildcard rule
User-agent: *
Disallow: /private/
Disallow: /admin/
# Result: AI crawlers have access, but you
# have no granular control over which ones

# Good: Explicit AI crawler directives
User-agent: GPTBot
Allow: /

User-agent: ChatGPT-User
Allow: /

User-agent: Google-Extended
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: anthropic-ai
Allow: /

User-agent: CCBot
Allow: /

User-agent: *
Disallow: /private/
Disallow: /admin/
Related topics
llms.txt
llms.txt is a proposed standard file (placed at /llms.txt on your domain) that provides a structured, plain-text summary of your website specifically for large language models. While robots.txt controls whether AI crawlers can access your site, llms.txt helps them understand what your site is about, what content is most important, and how it's organized. Think of it as a "readme" for AI — a concise document that gives AI models the context they need to accurately represent your content in search results.
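A sketch of what an llms.txt file might contain; the proposal uses Markdown, and the site name, sections, and URLs here are illustrative:

```markdown
# Example Co

> Example Co builds developer tools for widget automation.

## Docs

- [Quickstart](https://example.com/docs/quickstart): Install and first run
- [API Reference](https://example.com/docs/api): Endpoints and authentication

## Optional

- [Blog](https://example.com/blog): Release notes and tutorials
```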
Structured Data
Structured data is machine-readable markup (typically JSON-LD using the Schema.org vocabulary) embedded in your page's HTML that explicitly describes the content's type, properties, and relationships. It tells search engines and AI systems exactly what your content is — an article, a product, a recipe, an FAQ — rather than requiring them to infer it from unstructured text.
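For instance, an article page might embed a JSON-LD block like the following; all values are illustrative, not prescribed:

```html
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Article",
  "headline": "Robots.txt for AI Search Engines",
  "author": { "@type": "Organization", "name": "Example Co" },
  "datePublished": "2025-01-15"
}
</script>
```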
Content Extractability
Content extractability measures how easily AI systems and web crawlers can parse, understand, and pull meaningful information from your page. Pages with clean semantic HTML, clear heading structure, well-organized sections, and content that is not locked behind JavaScript rendering or interactive widgets are highly extractable. Pages that rely on complex JavaScript frameworks, embed content in images or PDFs, or lack semantic structure are difficult for AI systems to process.
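The contrast shows up directly in markup. A hedged sketch; the specific elements and class names are illustrative:

```html
<!-- Hard to extract: generic divs give no semantic cues -->
<div class="c1"><div class="c2">Pricing</div><div>From $9/month</div></div>

<!-- Easy to extract: semantic elements state what each part is -->
<article>
  <h2>Pricing</h2>
  <p>From $9/month</p>
</article>
```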
Check your AI crawler access with our free audit tool
eiSEO automatically detects and helps you fix issues like this across your entire site.