Robots.txt for AI Search Engines
How to configure robots.txt for AI search engines like ChatGPT, Perplexity, and Claude
What is robots.txt for AI search engines?
Robots.txt is a text file at the root of your website that tells web crawlers which pages they can and cannot access. With the rise of AI search engines, robots.txt has become the primary way to control whether AI crawlers such as GPTBot (OpenAI), ClaudeBot (Anthropic), Google-Extended (a Google control token honored by Googlebot rather than a separate crawler), and PerplexityBot (Perplexity) can crawl your content. Each AI company uses its own crawler user-agent, so you need a specific directive block for each one you want to control.
Robots.txt for AI search engines is the practice of adding explicit User-agent directives for AI crawlers — GPTBot, ClaudeBot, Google-Extended, PerplexityBot, and others — to your robots.txt file, controlling whether each AI-powered search platform can crawl and use your content in AI-generated answers.
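Per-crawler control can be verified programmatically. A minimal sketch using Python's standard urllib.robotparser; the robots.txt content and URL are illustrative:

```python
from urllib.robotparser import RobotFileParser

# Illustrative robots.txt: block GPTBot, allow every other crawler
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""

rp = RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())

# A crawler matching its own User-agent group uses that group's rules;
# all others fall through to the wildcard (*) group.
print(rp.can_fetch("GPTBot", "https://example.com/article"))    # False: blocked
print(rp.can_fetch("ClaudeBot", "https://example.com/article")) # True: falls through to *
```

To check a live site instead of an inline string, call `rp.set_url("https://yourdomain.com/robots.txt")` followed by `rp.read()`.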
Why does robots.txt for AI search engines matter?
AI-powered search engines are rapidly becoming a primary way people find information. ChatGPT, Perplexity, and Google AI Overviews collectively handle billions of queries. If your robots.txt blocks these crawlers, whether explicitly or through a restrictive wildcard rule, your content won't appear in AI-generated answers. Many websites inadvertently block AI crawlers with overly broad wildcard rules, losing visibility in the fastest-growing search channel without realizing it.
Key statistics
Over 45% of the top 1,000 websites now block at least one AI crawler via robots.txt, up from less than 5% in 2023.
Source: Originality.AI
ChatGPT alone processes over 1 billion queries per week, making AI search a significant traffic source for websites that allow crawler access.
Source: SimilarWeb
How to fix it
1. Check your current robots.txt file by visiting yourdomain.com/robots.txt and look for any User-agent directives that mention AI crawlers (GPTBot, ChatGPT-User, ClaudeBot, Google-Extended, PerplexityBot, CCBot, anthropic-ai, FacebookBot).
2. If you have a wildcard "Disallow: /" rule for User-agent: *, add a separate User-agent block with "Allow: /" for each AI crawler you want to permit (a crawler that matches its own block ignores the wildcard rules).
3. Add separate User-agent blocks for each AI crawler with the appropriate Allow or Disallow directives based on your content strategy.
4. Test your changes using a robots.txt testing tool or the free eiSEO AI Crawler Audit at /tools/ai-crawler-audit to verify each crawler's access status.
5. Review your robots.txt quarterly as new AI crawlers emerge — the AI search landscape is changing rapidly.
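The checks in steps 1 and 4 can be scripted. A sketch using Python's standard urllib.robotparser; the crawler list mirrors the one in step 1, and the inline robots.txt is a stand-in for your own file:

```python
from urllib.robotparser import RobotFileParser

AI_CRAWLERS = [
    "GPTBot", "ChatGPT-User", "ClaudeBot", "Google-Extended",
    "PerplexityBot", "CCBot", "anthropic-ai", "FacebookBot",
]

def audit_ai_access(robots_txt: str, url: str = "https://example.com/") -> dict:
    """Map each known AI crawler to True (allowed) or False (blocked) for url."""
    rp = RobotFileParser()
    rp.parse(robots_txt.splitlines())
    return {agent: rp.can_fetch(agent, url) for agent in AI_CRAWLERS}

# The wildcard blocks everything, but an explicit GPTBot group overrides it
report = audit_ai_access(
    "User-agent: GPTBot\nAllow: /\n\nUser-agent: *\nDisallow: /\n"
)
for agent, allowed in report.items():
    print(f"{agent}: {'allowed' if allowed else 'blocked'}")
```

Running this against the example prints GPTBot as allowed and every other crawler as blocked, which is exactly the inadvertent-blocking pattern step 2 warns about.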
Code example
# Bad: No AI-specific directives
# AI crawlers fall through to the wildcard rule
User-agent: *
Disallow: /private/
Disallow: /admin/
# Result: AI crawlers have access, but you
# have no granular control over which ones

# Good: Explicit AI crawler directives
User-agent: GPTBot
Allow: /

User-agent: ChatGPT-User
Allow: /

User-agent: Google-Extended
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: anthropic-ai
Allow: /

User-agent: CCBot
Allow: /

User-agent: *
Disallow: /private/
Disallow: /admin/
Related topics
llms.txt
llms.txt is a proposed standard file (placed at /llms.txt on your domain) that provides a structured, plain-text summary of your website specifically for large language models. While robots.txt controls whether AI crawlers can access your site, llms.txt helps them understand what your site is about, what content is most important, and how it's organized. Think of it as a "readme" for AI — a concise document that gives AI models the context they need to accurately represent your content in search results.
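A sketch of what an llms.txt file might contain; the proposal uses Markdown, and the site name, sections, and URLs here are illustrative:

```markdown
# Example Co

> Example Co builds developer tools for widget automation.

## Docs

- [Quickstart](https://example.com/docs/quickstart): Install and first run
- [API Reference](https://example.com/docs/api): Endpoints and authentication

## Optional

- [Blog](https://example.com/blog): Release notes and tutorials
```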
Structured Data
Structured data is machine-readable markup (typically JSON-LD using the Schema.org vocabulary) embedded in your page's HTML that explicitly describes the content's type, properties, and relationships. It tells search engines and AI systems exactly what your content is — an article, a product, a recipe, an FAQ — rather than requiring them to infer it from unstructured text.
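For instance, an article page might embed a JSON-LD block like the following; all values are illustrative, not prescribed:

```html
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Article",
  "headline": "Robots.txt for AI Search Engines",
  "author": { "@type": "Organization", "name": "Example Co" },
  "datePublished": "2025-01-15"
}
</script>
```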
Content Extractability
Content extractability measures how easily AI systems and web crawlers can parse, understand, and pull meaningful information from your page. Pages with clean semantic HTML, clear heading structure, well-organized sections, and content that is not locked behind JavaScript rendering or interactive widgets are highly extractable. Pages that rely on complex JavaScript frameworks, embed content in images or PDFs, or lack semantic structure are difficult for AI systems to process.
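The contrast shows up directly in markup. A hedged sketch; the specific elements and class names are illustrative:

```html
<!-- Hard to extract: generic divs give no semantic cues -->
<div class="c1"><div class="c2">Pricing</div><div>From $9/month</div></div>

<!-- Easy to extract: semantic elements state what each part is -->
<article>
  <h2>Pricing</h2>
  <p>From $9/month</p>
</article>
```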
Check your AI crawler access with our free audit tool
eiSEO automatically detects and helps you fix issues like this across your entire site.