Does blocking GPTBot stop ChatGPT from citing my site?

No. GPTBot is the training crawler. Live ChatGPT citations come through OAI-SearchBot and ChatGPT-User. Block those two and you lose citations even if GPTBot is allowed.

What is the difference between a training crawler and a retrieval crawler?

Training crawlers (GPTBot, ClaudeBot, Google-Extended) collect text to train models. Retrieval crawlers (OAI-SearchBot, ChatGPT-User, PerplexityBot) fetch pages live to answer a query and produce a citation. Live retrieval is independent of training data, so a freshly indexed page can be cited without ever being in a training set.

SameDayDesk · Guide/Comparison · June 2026

AI crawler list 2026: GPTBot, ClaudeBot, PerplexityBot, Google-Extended and how to allow them

Six crawler tokens decide whether ChatGPT, Claude, Perplexity and Gemini can read, train on and cite your pages (Google AI Overviews itself rides on the normal Googlebot crawl). Here is who runs each one, what it is for, and the exact robots.txt to let them in.

6 AI crawlers that matter in 2026

87% of SearchGPT citations matched Bing top-20 (Seer Interactive)

68% of AI Overviews citations were NOT in the top 10 organic results

4.4x conversion of AI-search referral traffic vs organic search

The answer: copy-paste this robots.txt block

To allow every major AI crawler — training and live retrieval — drop this at the top of https://yoursite.com/robots.txt:

User-agent: GPTBot
Allow: /

User-agent: OAI-SearchBot
Allow: /

User-agent: ChatGPT-User
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: Google-Extended
Allow: /

User-agent: *
Allow: /

Sitemap: https://yoursite.com/sitemap.xml

The one mistake that quietly kills your citations: people block GPTBot to opt out of AI training, extend the same Disallow to every AI user agent, then assume their pages still show up in ChatGPT answers. They will not. The bot that fetches a page live to answer a ChatGPT query and attach a citation is OAI-SearchBot (and ChatGPT-User for user-triggered browsing). Perplexity citations come through PerplexityBot. Block those retrieval bots and you have blocked your own citations, not your training exposure.

Live retrieval (RAG) is independent of training data. A page indexed an hour ago can be cited today without ever being in a model's training set. That is the whole game in 2026: get the retrieval crawlers in, get indexed fast, and you are eligible to be cited.

The 2026 AI crawler list

Each row is a separate User-agent in robots.txt. Allowing one does not allow the others — they are independent tokens.

Crawler	Operator	What it is for	Allow directive
`GPTBot`	OpenAI	Training crawl (model training data)	`User-agent: GPTBot` `Allow: /`
`OAI-SearchBot`	OpenAI	Live retrieval & citation for ChatGPT Search	`User-agent: OAI-SearchBot` `Allow: /`
`ChatGPT-User`	OpenAI	User-triggered fetch when ChatGPT browses on demand	`User-agent: ChatGPT-User` `Allow: /`
`ClaudeBot`	Anthropic	Training crawl for Claude	`User-agent: ClaudeBot` `Allow: /`
`PerplexityBot`	Perplexity	Indexing & live retrieval/citation	`User-agent: PerplexityBot` `Allow: /`
`Google-Extended`	Google	Gemini training opt-in (NOT Search indexing)	`User-agent: Google-Extended` `Allow: /`

Note on Google: Google-Extended only controls whether your content trains Gemini. It does not control whether you appear in Google Search or Google AI Overviews — that is governed by the normal Googlebot crawl. So blocking Google-Extended does not remove you from AI Overviews.

Training crawlers vs retrieval crawlers — the distinction that decides citations

Mentally split the list in two:

Training crawlers — GPTBot, ClaudeBot, Google-Extended. Blocking these is a content-licensing / opt-out decision. It has near-zero effect on whether you get cited today.
Retrieval crawlers — OAI-SearchBot, ChatGPT-User, PerplexityBot. These fetch your page in real time to build an answer and a citation. Blocking these directly removes you from AI answers.

Across 189 well-known companies we benchmarked in June 2026, AI-crawler access was the single most common failure point — and the most misunderstood. Allowing GPTBot while blocking OAI-SearchBot is the configuration equivalent of unlocking the front door but bricking up the one people actually walk through.
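One way to keep the two groups from getting tangled is to encode the split as data and render robots.txt from it. The sketch below is only an illustration: the function name build_robots_txt and the allow_training flag are hypothetical, and the grouping simply mirrors the list above. It always admits the retrieval crawlers and leaves the training decision to a single switch.

```python
# Crawler tokens grouped as in the list above; names of the helpers are illustrative.
TRAINING = ("GPTBot", "ClaudeBot", "Google-Extended")
RETRIEVAL = ("OAI-SearchBot", "ChatGPT-User", "PerplexityBot")


def build_robots_txt(allow_training: bool, sitemap_url: str) -> str:
    """Render a robots.txt that always admits the retrieval crawlers.

    Training crawlers get "Allow: /" or "Disallow: /" depending on the flag.
    """
    groups = []
    # Retrieval crawlers are never blocked: they are the ones that produce citations.
    for bot in RETRIEVAL:
        groups.append(f"User-agent: {bot}\nAllow: /")
    # The training opt-out is a separate decision, controlled by one flag.
    training_rule = "Allow: /" if allow_training else "Disallow: /"
    for bot in TRAINING:
        groups.append(f"User-agent: {bot}\n{training_rule}")
    groups.append("User-agent: *\nAllow: /")
    groups.append(f"Sitemap: {sitemap_url}")
    return "\n\n".join(groups) + "\n"


if __name__ == "__main__":
    print(build_robots_txt(allow_training=False,
                           sitemap_url="https://yoursite.com/sitemap.xml"))
```

With allow_training=False the output opts out of training for GPTBot, ClaudeBot and Google-Extended while OAI-SearchBot, ChatGPT-User and PerplexityBot stay allowed.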

Where ChatGPT and Perplexity actually pull from

Letting the bots in is necessary but not sufficient — you also need to be in the index they read from.

ChatGPT Search and Microsoft Copilot largely retrieve from the Bing index. About 87% of SearchGPT citations matched Bing's top-20 results (Seer Interactive, reconfirmed by Search Engine Land in April 2026). The practical takeaway: getting into Bing is the fast lane to ChatGPT visibility, and you can push URLs to it instantly with IndexNow, which needs no account, only a key file hosted on your site. Bing typically indexes new content in hours to days.
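For the IndexNow push mentioned above, a minimal sketch in Python might look like this. It assumes you have generated an IndexNow key and host it as a text file at the site root (the key value, host and URLs here are placeholders), and it uses the shared api.indexnow.org endpoint with the JSON bulk format. The request is built separately from the network call so the payload can be inspected first.

```python
import json
import urllib.request

INDEXNOW_ENDPOINT = "https://api.indexnow.org/indexnow"


def build_indexnow_request(host: str, key: str, urls: list[str]) -> urllib.request.Request:
    """Build (but do not send) an IndexNow bulk-submission request."""
    payload = {
        "host": host,
        "key": key,
        # Assumption: the key file is hosted at the site root as <key>.txt.
        "keyLocation": f"https://{host}/{key}.txt",
        "urlList": urls,
    }
    return urllib.request.Request(
        INDEXNOW_ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json; charset=utf-8"},
        method="POST",
    )


def submit(request: urllib.request.Request) -> int:
    """Send the request and return the HTTP status code.

    urlopen raises urllib.error.HTTPError on 4xx/5xx responses,
    for example when the key file cannot be verified.
    """
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status
```

A typical call would be submit(build_indexnow_request("yoursite.com", "<your-key>", ["https://yoursite.com/new-page"])). Submission only notifies the search engine; it does not guarantee indexing.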

Perplexity runs its own crawler and index and tends to favor fresher pages. On low-competition queries, first citations can show up within days of a page going live.

Google AI Overviews plays by different rules. AIO citation is largely independent of organic rank — about 68% of AIO-cited pages were not even in the top 10 organic results. So you do not need to win the classic ranking war to get cited; you need to be crawlable, well-structured, and answer the question directly. (Google still sandboxes brand-new domains for roughly 3–9 months on commercial queries, so do not gate your AI-visibility plan on Google.)

Allowing the bots is step one. Being citable is step two.

Once a retrieval crawler can reach your page, the GEO research (Princeton / Georgia Tech, KDD 2024) shows exactly what makes it more likely to be quoted:

Tactic	Visibility lift
Add direct quotations	about +41%
Add statistics / hard numbers	about +32%
Cite named sources	about +30%
Answer-first structure	about 44% of LLM citations come from the first 30% of a page

Comparison, "X vs Y", "alternatives to", and listicle formats are among the most-cited by AI on commercial-intent queries — roughly 40.9% of citations on such queries. And AI-search referral traffic reportedly converts about 4.4x organic search traffic, which is why the access setup is worth getting exactly right.

One thing you can skip: llms.txt. Google (Gary Illyes) has said it is not supported and not planned, and about 97% of LLM crawler hits never fetch it (Ahrefs). Treat it as optional hygiene, not a ranking factor. The same goes for FAQ and HowTo schema — Google removed those rich results between 2023 and 2026. For AI extraction, the schema that earns its keep is Organization + Article for content, SoftwareApplication for a tool, and Service + Offer for a paid product.
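As a rough illustration of the Article + Organization pairing, the sketch below renders a JSON-LD script tag from a handful of fields. The function name article_jsonld and the sample values are hypothetical; the property names (headline, datePublished, publisher, name, url) are standard schema.org vocabulary. A real page would likely carry more fields, such as author and image.

```python
import json


def article_jsonld(headline: str, url: str, date_published: str,
                   org_name: str, org_url: str) -> str:
    """Return a <script> tag holding Article + Organization JSON-LD."""
    data = {
        "@context": "https://schema.org",
        "@type": "Article",
        "headline": headline,
        "url": url,
        "datePublished": date_published,  # ISO 8601 date, e.g. "2026-06-01"
        "publisher": {
            "@type": "Organization",
            "name": org_name,
            "url": org_url,
        },
    }
    # Escape "</" so a value containing "</script>" cannot close the tag early;
    # "\/" is a valid JSON escape for "/".
    body = json.dumps(data, indent=2).replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{body}\n</script>'


if __name__ == "__main__":
    print(article_jsonld(
        headline="AI crawler list 2026",
        url="https://yoursite.com/ai-crawler-list",
        date_published="2026-06-01",
        org_name="Example Co",
        org_url="https://yoursite.com",
    ))
```

The returned string is meant to be pasted into the page head; the Organization is nested as the Article's publisher so both types ship in one block.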

Is your robots.txt actually letting the citation bots in?

Run the free AI Readiness Checker. It scans your site for AI-crawler access (GPTBot, ClaudeBot, PerplexityBot, Google-Extended), JSON-LD, titles/meta, Open Graph, sitemap and more, and scores you 0–100 against our 189-company benchmark — in your browser, no signup.


Prefer the terminal? npx github:epistemedeus/ai-readiness yoursite.com runs the same scan, and it is open source.

What to do, in order

1. Open your robots.txt. Search for any Disallow under OAI-SearchBot, ChatGPT-User or PerplexityBot: those are blocking your citations. Replace the block with the copy-paste directive above.
2. Get into Bing. Submit your sitemap and ping IndexNow (no account needed). This is the shortest path to ChatGPT Search visibility, since about 87% of SearchGPT citations track Bing's top results.
3. Make pages citable. Lead with the answer, add stats, quote sources by name, and ship Article + Organization JSON-LD.
4. Verify. Re-run the free checker after each change to confirm every crawler is green.
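The first and last steps can also be checked locally. The sketch below uses Python's standard urllib.robotparser to report which retrieval crawlers a given robots.txt would turn away. The helper name blocked_retrieval_bots and the sample file are illustrative, and Python's parser is only an approximation of how each real crawler interprets the rules.

```python
from urllib.robotparser import RobotFileParser

RETRIEVAL_BOTS = ("OAI-SearchBot", "ChatGPT-User", "PerplexityBot")


def blocked_retrieval_bots(robots_txt: str, page_url: str) -> list[str]:
    """Return the retrieval crawlers that may not fetch page_url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [bot for bot in RETRIEVAL_BOTS if not parser.can_fetch(bot, page_url)]


if __name__ == "__main__":
    # The misconfiguration described earlier: GPTBot allowed, OAI-SearchBot blocked.
    sample = """\
User-agent: GPTBot
Allow: /

User-agent: OAI-SearchBot
Disallow: /

User-agent: *
Allow: /
"""
    print(blocked_retrieval_bots(sample, "https://yoursite.com/"))
```

For the sample file this prints ['OAI-SearchBot']; an empty list means none of the three retrieval crawlers is blocked for that URL.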

Want it done for you? The $9 AI Readiness Kit (instant download) bundles the full 189-company benchmark, every robots.txt and JSON-LD template, and the checklist. The $39 Fix Pack (built for your exact site, same day) hands you ready-to-paste files. And the $249 AI-Search Visibility Audit (real citation testing vs your named competitors) tells you who is getting cited on your money queries today, and why.

Want the raw scores? Our June-2026 benchmark of 189 companies across 10 industries is open data (CC-BY): ai-search-readiness-2026.csv. Industry averages ran from marketing agencies at 92/100 down to healthtech at 63/100 — and some names you would not expect scored low: OpenAI and GitHub landed a D on JS-heavy homepages, Perplexity a C, and Klarna an F at 38, the lowest in the set.