SameDayDesk · Guide/Comparison · June 2026
Six bots decide whether ChatGPT, Claude, Perplexity and Google AI Overviews can read and cite your pages. Here is who runs each one, what it is for, and the exact robots.txt to let them in.
To allow every major AI crawler — training and live retrieval — drop this at the top of https://yoursite.com/robots.txt:
    User-agent: GPTBot
    Allow: /

    User-agent: OAI-SearchBot
    Allow: /

    User-agent: ChatGPT-User
    Allow: /

    User-agent: ClaudeBot
    Allow: /

    User-agent: PerplexityBot
    Allow: /

    User-agent: Google-Extended
    Allow: /

    User-agent: *
    Allow: /

    Sitemap: https://yoursite.com/sitemap.xml
The one mistake that quietly kills your citations: people block GPTBot to opt out of AI training, then assume their pages still show up in ChatGPT answers. Whether they do depends on different bots entirely. The bot that fetches a page live to answer a ChatGPT query and attach a citation is OAI-SearchBot (and ChatGPT-User for user-triggered browsing). Perplexity citations come through PerplexityBot. Block those retrieval bots and you have blocked your own citations, not your training exposure.
Live retrieval (RAG) is independent of training data. A page indexed an hour ago can be cited today without ever being in a model's training set. That is the whole game in 2026: get the retrieval crawlers in, get indexed fast, and you are eligible to be cited.
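To see which of the six tokens a given robots.txt actually admits, a minimal sketch using Python's standard `urllib.robotparser` could look like the following. The `bot_access` helper and the `EXAMPLE` file are hypothetical, and the standard-library parser only approximates how each vendor's crawler interprets the rules, so treat the result as a first check rather than a guarantee.

```python
from urllib.robotparser import RobotFileParser

# The six user-agent tokens discussed in this guide.
AI_BOTS = [
    "GPTBot", "OAI-SearchBot", "ChatGPT-User",
    "ClaudeBot", "PerplexityBot", "Google-Extended",
]


def bot_access(robots_txt: str, url: str = "https://yoursite.com/") -> dict:
    """Return {bot: True/False} for whether each bot may fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {bot: parser.can_fetch(bot, url) for bot in AI_BOTS}


# Hypothetical robots.txt that opts out of OpenAI training only.
EXAMPLE = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""

if __name__ == "__main__":
    for bot, allowed in bot_access(EXAMPLE).items():
        print(f"{bot:16} {'allowed' if allowed else 'BLOCKED'}")
```

In this example only GPTBot is refused; the retrieval bots fall through to the `*` group and stay allowed, which is the configuration that keeps citations possible while opting out of training.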
Each row is a separate User-agent in robots.txt. Allowing one does not allow the others — they are independent tokens.
| Crawler | Operator | What it is for | Allow directive |
|---|---|---|---|
| GPTBot | OpenAI | Training crawl (model training data) | User-agent: GPTBot<br>Allow: / |
| OAI-SearchBot | OpenAI | Live retrieval & citation for ChatGPT Search | User-agent: OAI-SearchBot<br>Allow: / |
| ChatGPT-User | OpenAI | User-triggered fetch when ChatGPT browses on demand | User-agent: ChatGPT-User<br>Allow: / |
| ClaudeBot | Anthropic | Training crawl for Claude | User-agent: ClaudeBot<br>Allow: / |
| PerplexityBot | Perplexity | Indexing & live retrieval/citation | User-agent: PerplexityBot<br>Allow: / |
| Google-Extended | Google | Gemini training opt-in (NOT Search indexing) | User-agent: Google-Extended<br>Allow: / |
Note on Google: Google-Extended only controls whether your content trains Gemini. It does not control whether you appear in Google Search or Google AI Overviews — that is governed by the normal Googlebot crawl. So blocking Google-Extended does not remove you from AI Overviews.
Mentally split the list in two:
- Training crawlers: GPTBot, ClaudeBot, Google-Extended. Blocking these is a content-licensing / opt-out decision. It has near-zero effect on whether you get cited today.
- Retrieval crawlers: OAI-SearchBot, ChatGPT-User, PerplexityBot. These fetch your page in real time to build an answer and a citation. Blocking these directly removes you from AI answers.

Across 189 well-known companies we benchmarked in June 2026, AI-crawler access was the single most common failure point, and the most misunderstood. Allowing GPTBot while blocking OAI-SearchBot is the configuration equivalent of unlocking the front door but bricking up the one people actually walk through.
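If you want to opt out of training without touching citations, the training/retrieval split can be encoded directly. The sketch below is one way to generate such a file; `build_robots_txt` and its parameters are illustrative names, not part of any tool mentioned here, and the grouping of bots simply follows the split described in this guide.

```python
# Grouping follows this guide's training vs. retrieval split.
TRAINING_BOTS = ["GPTBot", "ClaudeBot", "Google-Extended"]
RETRIEVAL_BOTS = ["OAI-SearchBot", "ChatGPT-User", "PerplexityBot"]


def build_robots_txt(allow_training: bool, sitemap_url: str) -> str:
    """Build a robots.txt that always admits retrieval bots and applies
    a single allow/disallow policy to the training bots."""
    groups = [f"User-agent: {bot}\nAllow: /" for bot in RETRIEVAL_BOTS]

    training_rule = "Allow: /" if allow_training else "Disallow: /"
    groups += [f"User-agent: {bot}\n{training_rule}" for bot in TRAINING_BOTS]

    # Everyone else keeps normal access; the sitemap line goes last.
    groups.append("User-agent: *\nAllow: /")
    groups.append(f"Sitemap: {sitemap_url}")
    return "\n\n".join(groups) + "\n"


if __name__ == "__main__":
    print(build_robots_txt(False, "https://yoursite.com/sitemap.xml"))
```

Calling it with `allow_training=False` yields a file that disallows the three training tokens while leaving every retrieval token explicitly allowed.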
Letting the bots in is necessary but not sufficient — you also need to be in the index they read from.
ChatGPT Search and Microsoft Copilot largely retrieve from the Bing index. About 87% of SearchGPT citations matched Bing's top-20 results (Seer Interactive, reconfirmed by Search Engine Land in April 2026). The practical takeaway: getting into Bing — which you can push instantly with IndexNow, no account required — is the fast lane to ChatGPT visibility. Bing typically indexes new content in hours to days.
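An IndexNow push is a single JSON POST. The sketch below builds that request with the standard library and leaves the actual send commented out. It assumes you have generated an IndexNow key and host it as a text file at the root of your site, which is how the protocol verifies ownership; the host, key, and URL values shown are placeholders.

```python
import json
import urllib.request

# Shared IndexNow endpoint; participating search engines exchange submissions.
INDEXNOW_ENDPOINT = "https://api.indexnow.org/indexnow"


def build_indexnow_request(host: str, key: str, urls: list) -> urllib.request.Request:
    """Build (but do not send) an IndexNow submission for `urls`."""
    payload = {
        "host": host,
        "key": key,
        # Assumption: the key file is hosted at the site root as <key>.txt.
        "keyLocation": f"https://{host}/{key}.txt",
        "urlList": list(urls),
    }
    return urllib.request.Request(
        INDEXNOW_ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json; charset=utf-8"},
        method="POST",
    )


if __name__ == "__main__":
    req = build_indexnow_request(
        "yoursite.com",                      # placeholder host
        "your-indexnow-key",                 # placeholder key
        ["https://yoursite.com/new-page"],   # placeholder URL
    )
    print(req.get_method(), req.full_url)
    # Uncomment to actually submit:
    # with urllib.request.urlopen(req) as resp:
    #     print(resp.status)
```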
Perplexity runs its own crawler and index and tends to favor fresher pages. On low-competition queries, first citations can show up within days of a page going live.
Google AI Overviews plays by different rules. AIO citation is largely independent of organic rank — about 68% of AIO-cited pages were not even in the top 10 organic results. So you do not need to win the classic ranking war to get cited; you need to be crawlable, well-structured, and answer the question directly. (Google still sandboxes brand-new domains for roughly 3–9 months on commercial queries, so do not gate your AI-visibility plan on Google.)
Once a retrieval crawler can reach your page, the GEO research (Princeton / Georgia Tech, KDD 2024) shows exactly what makes it more likely to be quoted:
| Tactic | Reported effect |
|---|---|
| Add direct quotations | about +41% |
| Add statistics / hard numbers | about +32% |
| Cite named sources | about +30% |
| Answer-first structure | about 44% of LLM citations come from the first 30% of a page |
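A rough way to test the answer-first row on your own page is to measure how far into the text your key answer appears. The sketch below uses character offset as a stand-in for "the first 30% of a page"; that proxy and both function names are assumptions, since the figure above does not say how position was measured.

```python
from typing import Optional


def answer_position(page_text: str, answer_snippet: str) -> Optional[float]:
    """Return where the snippet starts as a fraction of page length (0.0 to 1.0),
    or None if the snippet is not found."""
    if not page_text:
        return None
    index = page_text.lower().find(answer_snippet.lower())
    if index == -1:
        return None
    return index / len(page_text)


def is_answer_first(page_text: str, answer_snippet: str, threshold: float = 0.30) -> bool:
    """True if the snippet begins within the first `threshold` share of the page."""
    position = answer_position(page_text, answer_snippet)
    return position is not None and position <= threshold


if __name__ == "__main__":
    page = "GPTBot is OpenAI's training crawler. " + "Background detail. " * 40
    print(is_answer_first(page, "training crawler"))
```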
Comparison, "X vs Y", "alternatives to", and listicle formats are among the most-cited by AI on commercial-intent queries, accounting for roughly 40.9% of citations on such queries. AI-search referral traffic also reportedly converts at about 4.4x the rate of organic search traffic, which is why the access setup is worth getting exactly right.
One thing you can skip: llms.txt. Google (Gary Illyes) has said it is not supported and not planned, and about 97% of LLM crawler hits never fetch it (Ahrefs). Treat it as optional hygiene, not a ranking factor. The same goes for FAQ and HowTo schema — Google removed those rich results between 2023 and 2026. For AI extraction, the schema that earns its keep is Organization + Article for content, SoftwareApplication for a tool, and Service + Offer for a paid product.
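For a content page, the Organization + Article pairing can be emitted as one JSON-LD block. The following is a minimal sketch, not a complete schema: the `article_jsonld` helper, the headline, the date, and the `#organization` identifier are illustrative, and only widely used schema.org properties are included.

```python
import json


def article_jsonld(headline: str, url: str, date_published: str,
                   org_name: str, org_url: str) -> str:
    """Return a <script> tag holding Organization + Article JSON-LD."""
    org_id = org_url.rstrip("/") + "/#organization"  # illustrative @id
    graph = {
        "@context": "https://schema.org",
        "@graph": [
            {"@type": "Organization", "@id": org_id,
             "name": org_name, "url": org_url},
            {"@type": "Article", "headline": headline, "url": url,
             "datePublished": date_published,
             # Author and publisher both point at the Organization node.
             "author": {"@id": org_id},
             "publisher": {"@id": org_id}},
        ],
    }
    body = json.dumps(graph, indent=2)
    return '<script type="application/ld+json">\n' + body + "\n</script>"


if __name__ == "__main__":
    print(article_jsonld(
        "AI crawlers compared",              # placeholder headline
        "https://yoursite.com/guide",        # placeholder URL
        "2026-06-01",                        # placeholder date
        "SameDayDesk",
        "https://yoursite.com",
    ))
```

Paste the printed tag into the page's `<head>`; for a tool or paid product you would swap the Article node for SoftwareApplication or Service + Offer as described above.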
Run the free AI Readiness Checker. It scans your site for AI-crawler access (GPTBot, ClaudeBot, PerplexityBot, Google-Extended), JSON-LD, titles/meta, Open Graph, sitemap and more, and scores you 0–100 against our 189-company benchmark — in your browser, no signup.
Prefer the terminal? npx github:epistemedeus/ai-readiness yoursite.com runs the same scan, and it is open source.
A Disallow under OAI-SearchBot, ChatGPT-User or PerplexityBot is blocking your citations. Replace the block with the copy-paste directive above.

Want it done for you? The $9 AI Readiness Kit (instant download) bundles the full 189-company benchmark, every robots.txt and JSON-LD template, and the checklist. The $39 Fix Pack (built for your exact site, same day) hands you ready-to-paste files. And the $249 AI-Search Visibility Audit (real citation testing vs your named competitors) tells you who is getting cited on your money queries today, and why.
Want the raw scores? Our June-2026 benchmark of 189 companies across 10 industries is open data (CC-BY): ai-search-readiness-2026.csv. Industry averages ran from marketing agencies at 92/100 down to healthtech at 63/100 — and some names you would not expect scored low: OpenAI and GitHub landed a D on JS-heavy homepages, Perplexity a C, and Klarna an F at 38, the lowest in the set.