Do news sites block AI crawlers on purpose?

Often yes. Many major publishers (e.g., The New York Times, The Verge, TechCrunch) block GPTBot, ClaudeBot, PerplexityBot or Google-Extended in robots.txt as part of copyright and licensing disputes with AI companies. That deliberately lowers their AI-search readiness — it is a strategy, not a mistake.
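To see what such a block looks like from the crawler's side, here is a minimal sketch using Python's standard `urllib.robotparser`. The `blocked_crawlers` helper and the sample robots.txt are illustrative assumptions, not any publisher's real file and not the code behind our checker.

```python
from urllib.robotparser import RobotFileParser

# User-agent tokens named in this report.
AI_CRAWLERS = ["GPTBot", "OAI-SearchBot", "ClaudeBot", "PerplexityBot", "Google-Extended"]

def blocked_crawlers(robots_txt: str, url: str = "https://example.com/") -> list[str]:
    """Return the AI crawlers that robots_txt disallows from fetching url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [bot for bot in AI_CRAWLERS if not parser.can_fetch(bot, url)]

# Hypothetical robots.txt (an assumption for illustration only).
SAMPLE = """\
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: *
Allow: /
"""

print(blocked_crawlers(SAMPLE))  # ['GPTBot', 'ClaudeBot']
```

Crawlers without their own group fall through to the `User-agent: *` rules, which is why only the two named bots are reported as blocked here.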

What is 'accidental' AI invisibility?

A site that allows AI crawlers but ships no structured data, no clean title/meta, and no sitemap, so AI engines that can crawl it still find little to classify or quote. In our scan, Ars Technica allowed crawlers but had none of those fundamentals and scored an F — invisible to AI search despite not blocking anyone.

How can a publisher that wants AI traffic improve?

If you want AI search referral traffic, allow the crawlers you're comfortable with, then add NewsArticle/Organization JSON-LD, clean titles and descriptions, Open Graph tags, and a sitemap. Check your own site free at samedaydesk.com/tools/ai-readiness.
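As a rough illustration of the JSON-LD step, the sketch below builds a schema.org NewsArticle object and wraps it in a script tag ready to paste into a page's head. The helper name `news_article_jsonld` and all sample values are hypothetical, and the property set is a small starting point rather than a complete markup recommendation.

```python
import json

def news_article_jsonld(headline, url, published_iso, author_name, publisher_name):
    """Build a schema.org NewsArticle object wrapped in a JSON-LD script tag."""
    data = {
        "@context": "https://schema.org",
        "@type": "NewsArticle",
        "headline": headline,
        "url": url,
        "datePublished": published_iso,
        "author": {"@type": "Person", "name": author_name},
        "publisher": {"@type": "Organization", "name": publisher_name},
    }
    # Escape "</" so a value can never close the surrounding script element early.
    body = json.dumps(data, indent=2).replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{body}\n</script>'

# Hypothetical article values (assumptions for illustration only).
print(news_article_jsonld(
    headline="Example headline",
    url="https://example.com/news/example-headline",
    published_iso="2026-06-23T09:00:00Z",
    author_name="Jane Doe",
    publisher_name="Example News",
))
```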

SameDayDesk · Report · June 2026

News publishers and AI search: who's blocking the robots, and who's invisible by accident.

Publishers and AI companies are locked in a fight over whether AI can read and cite the news. So we checked the robots.txt and structured data of 10 major news sites. They split into two camps — and the second one is doing it to itself.

10 news sites scanned

64 average score /100

2 camps: blocked vs invisible

F lowest grade (Ars Technica)

The results

Scored June 23, 2026 with the free AI Readiness Checker. The "Dominant reason" column notes the main driver of each score.

Publisher	Score	Grade	Dominant reason
techcrunch.com	80	B	Blocks GPTBot/ClaudeBot (deliberate); everything else clean.
nytimes.com	73	C	Blocks GPTBot/OAI-SearchBot (deliberate); strong schema + sitemap.
theverge.com	73	C	Blocks ClaudeBot/PerplexityBot/Google-Extended (deliberate); good otherwise.
wired.com	73	C	Mixed crawler blocks; some structural gaps.
businessinsider.com	73	C	Mixed crawler blocks; some structural gaps.
forbes.com	73	C	Mixed crawler blocks; some structural gaps.
bbc.com	69	C	Crawler restrictions + structural gaps.
bloomberg.com	43	D	Blocks GPTBot + no JSON-LD/OG; served an "Are you a robot?" page to our fetch.
theguardian.com	43	D	Crawler blocks + missing structured data.
arstechnica.com	38	F	Allows all crawlers but ships no JSON-LD, no title/meta, no OG, no sitemap — accidentally invisible.

Two very different problems

Camp 1: deliberate blockers. The New York Times, The Verge, and TechCrunch block one or more AI crawlers in robots.txt. That's a strategy in the copyright/licensing fight, not an oversight, and their other fundamentals (structured data, sitemaps, Open Graph) are largely in place. Their lower "AI readiness" score is a choice.

Camp 2: accidentally invisible. The more interesting case is Ars Technica. It blocks nothing, so AI crawlers are welcome, yet it ships no structured data, no title/meta, no Open Graph, and no sitemap, so an engine that crawls it finds almost nothing to quote. It scored an F not by choice but by neglect. Bloomberg manages to do both: it blocks GPTBot, lacks JSON-LD and Open Graph tags, and served an anti-bot "Are you a robot?" page to our plain request.

The lesson for any publisher that actually wants AI-search referral traffic: blocking is a deliberate lever you can pull or not, but missing structured data, titles, and a sitemap simply throws away visibility you could have had for free.

Want AI search to find you (and not by accident)?

If you want AI engines to surface you but you're invisible by neglect, the Fix Pack ($39) ships JSON-LD structured data, clean title/meta/Open Graph tags, a sitemap, and a crawler policy that welcomes the engines you choose — installed-ready, today. Check where you stand first with the free AI Readiness Checker.


Method: each publisher's homepage was fetched once and scored on AI-crawler access, JSON-LD, title/meta, Open Graph, XML sitemap, and llms.txt. Crawler-blocking is reported neutrally — for publishers it is often a deliberate, reasonable choice. Point-in-time, homepage-level snapshot; scores change as sites update. Run your own at samedaydesk.com/tools/ai-readiness.
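For readers who want to reproduce the on-page part of such a check, here is a minimal sketch that looks for a title, a meta description, Open Graph tags, and JSON-LD in one HTML document, using only the standard library. It is an assumption-laden approximation, not the AI Readiness Checker's actual logic: it reports presence only, assigns no scores, and leaves out crawler access, the sitemap, and llms.txt, which need separate fetches.

```python
from html.parser import HTMLParser

class SignalParser(HTMLParser):
    """Record which on-page signals appear in an HTML document."""

    def __init__(self):
        super().__init__()
        self.signals = {"title": False, "meta_description": False,
                        "open_graph": False, "json_ld": False}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            has_content = bool(a.get("content"))
            if (a.get("name") or "").lower() == "description" and has_content:
                self.signals["meta_description"] = True
            if (a.get("property") or "").lower().startswith("og:") and has_content:
                self.signals["open_graph"] = True
        elif tag == "script" and (a.get("type") or "").lower() == "application/ld+json":
            self.signals["json_ld"] = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        # A title only counts if it contains non-whitespace text.
        if self._in_title and data.strip():
            self.signals["title"] = True

def page_signals(html: str) -> dict:
    parser = SignalParser()
    parser.feed(html)
    return parser.signals

# Hypothetical homepage markup (an assumption for illustration only).
SAMPLE_HTML = """<html><head>
<title>Example News</title>
<meta name="description" content="Daily headlines.">
<script type="application/ld+json">{"@type": "Organization"}</script>
</head><body></body></html>"""

print(page_signals(SAMPLE_HTML))
```

On the sample markup, every signal except `open_graph` comes back true, because the page has no `og:` meta tags.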