SameDayDesk · Report · June 2026

News publishers and AI search: who's blocking the robots, and who's invisible by accident.

Publishers and AI companies are locked in a fight over whether AI can read and cite the news. So we checked the robots.txt and structured data of 10 major news sites. They split into two camps — and the second one is doing it to itself.

10 news sites scanned
64 average score /100
2 camps: blocked vs invisible
F lowest grade (Ars Technica)

The results

Scored June 23, 2026 with the free AI Readiness Checker. The "Dominant reason" column notes the main factor behind each score.

| Publisher | Score | Grade | Dominant reason |
| --- | --- | --- | --- |
| techcrunch.com | 80 | B | Blocks GPTBot/ClaudeBot (deliberate); everything else clean. |
| nytimes.com | 73 | C | Blocks GPTBot/OAI-SearchBot (deliberate); strong schema + sitemap. |
| theverge.com | 73 | C | Blocks ClaudeBot/PerplexityBot/Google-Extended (deliberate); good otherwise. |
| wired.com | 73 | C | Mixed crawler blocks; some structural gaps. |
| businessinsider.com | 73 | C | Mixed crawler blocks; some structural gaps. |
| forbes.com | 73 | C | Mixed crawler blocks; some structural gaps. |
| bbc.com | 69 | C | Crawler restrictions + structural gaps. |
| bloomberg.com | 43 | D | Blocks GPTBot + no JSON-LD/OG; served an "Are you a robot?" page to our fetch. |
| theguardian.com | 43 | D | Crawler blocks + missing structured data. |
| arstechnica.com | 38 | F | Allows all crawlers but ships no JSON-LD, no title/meta, no OG, no sitemap; accidentally invisible. |
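As a quick arithmetic check on the headline figures, the short Python sketch below recomputes the average and the lowest score from the ten rows in the table. The dictionary simply restates the table; nothing here comes from the checker itself.

```python
from statistics import mean

# Scores copied from the results table above.
SCORES = {
    "techcrunch.com": 80,
    "nytimes.com": 73,
    "theverge.com": 73,
    "wired.com": 73,
    "businessinsider.com": 73,
    "forbes.com": 73,
    "bbc.com": 69,
    "bloomberg.com": 43,
    "theguardian.com": 43,
    "arstechnica.com": 38,
}

average = mean(SCORES.values())            # 638 / 10 = 63.8
lowest_site = min(SCORES, key=SCORES.get)  # site with the smallest score

print(f"sites scanned: {len(SCORES)}")
print(f"average score: {average:.1f} (rounds to {round(average)})")
print(f"lowest: {lowest_site} ({SCORES[lowest_site]})")
```

The mean works out to 63.8, which rounds to the 64 shown in the summary.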

Two very different problems

Camp 1: deliberate blockers. The New York Times, The Verge, and TechCrunch block one or more AI crawlers in robots.txt. That's a strategy in the copyright/licensing fight, not an oversight, and their other fundamentals (structured data, sitemaps, Open Graph) are largely in place. Their lower "AI readiness" score reflects a choice, not a defect.
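To see what a deliberate block looks like in practice, here is a minimal sketch using Python's standard-library `urllib.robotparser`. The robots.txt text is a made-up sample, not a copy of any publisher's file, and `blocked_crawlers` is a hypothetical helper name. The user-agent tokens are the ones named in the table.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for illustration; not taken from any site in the table.
SAMPLE_ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: *
Allow: /
"""

# AI crawler tokens mentioned in the results table.
AI_CRAWLERS = ["GPTBot", "OAI-SearchBot", "ClaudeBot", "PerplexityBot", "Google-Extended"]


def blocked_crawlers(robots_txt, agents, url="https://example.com/"):
    """Return the agents that the given robots.txt forbids from fetching url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [agent for agent in agents if not parser.can_fetch(agent, url)]


blocked = blocked_crawlers(SAMPLE_ROBOTS_TXT, AI_CRAWLERS)
print(blocked)  # ['GPTBot', 'ClaudeBot']
```

Crawlers without their own rule fall through to the `User-agent: *` group, which is why only the two named bots come back as blocked.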

Camp 2: accidentally invisible. The more interesting case is Ars Technica: it blocks nothing, so AI crawlers are welcome, yet it ships no structured data, no title/meta, no Open Graph, and no sitemap, so an engine that crawls it finds almost nothing to quote. It scored an F not by choice but by neglect. Bloomberg lands in both camps: it blocks GPTBot, lacks JSON-LD and Open Graph, and served an anti-bot "Are you a robot?" page to a plain request.
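The "invisible" signals can be checked mechanically. The sketch below uses Python's standard-library `html.parser` to look for a title, a meta description, Open Graph tags, and a JSON-LD script in a page's HTML. It is an illustration under stated assumptions, not the checker's actual logic: `SignalAudit`, `audit`, and both sample pages are hypothetical, and the sitemap check is left out because it needs a separate request.

```python
from html.parser import HTMLParser


class SignalAudit(HTMLParser):
    """Records which machine-readable signals appear in an HTML document."""

    def __init__(self):
        super().__init__()
        self.found = {
            "title": False,
            "meta_description": False,
            "open_graph": False,
            "json_ld": False,
        }
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            # <meta name="description" content="..."> with non-empty content
            if (attr.get("name") or "").lower() == "description" and attr.get("content"):
                self.found["meta_description"] = True
            # Any Open Graph property, e.g. og:title or og:image
            if (attr.get("property") or "").lower().startswith("og:"):
                self.found["open_graph"] = True
        elif tag == "script" and (attr.get("type") or "").lower() == "application/ld+json":
            self.found["json_ld"] = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        # Only count a <title> that actually contains text.
        if self._in_title and data.strip():
            self.found["title"] = True


def audit(html):
    """Return a dict of signal name -> present?"""
    parser = SignalAudit()
    parser.feed(html)
    parser.close()
    return parser.found


# Hypothetical pages: one with no signals, one with all four.
BARE_PAGE = "<html><head></head><body><h1>Story</h1></body></html>"
RICH_PAGE = (
    "<html><head><title>Story</title>"
    '<meta name="description" content="A short summary.">'
    '<meta property="og:title" content="Story">'
    '<script type="application/ld+json">{"@type": "NewsArticle"}</script>'
    "</head><body><h1>Story</h1></body></html>"
)

print(audit(BARE_PAGE))
print(audit(RICH_PAGE))
```

A page like `BARE_PAGE` is readable by a human but gives a crawler no title, summary, or structured record to quote, which is the Camp 2 failure mode.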

The lesson for any publisher that actually wants AI-search referral traffic: blocking is a deliberate lever you can pull or not — but missing structured data, titles, and a sitemap just throws away visibility for free.
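For a sense of how little markup the structured-data part requires, here is a minimal sketch that builds a schema.org `NewsArticle` JSON-LD tag in Python. The function name `news_article_jsonld` and all example values are placeholders; a real article would normally carry more properties, such as an image and a modified date.

```python
import json


def news_article_jsonld(headline, url, published_iso, author_name):
    """Build a <script> tag holding a schema.org NewsArticle record."""
    data = {
        "@context": "https://schema.org",
        "@type": "NewsArticle",
        "headline": headline,
        "url": url,
        "datePublished": published_iso,  # ISO 8601 timestamp
        "author": {"@type": "Person", "name": author_name},
    }
    # Escape "</" so a value can never close the surrounding <script> early.
    body = json.dumps(data, indent=2).replace("</", "<\\/")
    return '<script type="application/ld+json">' + body + "</script>"


# Placeholder values for illustration only.
tag = news_article_jsonld(
    headline="Example headline",
    url="https://example.com/story",
    published_iso="2026-06-23T09:00:00Z",
    author_name="A. Reporter",
)
print(tag)
```

The resulting tag goes in the page's `<head>`, alongside the title, meta description, and Open Graph tags.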

Want AI search to find you (and not by accident)?

If you want AI engines to surface you but you're invisible by neglect, the Fix Pack ($39) ships JSON-LD structured data, clean title/meta/Open Graph tags, a sitemap, and a crawler policy that welcomes the engines you choose, ready to install today. Check where you stand first with the free AI Readiness Checker.


Method: each publisher's homepage was fetched once and scored on AI-crawler access, JSON-LD, title/meta, Open Graph, XML sitemap, and llms.txt. Crawler-blocking is reported neutrally — for publishers it is often a deliberate, reasonable choice. Point-in-time, homepage-level snapshot; scores change as sites update. Run your own at samedaydesk.com/tools/ai-readiness.
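The method lists six signals but not how they are weighted, so the sketch below is only a toy rubric: it assumes equal weights, which is an assumption made here for illustration and will not reproduce the scores in the table. The check names and `readiness_score` are hypothetical.

```python
# The six signals named in the method note. Equal weighting is an assumption;
# the real checker's weights are not published in this report.
CHECKS = [
    "crawler_access",
    "json_ld",
    "title_meta",
    "open_graph",
    "xml_sitemap",
    "llms_txt",
]


def readiness_score(results):
    """Score 0-100 as the share of checks passed, with every check weighted equally."""
    passed = sum(1 for check in CHECKS if results.get(check, False))
    return round(100 * passed / len(CHECKS))


# Hypothetical site that allows crawlers but ships nothing else.
open_but_bare = {"crawler_access": True}
print(readiness_score(open_but_bare))
```

Even under this simplified rubric, a site that only allows crawlers passes one check in six, which shows why open access alone does not make a page easy to cite.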