SameDayDesk · Report · June 2026
Publishers and AI companies are locked in a fight over whether AI can read and cite the news. So we checked the robots.txt and structured data of 10 major news sites. They split into two camps — and the second one is doing it to itself.
Scored June 23, 2026 with the free AI Readiness Checker. The "Dominant reason" column notes the main driver of each score.
| Publisher | Score | Grade | Dominant reason |
|---|---|---|---|
| techcrunch.com | 80 | B | Blocks GPTBot/ClaudeBot (deliberate); everything else clean. |
| nytimes.com | 73 | C | Blocks GPTBot/OAI-SearchBot (deliberate); strong schema + sitemap. |
| theverge.com | 73 | C | Blocks ClaudeBot/PerplexityBot/Google-Extended (deliberate); good otherwise. |
| wired.com | 73 | C | Mixed crawler blocks; some structural gaps. |
| businessinsider.com | 73 | C | Mixed crawler blocks; some structural gaps. |
| forbes.com | 73 | C | Mixed crawler blocks; some structural gaps. |
| bbc.com | 69 | C | Crawler restrictions + structural gaps. |
| bloomberg.com | 43 | D | Blocks GPTBot + no JSON-LD/OG; served an "Are you a robot?" page to our fetch. |
| theguardian.com | 43 | D | Crawler blocks + missing structured data. |
| arstechnica.com | 38 | F | Allows all crawlers but ships no JSON-LD, no title/meta, no OG, no sitemap — accidentally invisible. |
Camp 1 — deliberate blockers. The New York Times, The Verge, and TechCrunch block one or more AI crawlers in robots.txt. That's a strategy in the copyright/licensing fight, not an oversight — and their other fundamentals (structured data, sitemaps, Open Graph) are largely in place. Their lower "AI readiness" score is a choice.
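For readers who want to see what a deliberate block looks like in practice, here is a minimal sketch using Python's standard `urllib.robotparser`. The crawler names come from the table above; the `blocked_crawlers` helper and the sample robots.txt are hypothetical illustrations, not the checker's actual code and not copied from any publisher.

```python
from urllib.robotparser import RobotFileParser

# Crawler names taken from the table above.
AI_CRAWLERS = ["GPTBot", "OAI-SearchBot", "ClaudeBot", "PerplexityBot", "Google-Extended"]


def blocked_crawlers(robots_txt: str, url: str = "https://example.com/") -> list[str]:
    """Return the AI crawlers that this robots.txt text disallows from fetching url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse text directly, no network call
    return [bot for bot in AI_CRAWLERS if not parser.can_fetch(bot, url)]


# Hypothetical robots.txt (an assumption for illustration only).
SAMPLE = """\
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: *
Allow: /
"""

print(blocked_crawlers(SAMPLE))
```

With this sample, only the two crawlers named in their own `User-agent` groups are reported as blocked; every other crawler falls through to the wildcard group and is allowed.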
Camp 2 — accidentally invisible. The more interesting case is Ars Technica: it blocks nothing, so AI crawlers are welcome — yet it ships no structured data, no title/meta, no Open Graph, and no sitemap, so an engine that crawls it finds almost nothing to quote. It scored an F not by choice but by neglect. Bloomberg manages to do both: it blocks GPTBot, lacks JSON-LD and Open Graph tags, and served an anti-bot "Are you a robot?" page to a plain request.
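The "invisible by neglect" signals can be checked from a page's raw HTML. The sketch below uses only Python's standard `html.parser`; the `SignalScanner` class, the `scan` helper, and both sample pages are assumptions made for illustration, and the real checker may work differently.

```python
from html.parser import HTMLParser


class SignalScanner(HTMLParser):
    """Record which of the on-page signals discussed above appear in the HTML."""

    def __init__(self) -> None:
        super().__init__()
        self.signals = {
            "title": False,
            "meta_description": False,
            "open_graph": False,
            "json_ld": False,
        }
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            # A meta description only counts if it has non-empty content.
            if (a.get("name") or "").lower() == "description" and a.get("content"):
                self.signals["meta_description"] = True
            # Open Graph tags use property="og:...".
            if (a.get("property") or "").lower().startswith("og:"):
                self.signals["open_graph"] = True
        elif tag == "script" and (a.get("type") or "").lower() == "application/ld+json":
            self.signals["json_ld"] = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        # A title only counts if it contains visible text.
        if self._in_title and data.strip():
            self.signals["title"] = True


def scan(html: str) -> dict[str, bool]:
    scanner = SignalScanner()
    scanner.feed(html)
    scanner.close()
    return scanner.signals


# Two hypothetical pages: one with nothing to quote, one with the basics in place.
BARE = "<html><head></head><body><h1>News</h1></body></html>"
FULL = """<html><head><title>Example News</title>
<meta name="description" content="Daily headlines">
<meta property="og:title" content="Example News">
<script type="application/ld+json">{"@type": "NewsArticle"}</script>
</head><body></body></html>"""

print(scan(BARE))
print(scan(FULL))
```

A scan like this only sees the HTML it was actually served, so a bot-challenge page will register as missing everything regardless of what the real homepage contains.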
The lesson for any publisher that actually wants AI-search referral traffic: blocking is a deliberate lever you can pull or not — but missing structured data, titles, and a sitemap just throws away visibility for free.
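As a rough idea of how little is involved in the structured-data part, here is a minimal sketch that builds a JSON-LD `NewsArticle` block with Python's `json` module. The `news_article_jsonld` helper and its field values are hypothetical; the property names (`headline`, `url`, `datePublished`, `author`) are standard schema.org vocabulary, and a real article would likely carry more of them.

```python
import json


def news_article_jsonld(headline: str, url: str, date_published: str, author: str) -> str:
    """Return a <script> tag holding a small schema.org NewsArticle description."""
    data = {
        "@context": "https://schema.org",
        "@type": "NewsArticle",
        "headline": headline,
        "url": url,
        "datePublished": date_published,
        "author": {"@type": "Person", "name": author},
    }
    # Escape "</" so text inside the JSON cannot close the script tag early.
    body = json.dumps(data, indent=2).replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{body}\n</script>'


# Placeholder values, not taken from any real article.
print(news_article_jsonld("Example headline", "https://example.com/story", "2026-06-23", "Jane Doe"))
```

The returned tag would go in the page's `<head>`, next to the title, meta description, and Open Graph tags.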
If you want AI engines to surface you but you're invisible by neglect, the Fix Pack ($39) ships JSON-LD structured data, clean title/meta/Open Graph tags, a sitemap, and a crawler policy that welcomes the engines you choose — installed-ready, today. Check where you stand first with the free AI Readiness Checker.
Method: each publisher's homepage was fetched once and scored on AI-crawler access, JSON-LD, title/meta, Open Graph, XML sitemap, and llms.txt. Crawler-blocking is reported neutrally — for publishers it is often a deliberate, reasonable choice. Point-in-time, homepage-level snapshot; scores change as sites update. Run your own at samedaydesk.com/tools/ai-readiness.