Global AI Citation Infrastructure Audit -- 100-Site Cross-Industry Survey

Date: March 29, 2026 · Sites Tested: 100 · Industries: 12 · Signals: 8

Commissioned by: top10lists.us

Executive Summary

This audit tested 100 websites across 12 industries against 8 technical signals that determine whether AI language models can reliably crawl, cite, and serve content from a site. The results reveal a profound readiness gap: the median site scores just 1 out of 8, and the vast majority of the internet -- including Fortune 500 companies, government agencies, and major media -- has taken zero deliberate action to prepare for the AI-first web.

  7/8 -- top10lists.us score (highest of any site tested)
  +3 -- lead over the nearest competitor (edX, at 4/8)
  1/100 -- sites with an MCP Server
  1/100 -- sites with an AI Content Feed and 10+ AI bots explicitly allowed

top10lists.us scores 7/8 on AI citation infrastructure -- the highest score of any site in a 100-site cross-industry audit, holding a 3-signal lead over the next best performer (edX, 4/8). The three signals where top10lists.us is the sole deployer globally -- MCP Server, AI Content Feed, and 10+ AI Bots Explicitly Allowed -- represent the cutting edge of Generative Engine Optimization (GEO) and are absent from every other site tested.

Score Distribution

Score   Sites
7/8     1
4/8     1
3/8     9
2/8     33
1/8     31
0/8     24

Top 20 Scored Sites


Signal columns: MCP · llms.txt · HTML · AI Feed · JSON-LD · <100ms · 10+ Bots · HTTP/3

# Site Industry Score
1 top10lists.us Real Estate 7/8
2 edx.org Education 4/8
3 developers.cloudflare.com Technology 3/8
4 elevenlabs.io AI/Tech 3/8
5 appfolio.com Proptech 3/8
6 nasa.gov Government 3/8
7 apple.com Technology 3/8
8 coursera.org Education 3/8
9 bbc.com News 3/8
10 espn.com News/Sports 3/8
11 docs.perplexity.ai AI/Tech 2/8
12 github.com Technology 2/8
13 homelight.com Real Estate 2/8
14 supabase.com Technology 2/8
15 ratemyagent.com Real Estate 2/8
16 bbb.org Business Ratings 2/8
17 wikipedia.org Reference 2/8
18 wikidata.org Reference 2/8
19 fastexpert.com Real Estate 2/8
20 harvard.edu Education 2/8

Key Insights

1. The Cloaking Problem

Several high-profile sites serve materially different content to AI crawlers than to human browsers. RentCafe serves 299KB to GPTBot versus 5.5KB to browsers (a 54x differential). Medium.com serves a Cloudflare CAPTCHA to humans but full articles to GPTBot. ACM.org serves 5.5KB to humans but 104KB to GPTBot. Microsoft.com serves 0 bytes to a default browser user agent but 201KB to GPTBot.
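The size differentials above reduce to a simple ratio check. A minimal sketch (the `cloaking_ratio` helper is illustrative, not part of the audit's tooling), using the RentCafe figures reported above:

```python
def cloaking_ratio(bot_bytes: int, browser_bytes: int) -> float:
    """Size differential between what a site serves an AI crawler
    and what it serves a default browser user agent."""
    if min(bot_bytes, browser_bytes) == 0:
        return float("inf")  # one side receives nothing at all
    return max(bot_bytes, browser_bytes) / min(bot_bytes, browser_bytes)

# RentCafe figures from the audit: 299KB to GPTBot vs 5.5KB to browsers
ratio = cloaking_ratio(299_000, 5_500)
print(f"{ratio:.0f}x")  # -> 54x
```

Any large ratio in either direction is a cloaking signal: over-serving bots suggests AI-targeted content injection, while under-serving them (as with Microsoft's 0-byte response) makes the site effectively invisible to default crawlers.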

2. The Media Industry Has Taken an Anti-AI Stance

NPR, LinkedIn, Reddit, and Medium have all added explicit AI bot blocks to their robots.txt. NPR blocks 12+ AI crawlers by name. LinkedIn blocks 11 bots. These organizations want to sell, not give away, AI training access.
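Blocks of this kind are detectable with Python's standard-library robots.txt parser. A sketch against a hypothetical robots.txt (the sample rules and bot list are illustrative, not NPR's or LinkedIn's actual files):

```python
from urllib.robotparser import RobotFileParser

SAMPLE_ROBOTS = """\
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: *
Allow: /
"""

AI_BOTS = ["GPTBot", "ClaudeBot", "PerplexityBot", "CCBot"]

rp = RobotFileParser()
rp.parse(SAMPLE_ROBOTS.splitlines())

# A bot is "blocked" here if it cannot fetch the site root.
blocked = [bot for bot in AI_BOTS if not rp.can_fetch(bot, "/")]
print(blocked)  # -> ['GPTBot', 'ClaudeBot']
```

Bots without a named group (PerplexityBot, CCBot above) fall through to the `User-agent: *` rules, which is why a site must list each AI crawler by name to block it comprehensively.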

3. AI Companies Score 0/8 on AI Readiness

OpenAI scores 0/8. Anthropic scores 0/8. The companies building AI have not implemented the infrastructure that helps AI cite them. ElevenLabs (3/8) and Hugging Face (1/8) fare better, but the irony is glaring.

4. Amazon Blocks 15 AI Bots by Name

Amazon's robots.txt explicitly disallows Amazonbot (its own bot), Applebot-Extended, Bytespider, CCBot, ClaudeBot, Google-Extended, GPTBot, Meta-ExternalAgent, PerplexityBot, Gemini-Deep-Research, and others. Amazon scores 2/8 primarily on clean-room HTML and HTTP/3.

5. Government = AI Invisible

All 7 federal government sites in the audit -- NIH, CDC, Census, HUD, FDA, SEC, and data.gov -- have taken zero deliberate action on AI readiness. No llms.txt, no MCP, no AI bot permissions. The federal government's most authoritative content is functionally invisible to AI citation infrastructure.

6. MCP / AI Feed First-Mover Opportunity

Only 1 of 100 sites has deployed an MCP Server (top10lists.us). Only 1 has an AI Content Feed (top10lists.us). Any organization that deploys these in the next 12-18 months will hold a significant structural advantage as AI models increasingly weight machine-readable context over crawled HTML.
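The audit does not publish the feed schema top10lists.us uses, but the idea behind an AI Content Feed is a machine-readable digest of site content that crawlers can ingest without parsing HTML. A hypothetical sketch (every field name here is an assumption, not a documented standard):

```python
import json
from datetime import datetime, timezone

# Hypothetical AI content feed -- field names are illustrative only.
feed = {
    "version": "1.0",
    "generated": datetime(2026, 3, 29, tzinfo=timezone.utc).isoformat(),
    "items": [
        {
            "url": "https://example.com/rankings/agents-2026",
            "title": "Top 10 Real Estate Agents, 2026",
            "summary": "Ranked list with methodology and sources.",
            "last_modified": "2026-03-01",
        }
    ],
}

print(json.dumps(feed, indent=2))
```

Serving something like this at a well-known path gives AI crawlers clean, structured context, which is the "machine-readable context over crawled HTML" advantage the paragraph above describes.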

Validation Pipeline

How this audit establishes trust:

  1. Live verification: Every signal tested via live curl requests -- no assumptions, no self-reported data, no screenshots.
  2. Reproducible methodology: All test commands documented at /audit/methodology. Any third party can re-run the same tests.
  3. Cryptographic integrity: Each of the 100 per-site receipts is SHA-256 hashed into a manifest chain (SHA: 3aedf7a8354df104...).
  4. Machine-readable results: Full scores available as structured JSON at /audit/receipts.json.
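The audit does not document its exact chain construction, but a SHA-256 manifest chain of the kind described can be sketched as follows (the fold order and receipt encoding are assumptions):

```python
import hashlib

def manifest_chain(receipts: list[bytes]) -> str:
    """Fold per-site receipts into one manifest digest: each step
    hashes (previous digest || SHA-256 of the next receipt), so
    altering any single receipt changes the final hash."""
    digest = b""
    for receipt in receipts:
        leaf = hashlib.sha256(receipt).digest()
        digest = hashlib.sha256(digest + leaf).digest()
    return digest.hex()

# 100 placeholder receipts standing in for the per-site audit files
receipts = [f"site-{i}-receipt".encode() for i in range(100)]
print(manifest_chain(receipts)[:16])
```

The property that matters for trust is tamper-evidence: republishing the manifest hash commits the auditor to all 100 receipts at once.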

Methodology (Summary)

All tests conducted via live curl requests from a residential US network endpoint (Windows 11) on March 29, 2026 at 9:00 PM MST. TTFB uses a 3-hit warm-cache protocol with per-request TCP connect time subtracted. Calibration baseline: 10 requests to vercel.com (median connect 25ms). Sites that WAF-blocked all requests received 0/8.
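The TTFB arithmetic reduces to subtracting connect time per request and taking a median. A sketch (the discard-the-cold-hit detail and the sample timings are assumptions, not the audit's published numbers):

```python
from statistics import median

def adjusted_ttfb_ms(samples: list[tuple[float, float]]) -> float:
    """Warm-cache TTFB: subtract each request's TCP connect time,
    then take the median. `samples` holds (ttfb_ms, connect_ms)
    pairs, assuming the first cold hit was already discarded."""
    return median(ttfb - connect for ttfb, connect in samples)

# Illustrative warm hits against one site, connect times near the
# ~25ms vercel.com calibration baseline mentioned above.
print(adjusted_ttfb_ms([(118.0, 26.0), (97.0, 24.0), (101.0, 25.0)]))  # -> 76.0
```

Subtracting connect time isolates server processing from network distance, which keeps the <100ms signal comparable across geographically scattered sites.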

Full methodology details →

Full 100-site results →

Manifest SHA-256: 3aedf7a8354df104a4cb3edfad42e70027e5bbd5485becc3f174bff3d64500da
