Full 100-Site Results -- AI Citation Infrastructure Audit

Date: March 29, 2026 · 100 sites · 12 industries · 8 signals

Complete dataset from the Global AI Citation Infrastructure Audit. Licensed CC BY 4.0.

Complete 100-Site Table

Sorted by rank (score descending). BLOCK indicates that the site's WAF/CDN blocked all audit requests.

#  Site  Industry  Score  MCP  llms.txt  HTML  AI Feed  JSON-LD  <100ms  10+ Bots  HTTP/3
1 top10lists.us (this site) Real Estate 8/8
2 edx.org Education 4/8
3 developers.cloudflare.com Technology 3/8
4 elevenlabs.io AI/Tech 3/8
5 appfolio.com Proptech 3/8
6 nasa.gov Government 3/8
7 apple.com Technology 3/8
8 coursera.org Education 3/8
9 bbc.com News 3/8
10 espn.com News/Sports 3/8
11 docs.perplexity.ai AI/Tech 2/8
12 github.com Technology 2/8
13 homelight.com Real Estate 2/8
14 supabase.com Technology 2/8
15 ratemyagent.com Real Estate 2/8
16 bbb.org Business Ratings 2/8
17 wikipedia.org Reference 2/8
18 wikidata.org Reference 2/8
19 fastexpert.com Real Estate 2/8
20 harvard.edu Education 2/8
21 realpage.com Proptech 2/8
22 apartmentlist.com Proptech 2/8
23 yardi.com Proptech 2/8
24 buildium.com Proptech 2/8
25 who.int Nonprofit/Health 2/8
26 w3.org Nonprofit/Tech 2/8
27 mozilla.org Nonprofit/Tech 2/8
28 medium.com Publishing 2/8
29 acm.org Academic 2/8
30 fda.gov Government 2/8
31 khanacademy.org Education 2/8
32 data.gov Government 2/8
33 stackoverflow.com Technology 2/8
34 reddit.com Social/Tech 2/8
35 bankofamerica.com Finance 2/8
36 wellsfargo.com Finance 2/8
37 progressive.com Finance 2/8
38 statefarm.com Finance 2/8
39 amazon.com Retail/Tech 2/8
40 google.com Technology 2/8
41 airbnb.com Travel/Tech 2/8
42 imdb.com Entertainment 2/8
43 webmd.com Healthcare 1/8
44 healthgrades.com Healthcare 1/8
45 zillow.com Real Estate 1/8
46 realtor.com Real Estate 1/8
47 redfin.com Real Estate 1/8
48 yelp.com Reviews 1/8
49 tripadvisor.com Travel 1/8
50 crunchbase.com Business Data 1/8
51 bankrate.com Finance 1/8
52 lendingtree.com Finance 1/8
53 martindale.com Legal 1/8
54 avvo.com Legal 1/8
55 nerdwallet.com Finance 1/8
56 pitchbook.com Business Data 1/8
57 huggingface.co AI/Tech 1/8
58 sec.gov Government 1/8
59 arxiv.org Academic 1/8
60 pubmed.ncbi.nlm.nih.gov Academic 1/8
61 npr.org News 1/8
62 nih.gov Government 1/8
63 cdc.gov Government 1/8
64 census.gov Government 1/8
65 hud.gov Government 1/8
66 mit.edu Education 1/8
67 stanford.edu Education 1/8
68 wikimedia.org Reference 1/8
69 archive.org Reference 1/8
70 turbotenant.com Proptech 1/8
71 microsoft.com Technology 1/8
72 linkedin.com Technology 1/8
73 nytimes.com News 1/8
74 theguardian.com News 1/8
75 cnn.com News 1/8
76 forbes.com News/Business 1/8
77 wsj.com News 1/8
78 chase.com Finance 1/8
79 walmart.com Retail 1/8
80 salesforce.com Technology 1/8
81 facebook.com Social/Tech 1/8
82 youtube.com Social/Tech 1/8
83 openai.com AI/Tech 0/8
84 anthropic.com AI/Tech 0/8
85 uber.com Technology 0/8
86 indeed.com Employment 0/8
87 netflix.com Entertainment 0/8
88 bloomberg.com Finance/News 0/8
89 fidelity.com Finance 0/8
90 x.com Social/Tech 0/8
91 costar.com Proptech 0/8 BLOCK
92 apartments.com Proptech 0/8 BLOCK
93 rentcafe.com Proptech 0/8
94 rent.com Proptech 0/8
95 ieee.org Academic 0/8 BLOCK
96 redcross.org Nonprofit 0/8 BLOCK
97 sec.gov (WAF) Government 0/8 BLOCK
98 x.com (retest) Social/Tech 0/8
99 linkedin.com (retest) Technology 0/8
100 tiktok.com Social/Tech 0/8

Industry Breakdown

Industry     Sites  Avg Score  Best Site             Best Score  Notable Gap
Real Estate  7      2.7/8      top10lists.us         8/8         Zillow/Redfin/Realtor.com all at 1/8
Education    7      2.0/8      edX                   4/8         MIT, Stanford, Harvard: 1-2/8 despite massive resources
Technology   10     1.7/8      Cloudflare Docs       3/8         OpenAI scores 0/8; Anthropic scores 0/8
AI/Tech      5      1.2/8      ElevenLabs            3/8         OpenAI and Anthropic score 0/8
Government   7      1.3/8      NASA                  3/8         SEC.gov WAF-blocked all audit requests
Finance      11     1.1/8      BofA / WellsFargo     2/8         Bloomberg, Fidelity, Chase at 0-1/8
News/Media   9      1.4/8      BBC / ESPN            3/8         NPR and NYT have zero AI-permissive signals
Proptech     9      1.1/8      AppFolio              3/8         CoStar, Apartments.com WAF-blocked; RentCafe cloaking
Academic     4      1.0/8      ACM                   2/8         IEEE WAF-blocked; arXiv/PubMed no structured data
Reference    4      1.5/8      Wikipedia / Wikidata  2/8         Wikipedia serves different content to GPTBot vs humans
Social/Tech  5      0.8/8      Reddit / Facebook     2/8         LinkedIn, X, YouTube: 0-1/8; all actively block AI bots
Healthcare   2      1.0/8      WebMD / Healthgrades  1/8         Both major health sites at 1/8; zero AI infrastructure
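
The breakdown above can be recomputed from the rows of the complete table. The following is a minimal sketch, not the audit's own tooling: it assumes the average is a plain mean rounded to one decimal, and the function name `industry_breakdown` is hypothetical. The input rows are copied from the Reference, Academic, and Healthcare entries in the 100-site table.

```python
from collections import defaultdict

# (site, industry, score out of 8), copied from the Complete 100-Site Table.
ROWS = [
    ("wikipedia.org", "Reference", 2),
    ("wikidata.org", "Reference", 2),
    ("wikimedia.org", "Reference", 1),
    ("archive.org", "Reference", 1),
    ("acm.org", "Academic", 2),
    ("arxiv.org", "Academic", 1),
    ("pubmed.ncbi.nlm.nih.gov", "Academic", 1),
    ("ieee.org", "Academic", 0),
    ("webmd.com", "Healthcare", 1),
    ("healthgrades.com", "Healthcare", 1),
]


def industry_breakdown(rows):
    """Return {industry: (site_count, avg_score, best_sites, best_score)}.

    Assumption: the average is a plain mean rounded to one decimal place.
    """
    by_industry = defaultdict(list)
    for site, industry, score in rows:
        by_industry[industry].append((site, score))

    summary = {}
    for industry, entries in by_industry.items():
        scores = [score for _, score in entries]
        best = max(scores)
        # Keep every site tied for the best score, in table order.
        best_sites = [site for site, score in entries if score == best]
        average = round(sum(scores) / len(scores), 1)
        summary[industry] = (len(entries), average, best_sites, best)
    return summary


for industry, (count, avg, best_sites, best) in industry_breakdown(ROWS).items():
    print(f"{industry}: {count} sites, avg {avg}/8, "
          f"best {' / '.join(best_sites)} at {best}/8")
```

For these three industries the sketch yields the same site counts, averages, and best scores as the breakdown: Reference 4 sites at 1.5/8, Academic 4 sites at 1.0/8, Healthcare 2 sites at 1.0/8.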

Signal-by-Signal Analysis

Signal               Pass Count  Adoption Rate  Key Finding
MCP Server           1/100       1%             top10lists.us is the only site of 100 with a live MCP server.
llms.txt             8/100       8%             Only SaaS/tech-forward sites. Zero adoption in news, finance, government.
Clean-Room HTML      55/100      55%            Most common signal. Major platforms serve different content to AI bots (cloaking risk).
AI Content Feed      1/100       1%             top10lists.us only. No other site has deployed a machine-readable AI content index.
JSON-LD              38/100      38%            Most consistent in e-commerce and media. Absent from all .gov except NASA.
Sub-100ms TTFB       18/100      18%            Requires edge CDN. GitHub, Supabase, Google, ESPN, top10lists.us pass.
10+ AI Bots Allowed  1/100       1%             top10lists.us only. Most sites are silent on AI bots.
HTTP/3               29/100      29%            Largely free with Cloudflare. Now also live on top10lists.us via Vercel. Absent from Wikipedia, most .gov, MIT, Stanford.
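
One way the "10+ AI Bots Allowed" signal could be checked is sketched below. This is an illustration under stated assumptions, not the audit's method: the exact pass criteria and bot list are not given here, so the `AI_BOTS` list, the function names, and the rule that a bot must be both named in robots.txt and allowed to fetch `/` are all assumptions. The last of these follows from the finding that sites which are silent on AI bots do not pass. It uses Python's standard `urllib.robotparser`.

```python
from urllib.robotparser import RobotFileParser

# Assumed list of AI crawler user-agent tokens; the audit's own list is not given.
AI_BOTS = [
    "GPTBot", "ChatGPT-User", "OAI-SearchBot", "ClaudeBot", "anthropic-ai",
    "PerplexityBot", "Google-Extended", "CCBot", "Bytespider", "Amazonbot",
    "Applebot-Extended", "Meta-ExternalAgent",
]


def explicitly_allowed_bots(robots_txt, bots=AI_BOTS, path="/"):
    """Return the bots that robots.txt names explicitly AND allows on `path`.

    A bot covered only by the `*` wildcard group is treated as "silent"
    and is not counted (an assumption about the audit's criterion).
    """
    lines = robots_txt.splitlines()

    # Collect every user-agent token that appears in the file.
    named = set()
    for line in lines:
        field, _, value = line.partition(":")
        if field.strip().lower() == "user-agent":
            named.add(value.split("#", 1)[0].strip().lower())

    parser = RobotFileParser()
    parser.parse(lines)
    return [bot for bot in bots
            if bot.lower() in named and parser.can_fetch(bot, path)]


def passes_bot_signal(robots_txt, threshold=10):
    """True when at least `threshold` AI bots are explicitly allowed."""
    return len(explicitly_allowed_bots(robots_txt)) >= threshold
```

Under this reading, a robots.txt containing only `User-agent: *` and `Allow: /` fails the signal, because it names no AI bot even though it blocks none.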

Note: For sites marked BLOCK, the WAF/CDN rejected all automated audit requests. The audit treats blocking automated requests as an AI-readiness failure in itself, so these sites received 0/8. linkedin.com appears twice in the original dataset (rows 72 and 99), with different observed signals across test runs; sec.gov (rows 58 and 97) and x.com (rows 90 and 98) are also listed twice.
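
The scoring rule described in the note can be sketched as follows. This is a minimal illustration, assuming each of the 8 signals counts for one point and a blocked site scores zero regardless of anything else; the identifiers in `SIGNALS` and the function `score_site` are hypothetical names, not taken from the audit's code.

```python
# Hypothetical identifiers for the 8 signals listed in the signal table.
SIGNALS = (
    "mcp_server",
    "llms_txt",
    "clean_room_html",
    "ai_content_feed",
    "json_ld",
    "ttfb_under_100ms",
    "ai_bots_10_plus",
    "http3",
)


def score_site(passed, blocked=False):
    """Format a site's score the way the results table shows it.

    `passed` is an iterable of signal identifiers the site passed.
    A site whose WAF/CDN blocked every audit request scores 0/8.
    """
    passed = set(passed)
    unknown = passed - set(SIGNALS)
    if unknown:
        raise ValueError(f"unknown signals: {sorted(unknown)}")
    if blocked:
        return f"0/{len(SIGNALS)} BLOCK"
    return f"{len(passed)}/{len(SIGNALS)}"


print(score_site(SIGNALS))                # all eight signals pass
print(score_site(["json_ld", "http3"]))   # hypothetical two-signal site
print(score_site([], blocked=True))       # WAF-blocked site
```

The `blocked` flag takes precedence over any passed signals, mirroring the note's statement that blocked sites received 0/8.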
