All tests conducted via live curl requests from a residential US network endpoint (Windows 11) on March 29, 2026 at 9:00 PM MST. No assumptions were made -- every signal required a live HTTP response meeting specific criteria.
| Signal | Test Method | Pass Criteria |
|---|---|---|
| MCP Server | `curl -sI https://site/.well-known/mcp.json` | HTTP 200 AND `content-type: application/json` |
| llms.txt | `curl -sL https://site/llms.txt` (GET, follows redirects) | HTTP 200 AND body starts with `#` or `##` (markdown), NOT `<` (HTML) |
| Clean-Room HTML | Compare response size/content: default UA vs GPTBot UA | Same content-length AND no empty SPA shell, OR clearly SSR for both UAs |
| AI Content Feed | Try `/.well-known/ai-content-index.json`, `/for-ai`, `/for-ai.txt` | HTTP 200 AND `content-type: application/json` or `text/plain` |
| JSON-LD | `curl -sL https://site/` then grep for `application/ld+json` | 1+ matches on homepage |
| Sub-100ms TTFB | 3-hit warm-cache protocol (see below) | Compensated server-side TTFB under 200ms (equivalent to sub-100ms from an AI datacenter) |
| 10+ AI Bots Allowed | Fetch `/robots.txt`; count bots under `Allow:` with no `Disallow: /` | 10+ distinct AI bots explicitly allowed (Disallow-only does not count) |
| HTTP/3 | `curl -sI https://site/`; check `alt-svc` header | Header present and contains `h3` |
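The llms.txt pass criterion can be sketched as a small shell check. The body below is a hypothetical sample; in the audit it would come from `curl -sL https://site/llms.txt`:

```shell
# Hypothetical llms.txt body; the audit fetches the real one with
#   curl -sL https://site/llms.txt
body='# Example Site
> Documentation index for AI agents.'

# Pass: body starts with markdown heading syntax ("#"), not an HTML tag ("<").
case "$body" in
  '#'*) verdict=pass ;;
  *)    verdict=fail ;;
esac
echo "llms.txt: $verdict"
```

A body beginning with `<` (an HTML error page or SPA shell returned with HTTP 200) falls through to `fail`, which is why the status code alone is not sufficient.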
TTFB measurements use a 3-hit warm-cache protocol. Each endpoint is requested 3 times and the fastest response is used. This simulates AI datacenter cache behavior rather than penalizing cold-start latency.
For each request, the raw TTFB is measured using curl -w "%{time_starttransfer}". Per-request TCP connect time (%{time_connect}) is subtracted from raw TTFB to isolate server-side processing time.
The resulting compensated TTFB represents the server-side response latency that an AI datacenter crawler (with near-zero network hop) would observe.
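The compensation step above can be sketched in shell. The three timing pairs are hypothetical sample values; in the audit each pair would come from `curl -o /dev/null -s -w "%{time_starttransfer} %{time_connect}" https://site/`:

```shell
# Hypothetical raw timings (seconds) from three warm-cache hits; format is
# "<time_starttransfer> <time_connect>" as emitted by curl's -w write-out.
best=$(printf '%s\n' \
  "0.412 0.180" \
  "0.205 0.178" \
  "0.198 0.176" |
  awk '{ c = $1 - $2; if (NR == 1 || c < min) min = c }  # compensated = raw TTFB - TCP connect
       END { printf "%.3f", min }')
echo "compensated TTFB: ${best}s"
```

The first (cold) hit is naturally the slowest; taking the minimum of the three compensated values is what makes this a warm-cache measurement.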
Before the audit, a calibration phase established the network-overhead baseline.
Each site's homepage was fetched twice, once with each User-Agent:

- `curl/8.x` (standard library UA)
- `Mozilla/5.0 AppleWebKit/537.36 (compatible; GPTBot/1.0; +https://openai.com/gptbot)`

Sites pass clean-room HTML if both responses have the same content-length and neither is an empty SPA shell, or if both are clearly server-side rendered.
Sites that serve materially different content to AI crawlers (e.g., RentCafe: 5.5KB to humans vs 299KB to GPTBot) are flagged as active cloaking.
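The size comparison can be sketched as follows. The byte counts are hypothetical (loosely modeled on the RentCafe example above); in the audit each would come from `curl -so /dev/null -w "%{size_download}" -A "$UA" https://site/`:

```shell
# Hypothetical download sizes (bytes) for the same URL under two User-Agents.
human_bytes=5632      # ~5.5KB SPA shell served to the default curl UA
gptbot_bytes=306176   # ~299KB served to the GPTBot UA

# Materially different payloads for the two UAs are flagged as cloaking.
if [ "$human_bytes" -eq "$gptbot_bytes" ]; then
  verdict=clean
else
  verdict=cloaking
fi
echo "clean-room HTML: $verdict"
```

A real check would also inspect the body (an identical-length empty shell for both UAs should still fail), so this exact-equality test is a simplification.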
The "10+ AI Bots Allowed" signal counts named AI crawlers in robots.txt that have explicit Allow: directives (or are listed under User-agent: blocks without a blanket Disallow: /).
Crawlers listed only under Disallow: do not count. The signal measures deliberate AI openness, not mere mention of AI bots. Most sites (76+) neither name nor address AI crawlers at all in their robots.txt.
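The counting rule can be sketched with a small awk pass over robots.txt. The file below is a hypothetical sample (the audit fetches the real one with `curl -s https://site/robots.txt`), and the parser assumes the common one-user-agent-per-block layout:

```shell
# Hypothetical robots.txt sample written to a temp path for illustration.
cat > /tmp/robots_sample.txt <<'EOF'
User-agent: GPTBot
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: CCBot
Disallow: /
EOF

# Count user-agent blocks that contain an Allow: and no blanket "Disallow: /".
count=$(awk '
  function flush() { if (ua != "" && allow && !blocked) n++ }
  tolower($1) == "user-agent:" { flush(); ua = $2; allow = blocked = 0; next }
  tolower($1) == "allow:"                 { allow = 1 }
  tolower($1) == "disallow:" && $2 == "/" { blocked = 1 }
  END { flush(); print n + 0 }
' /tmp/robots_sample.txt)
echo "$count AI bots explicitly allowed"
```

Here CCBot is named but blanket-disallowed, so only 2 bots count toward the 10+ threshold, matching the rule that Disallow-only mentions do not count.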
Manifest SHA-256: 3aedf7a8354df104a4cb3edfad42e70027e5bbd5485becc3f174bff3d64500da