# Reproduce This: GEO Survey Pilot - 3 Sites - 2026-06-15

This pilot measures AI infrastructure readiness for 3 sites spanning the Government, Technology, and News industries. All scoring runs through the canonical edge functions (collect-comparison-receipt -> score-from-receipt) on the AnswerShare Supabase project. Receipt collection and scoring are deterministic for a given domain state; re-running against a domain whose state has changed produces updated scores that reflect current site conditions.

---

## Pipeline

### Step 1: Collect Receipt

```bash
# Set SERVICE_KEY to your service key and fill in the "domain" field before running.
SERVICE_KEY=
curl -s -X POST https://szhmfrouktgcbkyyslvs.supabase.co/functions/v1/collect-comparison-receipt \
  -H "Authorization: Bearer $SERVICE_KEY" \
  -H "apikey: $SERVICE_KEY" \
  -H "Content-Type: application/json" \
  -d '{"domain": "", "force_refresh": true}' \
  --max-time 180
```

Returns: `{ "receipt_id": "", "generated_at": "", "summary": {...} }`

### Step 2: Score from Receipt

```bash
# Fill in "receipt_id" with the value returned by Step 1.
curl -s -X POST https://szhmfrouktgcbkyyslvs.supabase.co/functions/v1/score-from-receipt \
  -H "Authorization: Bearer $SERVICE_KEY" \
  -H "apikey: $SERVICE_KEY" \
  -H "Content-Type: application/json" \
  -d '{"receipt_id": "", "owned": false}' \
  --max-time 360
```

Returns: Full score object with `columns.geo`, `columns.seo`, `dashboard.*`

---

## Model Pins (at time of scoring)

| Engine | Model |
|--------|-------|
| Perplexity | sonar-pro |
| OpenAI | gpt-4o |
| Gemini | gemini-3.1-pro-preview |
| Anthropic | claude-sonnet-4-6 |
| xAI/Grok | grok-3 |

---

## 13-Signal Thresholds

### 8 Binary Signals

| Signal | Pass Condition |
|--------|---------------|
| robots_ai_bots_allowed | robots.txt does NOT block major AI crawlers (GPTBot, ClaudeBot, PerplexityBot, Google-Extended, etc.) |
| llms_txt_present | /llms.txt returns HTTP 200 with markdown content (starts with #) |
| llms_full_txt_present | /llms-full.txt returns HTTP 200 with markdown content |
| sitemap_fresh | Median lastmod across sitemap pages is less than 60 days old; null fails |
| jsonld_structured_data | 1 or more application/ld+json blocks on homepage |
| prerendered_html | Bot-fetched homepage body >= 1500 chars with no-JS rendering |
| mcp_server_live | /.well-known/mcp.json returns HTTP 200 + application/json |
| ai_content_feed | /ai-content-index.json or equivalent returns HTTP 200 |

### 5 Quantitative Signals

| Signal | Pass Condition |
|--------|---------------|
| SGR >= 0.25 | Source grounding ratio: cited claims / total claims >= 0.25 |
| RPC > 3000 | Reported page count from sitemap > 3,000 pages |
| LMR <= 30d | Median lastmod recency <= 30 days; null fails |
| RR >= 0.45 | Readability ratio (readable chars / total chars) >= 0.45 |
| RTC <= 1.0 | Response token cost (tokens * TTLB / useful chars / 4) <= 1.0 |

---

## TTFB / TTLB Methodology

- **TTFB**: `rapid_fire_ttfb_p50_ms` from 8 probe hits (p50 of hits 3-8; hits 1-2 are warmup). Threshold: 800ms. Connect time is subtracted to compensate for network latency.
- **TTLB**: `rapid_fire_ttlb_p75_ms` from the same probe set. Threshold: 2500ms.
- Both use the droplet egress path (143.244.215.194) for consistent measurement.

---

## GEO Bands

| Band | Score Range |
|------|------------|
| Elite | 90-100 |
| Strong | 75-89 |
| Moderate | 60-74 |
| Weak | 40-59 |
| Critical | 0-39 |

---

## How to Dispute a Score

The scoring pipeline is fully transparent. To dispute:

1. Re-run collect-comparison-receipt with `force_refresh: true` for the domain
2. Re-run score-from-receipt with the new receipt_id
3. Compare the new score against this report
4. Signal disagreements are most commonly caused by stale sitemap data, SPA shell detection, or robots.txt interpretation for AI bot blocking

---

## Known Caveats

- **Engine Audit blank**: No seeded money queries were run for this pilot. Engine-audit verdicts (AI citation visibility) require a separate 5-engine proof run.
- **AI Crawler Hits "--"**: All 3 domains are non-owned survey sites, so bot-hit telemetry is unavailable.
- **High-authority sites may score low GEO**: The 13-signal rubric measures AI infrastructure readiness, not domain authority. nasa.gov scores 6/13 despite enormous authority because it lacks llms.txt, mcp.json, fresh sitemap lastmods, and SGR signals.
- **LMR absent on many sites**: Sites whose sitemaps omit `<lastmod>` tags get LMR=null, which fails per spec. bbc.com illustrates this.
- **bbc.com blocks AI crawlers**: Its robots.txt explicitly blocks ClaudeBot, GPTBot, PerplexityBot, Google-Extended, anthropic-ai, and many others. This directly fails the robots_ai_bots_allowed signal.
- **RTC fails all 3 sites**: The RTC metric (response token cost) is sensitive to large HTML pages with long TTLB. All 3 sites exceed the 1.0 threshold.
- **NPS legs not computed**: The muNPS/lambdaNPS/deltaNPS legs are excluded per spec. The edge function calls these legs, but they return null on this project (mnps-* functions are not deployed on szhmfrouktgcbkyyslvs). NPS values in the score response are informational only and are not surfaced in this report.
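
The quantitative thresholds and GEO bands listed in this report can be checked by hand with a small lookup. The following is a minimal sketch, not the scoring edge function: the names `quantitative_passes` and `geo_band` and the input dictionary keys are hypothetical, and the metric values are assumed to have been computed elsewhere. Treating a missing value as a failure is stated in the report only for LMR; applying it to the other four signals is an assumption, as is classifying fractional scores by each band's lower bound.

```python
from typing import Dict, Optional

# Hypothetical helper: thresholds copied from the "5 Quantitative Signals" table.
def quantitative_passes(metrics: Dict[str, Optional[float]]) -> Dict[str, bool]:
    """Return pass/fail per quantitative signal; None or missing values fail."""
    checks = {
        "SGR": lambda v: v >= 0.25,   # source grounding ratio
        "RPC": lambda v: v > 3000,    # reported page count (strictly greater)
        "LMR": lambda v: v <= 30,     # median lastmod recency, in days
        "RR": lambda v: v >= 0.45,    # readability ratio
        "RTC": lambda v: v <= 1.0,    # response token cost
    }
    results = {}
    for name, check in checks.items():
        value = metrics.get(name)
        # Assumption: null fails for every signal, as the report states for LMR.
        results[name] = value is not None and check(value)
    return results


# Hypothetical helper: band edges copied from the "GEO Bands" table.
def geo_band(score: float) -> str:
    """Map a 0-100 GEO score to its band name using lower-bound comparisons."""
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    if score >= 90:
        return "Elite"
    if score >= 75:
        return "Strong"
    if score >= 60:
        return "Moderate"
    if score >= 40:
        return "Weak"
    return "Critical"
```

For example, a site whose sitemap omits lastmod values would be passed in with `"LMR": None`, and the sketch marks that signal as failed regardless of the other four.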
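
The TTFB methodology in this report (p50 over hits 3-8 with connect time subtracted) can be illustrated with a short calculation. This is a sketch under stated assumptions rather than the probe's actual code: the function name `warm_ttfb_p50_ms` and the `(ttfb_ms, connect_ms)` input shape are hypothetical, connect time is assumed to be subtracted per hit before taking the median, and p50 is computed with Python's `statistics.median`, which averages the two middle values for an even count. The sample timings are made up for illustration.

```python
from statistics import median
from typing import Sequence, Tuple

# Hypothetical helper, not the probe implementation.
def warm_ttfb_p50_ms(hits: Sequence[Tuple[float, float]], warmup: int = 2) -> float:
    """Median connect-adjusted TTFB over the hits that follow the warmup hits.

    hits: (ttfb_ms, connect_ms) pairs in probe order.
    """
    if len(hits) <= warmup:
        raise ValueError("need at least one hit after the warmup hits")
    # Assumption: connect time is removed from each hit individually.
    adjusted = [ttfb - connect for ttfb, connect in hits[warmup:]]
    return median(adjusted)


# Illustrative timings only: 2 warmup hits followed by 6 measured hits.
example_hits = [
    (900, 100), (850, 100),            # warmup, discarded
    (500, 50), (520, 60), (480, 40),
    (510, 50), (530, 70), (495, 45),
]
print(warm_ttfb_p50_ms(example_hits))  # 455.0
```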