# BodyHackGuide.co — Robots.txt
# Allow all search engines and AI crawlers
#
# 2026-05-17 GSC FIX — Removed /go/, /embed/, /coa-reader from Disallow.
# Reason: Google Search Console reported "Indexed, though blocked by robots.txt"
# for these URLs. They have a noindex robots meta tag in
# the rendered HTML (see go.html line 6, GoRedirect.tsx Helmet, all Embed*.tsx
# noIndex={true}, CoaReader canonical → /tools/coa). Blocking them in robots.txt
# prevents Google from FETCHING the page, so it never sees the noindex tag —
# instead it indexes the URL with no content and warns the owner. Standard
# Google-documented fix: ALLOW crawling, the meta tag does the noindex work.
#
# Still blocked: /admin, /settings, /editor, /reset-password — auth-gated,
# zero external linkage expected, defensive.

User-agent: Googlebot
Allow: /
Disallow: /admin
Disallow: /settings
Disallow: /editor
Disallow: /reset-password

User-agent: Bingbot
Allow: /
Disallow: /admin
Disallow: /settings
Disallow: /editor
Disallow: /reset-password

# AI Crawlers — explicitly allowed. Same disallow block as Googlebot/Bingbot.
User-agent: GPTBot
Allow: /
Disallow: /admin
Disallow: /settings
Disallow: /editor
Disallow: /reset-password

User-agent: ChatGPT-User
Allow: /
Disallow: /admin
Disallow: /settings
Disallow: /editor
Disallow: /reset-password

User-agent: Google-Extended
Allow: /
Disallow: /admin
Disallow: /settings
Disallow: /editor
Disallow: /reset-password

User-agent: ClaudeBot
Allow: /
Disallow: /admin
Disallow: /settings
Disallow: /editor
Disallow: /reset-password

User-agent: anthropic-ai
Allow: /
Disallow: /admin
Disallow: /settings
Disallow: /editor
Disallow: /reset-password

User-agent: PerplexityBot
Allow: /
Disallow: /admin
Disallow: /settings
Disallow: /editor
Disallow: /reset-password

User-agent: Bytespider
Allow: /
Disallow: /admin
Disallow: /settings
Disallow: /editor
Disallow: /reset-password

User-agent: cohere-ai
Allow: /
Disallow: /admin
Disallow: /settings
Disallow: /editor
Disallow: /reset-password

User-agent: CCBot
Allow: /
Disallow: /admin
Disallow: /settings
Disallow: /editor
Disallow: /reset-password

User-agent: Claude-Web
Allow: /
Disallow: /admin
Disallow: /settings
Disallow: /editor
Disallow: /reset-password

User-agent: OAI-SearchBot
Allow: /
Disallow: /admin
Disallow: /settings
Disallow: /editor
Disallow: /reset-password

User-agent: Perplexity-User
Allow: /
Disallow: /admin
Disallow: /settings
Disallow: /editor
Disallow: /reset-password

User-agent: Applebot-Extended
Allow: /
Disallow: /admin
Disallow: /settings
Disallow: /editor
Disallow: /reset-password

User-agent: Amazonbot
Allow: /
Disallow: /admin
Disallow: /settings
Disallow: /editor
Disallow: /reset-password

User-agent: Meta-ExternalAgent
Allow: /
Disallow: /admin
Disallow: /settings
Disallow: /editor
Disallow: /reset-password

User-agent: Meta-ExternalFetcher
Allow: /
Disallow: /admin
Disallow: /settings
Disallow: /editor
Disallow: /reset-password

User-agent: DuckAssistBot
Allow: /
Disallow: /admin
Disallow: /settings
Disallow: /editor
Disallow: /reset-password

User-agent: YouBot
Allow: /
Disallow: /admin
Disallow: /settings
Disallow: /editor
Disallow: /reset-password

User-agent: AI2Bot
Allow: /
Disallow: /admin
Disallow: /settings
Disallow: /editor
Disallow: /reset-password

User-agent: MistralAI-User
Allow: /
Disallow: /admin
Disallow: /settings
Disallow: /editor
Disallow: /reset-password

User-agent: Diffbot
Allow: /
Disallow: /admin
Disallow: /settings
Disallow: /editor
Disallow: /reset-password

# Social crawlers
User-agent: Twitterbot
Allow: /
Disallow: /admin
Disallow: /settings
Disallow: /editor
Disallow: /reset-password

User-agent: facebookexternalhit
Allow: /
Disallow: /admin
Disallow: /settings
Disallow: /editor
Disallow: /reset-password

# Default
User-agent: *
Allow: /
Disallow: /admin
Disallow: /settings
Disallow: /editor
Disallow: /reset-password

# Phase BB (2026-04-28): sitemap index — primary entry point for crawlers.
# Splits the 3,479-URL universe into 13 themed sitemaps so per-category
# coverage stats land cleanly in GSC. Google reads the index, then fetches
# each per-category sitemap based on changefreq + lastmod.
Sitemap: https://www.bodyhackguide.co/sitemap-index.xml

# Legacy single-file sitemap kept for back-compat (existing GSC submissions
# and external robots.txt parsers that don't follow sitemap-index references).
Sitemap: https://www.bodyhackguide.co/sitemap.xml

# Dynamic sitemap (DB-driven, always current) served via Supabase edge function
Sitemap: https://scrrcgnxagnqklsbmmwy.supabase.co/functions/v1/sitemap

# AI crawler guide — curated content index for LLMs
# Markdown: https://www.bodyhackguide.co/llms.txt
# Full-text: https://www.bodyhackguide.co/llms-full.txt
# Structured JSON manifest: https://www.bodyhackguide.co/llms.json
# AI crawler & licensing policy (human-readable): https://www.bodyhackguide.co/ai.txt