robots.txt Valid.

Invalid robots.txt can accidentally block search engines from your entire site

What does this check test?

This check verifies that the site's `/robots.txt` file is syntactically valid, is served successfully (HTTP 200) when present, and does not unintentionally block important pages or resources from search engine crawlers. It looks for well-formed `User-agent`, `Allow`, `Disallow`, and `Sitemap` directives. A missing robots.txt (HTTP 404) is acceptable, because crawlers then treat every page as crawlable. A malformed file is riskier: crawlers may silently ignore the lines they cannot parse, and a robots.txt that fails with a server error (5xx) can lead crawlers such as Googlebot to treat the whole site as disallowed.
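As an illustration of the syntax part of this check, here is a minimal sketch of a line-level validator. The function name `validateRobotsTxt`, the `RobotsIssue` type, and the specific heuristics are assumptions made for this example; they are not how any particular audit tool or crawler works, and the sketch is not a full Robots Exclusion Protocol parser.

```typescript
// Hypothetical helper for illustration only: a line-level sanity check,
// not a complete robots.txt parser.
type RobotsIssue = { line: number; message: string };

// The four directives named in this check.
const KNOWN_DIRECTIVES = ['user-agent', 'allow', 'disallow', 'sitemap'];

function validateRobotsTxt(text: string): RobotsIssue[] {
  const issues: RobotsIssue[] = [];
  let sawUserAgent = false;

  text.split(/\r?\n/).forEach((raw, index) => {
    const lineNo = index + 1;
    // Drop comments and surrounding whitespace; skip blank lines.
    const line = raw.replace(/#.*$/, '').trim();
    if (line === '') return;

    const colon = line.indexOf(':');
    if (colon === -1) {
      issues.push({ line: lineNo, message: 'Missing ":" separator' });
      return;
    }

    const name = line.slice(0, colon).trim().toLowerCase();
    const value = line.slice(colon + 1).trim();

    if (KNOWN_DIRECTIVES.indexOf(name) === -1) {
      issues.push({ line: lineNo, message: 'Unknown directive "' + name + '"' });
      return;
    }

    if (name === 'user-agent') {
      sawUserAgent = true;
      if (value === '') {
        issues.push({ line: lineNo, message: 'User-agent has no value' });
      }
    } else if (name === 'allow' || name === 'disallow') {
      if (!sawUserAgent) {
        issues.push({ line: lineNo, message: 'Rule appears before any User-agent line' });
      }
      // Heuristic: a non-empty path pattern should start with "/" or "*".
      const first = value.charAt(0);
      if (value !== '' && first !== '/' && first !== '*') {
        issues.push({ line: lineNo, message: 'Path should start with "/"' });
      }
    } else if (!/^https?:\/\//i.test(value)) {
      // Only "sitemap" reaches this branch.
      issues.push({ line: lineNo, message: 'Sitemap should be an absolute URL' });
    }
  });

  return issues;
}
```

Calling `validateRobotsTxt` on a file containing the misspelling `Disalow: /admin/` would report an unknown directive on that line, which is the kind of typo a crawler would otherwise skip without warning.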

Why does it matter?

The robots.txt file is the first thing search engine crawlers request when visiting your domain. A single typo, such as `Disallow: /` instead of `Disallow: /admin/`, can block crawlers from your entire site. Google processes roughly 5 trillion pages and relies on robots.txt to know which URLs to skip. An invalid or misconfigured robots.txt can cause catastrophic SEO damage that is difficult to diagnose because the site appears to work normally for users. Conversely, a well-configured robots.txt improves crawl efficiency by directing bots away from low-value pages (admin panels, search result pages, duplicate parameter URLs).
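To make the difference between `Disallow: /` and `Disallow: /admin/` concrete, the sketch below applies simple prefix matching with longest-match precedence, where `Allow` wins a tie. The `Rule` type and `isBlocked` function are hypothetical names for this example, and wildcards (`*`, `$`) and per-crawler groups are deliberately left out, so real crawlers may behave differently in edge cases.

```typescript
// Hypothetical, simplified matcher: plain prefix rules only (no "*" or "$").
type Rule = { type: 'allow' | 'disallow'; path: string };

function isBlocked(urlPath: string, rules: Rule[]): boolean {
  let best: Rule | null = null;

  for (const rule of rules) {
    // An empty path matches nothing.
    if (rule.path === '') continue;
    // The rule applies only if the URL path starts with the rule path.
    if (urlPath.indexOf(rule.path) !== 0) continue;

    // Prefer the longest matching rule; on a tie, prefer "allow".
    if (
      best === null ||
      rule.path.length > best.path.length ||
      (rule.path.length === best.path.length && rule.type === 'allow')
    ) {
      best = rule;
    }
  }

  return best !== null && best.type === 'disallow';
}

// The typo blocks every path; the intended rule blocks only /admin/.
const typoRules: Rule[] = [{ type: 'disallow', path: '/' }];
const intendedRules: Rule[] = [{ type: 'disallow', path: '/admin/' }];

const pricingWithTypo = isBlocked('/pricing', typoRules);
const pricingIntended = isBlocked('/pricing', intendedRules);
const adminIntended = isBlocked('/admin/users', intendedRules);
```

Under these assumptions, `/pricing` is blocked by the typo rule set but not by the intended one, while `/admin/users` stays blocked in both.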

Who is affected?

DevOps engineers and back-end developers managing server configuration, SEO specialists defining crawl policies, front-end developers deploying sites to new domains, and anyone performing domain migrations or staging-to-production deployments (a common mistake is deploying a staging robots.txt that blocks all crawling).

Where does this apply?

This applies to the `/robots.txt` file at the root of your domain (e.g., `https://example.com/robots.txt`). The file covers only the host it is served from, so each subdomain needs its own robots.txt. Check staging and preview environments to ensure they block crawling (to prevent indexing of staging content) while production allows it.
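One way to keep staging blocked and production open is to derive the rules from a single environment flag rather than maintaining two files by hand. The sketch below is an assumption-laden example: `buildRobots`, `RobotsRules`, and `RobotsConfig` are hypothetical names, the types are local stand-ins rather than a framework's own types, and the paths and sitemap URL are placeholders.

```typescript
// Hypothetical local types for illustration; not imported from any framework.
type RobotsRules = {
  userAgent: string;
  allow?: string | string[];
  disallow?: string | string[];
};
type RobotsConfig = { rules: RobotsRules; sitemap?: string };

// Returns a "block everything" config for non-production environments
// and the normal crawl policy for production.
function buildRobots(isProduction: boolean): RobotsConfig {
  if (!isProduction) {
    // Staging and preview: keep all crawlers out.
    return { rules: { userAgent: '*', disallow: '/' } };
  }

  // Production: allow crawling except for low-value paths (placeholders).
  return {
    rules: {
      userAgent: '*',
      allow: '/',
      disallow: ['/api/', '/admin/'],
    },
    sitemap: 'https://example.com/sitemap.xml',
  };
}
```

The caller decides how `isProduction` is computed, for example from an environment variable set by the hosting platform. Keeping that decision in one place reduces the chance of shipping the staging policy to production.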

How to fix it

Create a valid robots.txt at your site root. Never use `Disallow: /` in production; it blocks everything. Always include a `Sitemap` directive pointing to your XML sitemap. Basic robots.txt:

```txt
User-agent: *
Allow: /
Disallow: /api/
Disallow: /admin/
Sitemap: https://example.com/sitemap.xml
```
Next.js App Router robots.ts:

```ts
// app/robots.ts
import type { MetadataRoute } from 'next';

export default function robots(): MetadataRoute.Robots {
  return {
    rules: {
      userAgent: '*',
      allow: '/',
      disallow: ['/api/', '/admin/'],
    },
    sitemap: 'https://example.com/sitemap.xml',
  };
}
```

Test your robots.txt using the robots.txt report in Google Search Console (which replaced the legacy robots.txt Tester) or the `robots-txt-guard` npm package. After deploying changes, request a recrawl in Google Search Console.
