robots.txt Validator & Tester
Paste a robots.txt and test whether a given user agent can fetch a specific path. Implements the longest-match rule from RFC 9309.
About robots.txt Validator & Tester
Parse a robots.txt file and test whether a specific user agent can fetch a specific path. The tester implements RFC 9309 longest-match semantics with * wildcard and $ end-anchor support, the same rules Google and Bing apply.
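Here is a minimal sketch of the matching step. It assumes the rules of the selected user-agent group are already available as (directive, pattern) pairs, and that the path being tested is the URL path plus any query string. The names pattern_to_regex and decide are hypothetical and are not this tool's code. Percent-encoding normalization is left out.

```python
import re
from typing import Iterable, Optional, Tuple

# (directive, pattern) where directive is "allow" or "disallow"
Rule = Tuple[str, str]


def pattern_to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate a robots.txt pattern into a regex matched from the start of the path.

    '*' matches any run of characters (including none); a trailing '$'
    anchors the pattern to the end of the path. Everything else is literal.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + (r"\Z" if anchored else ""))


def decide(rules: Iterable[Rule], path: str) -> Tuple[bool, Optional[Rule]]:
    """Return (allowed, winning_rule) using longest-match semantics."""
    best: Optional[Rule] = None
    for directive, pattern in rules:
        if pattern == "":
            continue  # an empty Allow/Disallow value matches nothing
        if not pattern_to_regex(pattern).match(path):
            continue
        if best is None or len(pattern) > len(best[1]):
            best = (directive, pattern)  # longer pattern = more specific
        elif len(pattern) == len(best[1]) and directive == "allow":
            best = (directive, pattern)  # tie: RFC 9309 says prefer Allow
    if best is None:
        return True, None  # no rule matched, so fetching is allowed
    return best[0] == "allow", best
```

With rules [("disallow", "/private"), ("allow", "/private/public")], the path /private/public/page.html is allowed by the Allow rule. The path /private/secret is blocked by the Disallow rule. Converting each pattern to a regex keeps the * and $ handling in one place, so the longest-match loop only compares pattern lengths.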
What this tool does
- Parse — groups rules by user-agent, extracts sitemaps and crawl-delay.
- Validate — flags unknown directives, rules before any user-agent, and invalid sitemap URLs.
- Test — enter any user-agent and path to see which rule matched and whether the fetch is allowed.
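The Parse and Validate steps can be sketched as a single pass over the file. The sketch assumes that # starts a comment and that consecutive User-agent lines share one group. The names parse_robots, Group and ParsedRobots are hypothetical, and the tool's real data model may differ. The sketch also does not merge separate groups that name the same crawler.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple
from urllib.parse import urlparse

KNOWN = {"user-agent", "allow", "disallow", "sitemap", "crawl-delay"}


@dataclass
class Group:
    agents: List[str] = field(default_factory=list)
    rules: List[Tuple[str, str]] = field(default_factory=list)  # (directive, pattern)
    crawl_delay: Optional[float] = None


@dataclass
class ParsedRobots:
    groups: List[Group] = field(default_factory=list)
    sitemaps: List[str] = field(default_factory=list)
    warnings: List[str] = field(default_factory=list)


def parse_robots(text: str) -> ParsedRobots:
    result = ParsedRobots()
    current: Optional[Group] = None
    in_agent_run = False  # consecutive User-agent lines share one group
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        if ":" not in line:
            result.warnings.append(f"line {lineno}: missing ':' separator")
            continue
        key, value = (part.strip() for part in line.split(":", 1))
        key = key.lower()
        if key not in KNOWN:
            result.warnings.append(f"line {lineno}: unknown directive {key!r}")
            continue
        if key == "user-agent":
            if current is None or not in_agent_run:
                current = Group()
                result.groups.append(current)
            current.agents.append(value.lower())
            in_agent_run = True
            continue
        in_agent_run = False
        if key == "sitemap":
            # Sitemap belongs to no group and must be an absolute http(s) URL.
            parsed = urlparse(value)
            if parsed.scheme in ("http", "https") and parsed.netloc:
                result.sitemaps.append(value)
            else:
                result.warnings.append(f"line {lineno}: invalid sitemap URL {value!r}")
            continue
        if current is None:
            result.warnings.append(f"line {lineno}: {key} before any user-agent")
            continue
        if key == "crawl-delay":
            try:
                current.crawl_delay = float(value)
            except ValueError:
                result.warnings.append(f"line {lineno}: non-numeric crawl-delay {value!r}")
        else:
            current.rules.append((key, value))  # allow / disallow
    return result
```

Returning warnings rather than raising lets the validator report every problem in one pass while still producing usable groups.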
Pipeline
- Sitemap Validator — validate the sitemap URLs referenced in the robots.txt.
- Meta Tag Generator — generate the meta robots tags that complement robots.txt for page-level control.
Frequently asked
- What is robots.txt?
- robots.txt is a plain-text file at the root of a website that tells crawlers which paths they may and may not fetch. It is a convention (the Robots Exclusion Protocol, standardized as RFC 9309), not a security mechanism: any crawler can ignore it. It is primarily used to keep search engines from crawling staging pages, admin areas, or duplicate content. Blocking crawling does not guarantee a URL stays out of search results, because a disallowed URL can still be indexed if other pages link to it.
- What is the longest-match rule?
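- When multiple rules match a URL, the most specific one (the longest pattern) wins. If both "Disallow: /private" and "Allow: /private/public" match a URL, the Allow wins because its pattern is longer. If an Allow and a Disallow rule are equally specific, RFC 9309 says the Allow rule should be used. This longest-match rule is specified by RFC 9309, followed by Google and Bing, and implemented by this tool.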
- What do the * and $ wildcards do?
- * matches any sequence of characters (including none). $ anchors the pattern to the end of the URL (the path plus any query string), so "Disallow: /file.pdf$" blocks /file.pdf but not /file.pdf?version=2. Wildcards began as a Google extension to the original spec and are now part of RFC 9309.
- Does robots.txt affect all crawlers equally?
- No. Each User-agent group applies only to the crawlers it names, so a rule under "User-agent: Googlebot" does not affect Bingbot. The wildcard group "User-agent: *" applies to any crawler not matched by a more specific group. AI crawlers such as GPTBot, ClaudeBot, and PerplexityBot are documented by their operators as respecting robots.txt. They follow the "*" group unless you add a group that names them.
- What is Crawl-delay?
- Crawl-delay tells a crawler how many seconds to wait between requests. It is not part of RFC 9309. Google ignores it, while Bing and many other crawlers respect it. Setting it too high can slow down legitimate indexing.
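The group selection described under "Does robots.txt affect all crawlers equally?" can be sketched as follows. The sketch assumes the crawler compares its product token (for example Googlebot or GPTBot) against each group's User-agent values. Under RFC 9309 this comparison is case-insensitive, groups naming the same crawler are combined, and the "*" group is only a fallback. The helper name rules_for_agent and its input shape are hypothetical.

```python
from typing import List, Tuple

Rule = Tuple[str, str]  # (directive, pattern)
GroupSpec = Tuple[List[str], List[Rule]]  # (user-agent tokens, rules), in file order


def rules_for_agent(groups: List[GroupSpec], product_token: str) -> List[Rule]:
    """Collect the rules that apply to one crawler.

    Returns the combined rules of every group naming the token
    (case-insensitively); otherwise the '*' group's rules; otherwise [].
    """
    token = product_token.lower()
    specific: List[Rule] = []
    wildcard: List[Rule] = []
    matched = False
    for agents, rules in groups:
        names = {a.lower() for a in agents}
        if token in names:
            specific.extend(rules)  # same-crawler groups are merged
            matched = True
        if "*" in names:
            wildcard.extend(rules)
    return specific if matched else wildcard
```

A crawler named in a specific group never falls back to the "*" rules, even when its own group is more permissive. This is why a Googlebot-only rule does not affect Bingbot.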
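For reading Crawl-delay, Python's standard-library urllib.robotparser provides RobotFileParser.crawl_delay() (Python 3.6+). The sketch below uses the parser only to look up the delay. As far as we can tell, its Allow/Disallow evaluation follows rule order rather than RFC 9309 longest match, so it is not a reference for fetch decisions. The helper polite_delay and its 1-second default are hypothetical choices.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: Bingbot
Crawl-delay: 5
Disallow: /private

User-agent: *
Disallow: /tmp
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def polite_delay(agent: str, default: float = 1.0) -> float:
    """Seconds to wait between requests: the group's Crawl-delay, else a default."""
    delay = parser.crawl_delay(agent)  # None when no applicable group sets it
    return float(delay) if delay is not None else default


print(polite_delay("Bingbot"))       # 5.0
print(polite_delay("SomeOtherBot"))  # 1.0 (the "*" group sets no Crawl-delay)
```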