June 17, 2026
4.4.6 – Crawl by the Rules
Bug fixes
- Fixed: The robots.txt checker could report a URL as ALLOWED while Google Search Console showed it as blocked by robots.txt. The parser now reads robots.txt the way Google does (RFC 9309): consecutive User-agent lines stay in one group when only ignored directives (such as Crawl-delay) sit between them, so a trailing Disallow applies to the right crawlers, including *. Thanks to Kevin Vandeplassche for flagging this on LinkedIn.
- Fixed: Rules that target query strings (e.g. Disallow: /*?, /*sid=, /*.php?) were never matched because we only checked the path. Matching now includes the query string, as Google does.
- Fixed: Wildcard patterns (* and $) and longest-match precedence were not applied correctly. Conflicting Allow and Disallow rules now resolve the way Google documents: the longest matching rule wins, and on a tie, Allow beats Disallow.
- Fixed: Googlebot-specific groups were ignored when a generic * group existed. A dedicated Googlebot group can now override *, and an empty Googlebot group (Crawl-delay only) correctly means crawlable even when * disallows everything.
- Fixed: A missing robots.txt (HTTP 404) was shown as “NOT FOUND” instead of being treated as no restrictions. 404 and other 4xx responses (except 429) now correctly mean crawling is allowed; 429 and 5xx responses show an unavailable warning, matching Googlebot's behavior.
- Fixed: Switching tabs or URLs quickly could leave the robots.txt status showing the previous page. A race guard ensures only the latest fetch updates the UI.
- Fixed: Network and unexpected errors were lumped together as “NOT FOUND”. CORS/DNS failures are now distinguished from server errors, and the copy-robots button no longer serves stale content from another page after an error.
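A minimal sketch of the grouping rule from the first bug fix, assuming a simplified line parser. The names parseGroups, Group and Rule are hypothetical and not taken from the extension. The point it shows: a run of User-agent lines only closes when an Allow or Disallow line appears, so a Crawl-delay line between two User-agent lines does not split the group.

```typescript
// Hypothetical shapes for illustration only.
interface Rule {
  allow: boolean;
  pattern: string;
}

interface Group {
  agents: string[]; // lowercased product tokens, e.g. "googlebot" or "*"
  rules: Rule[];
}

// Split robots.txt into groups. A run of User-agent lines forms one group;
// ignored lines (Crawl-delay, Sitemap, unknown keys) do not end the run.
// The run ends only when an Allow or Disallow rule is seen.
function parseGroups(text: string): Group[] {
  const groups: Group[] = [];
  let current: Group | null = null;
  let readingAgents = false; // true while the current run of User-agent lines is open

  for (const rawLine of text.split(/\r?\n/)) {
    const line = rawLine.replace(/#.*$/, "").trim(); // drop comments
    const colon = line.indexOf(":");
    if (colon < 0) continue;
    const key = line.slice(0, colon).trim().toLowerCase();
    const value = line.slice(colon + 1).trim();

    if (key === "user-agent") {
      if (current === null || !readingAgents) {
        current = { agents: [], rules: [] };
        groups.push(current);
      }
      readingAgents = true;
      current.agents.push(value.toLowerCase());
    } else if (key === "allow" || key === "disallow") {
      if (current === null) continue; // rules before any User-agent line are ignored
      readingAgents = false;
      current.rules.push({ allow: key === "allow", pattern: value });
    }
    // Every other key falls through without touching readingAgents.
  }
  return groups;
}
```

With this rule, "User-agent: Googlebot / Crawl-delay: 5 / User-agent: * / Disallow: /private" yields one group whose Disallow applies to both Googlebot and *.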
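The query-string and wildcard fixes can be illustrated with a small pattern matcher. This sketch, with hypothetical function names patternToRegExp and ruleMatches, turns * into "any run of characters" and a trailing $ into an end anchor, then tests the rule against the path plus query string. It assumes the input is already a path-and-query string and skips the percent-encoding normalization a full RFC 9309 matcher performs.

```typescript
// Convert a robots.txt path pattern into a RegExp.
// "*" matches any run of characters; a trailing "$" anchors the end of the URL.
function patternToRegExp(pattern: string): RegExp {
  const anchored = pattern.endsWith("$");
  const body = anchored ? pattern.slice(0, -1) : pattern;
  const source = body
    .split("*")
    .map((part) => part.replace(/[.+?^${}()|[\]\\]/g, "\\$&")) // escape regex metacharacters
    .join(".*");
  return new RegExp("^" + source + (anchored ? "$" : ""));
}

// Test a rule against the path AND the query string (e.g. "/index.php?id=1"),
// not just the path.
function ruleMatches(pattern: string, pathAndQuery: string): boolean {
  return patternToRegExp(pattern).test(pathAndQuery);
}
```

Testing only the path means /*.php? is compared against /index.php and can never match; passing /index.php?id=1 makes it match as expected.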
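For the precedence fix, here is a sketch of resolving conflicting rules within one group: among the rules that match, the one with the longest pattern decides, and Allow wins a tie. The Rule shape and isAllowed are hypothetical; measuring specificity by raw pattern length follows Google's documented description, but treat the exact counting as an assumption.

```typescript
interface Rule {
  allow: boolean;
  pattern: string;
}

// "*" = any characters, trailing "$" = end anchor (same conversion as a plain matcher).
function patternToRegExp(pattern: string): RegExp {
  const anchored = pattern.endsWith("$");
  const body = anchored ? pattern.slice(0, -1) : pattern;
  const source = body
    .split("*")
    .map((part) => part.replace(/[.+?^${}()|[\]\\]/g, "\\$&"))
    .join(".*");
  return new RegExp("^" + source + (anchored ? "$" : ""));
}

// The matching rule with the longest pattern wins; on equal length,
// Allow beats Disallow. If nothing matches, the URL is allowed.
function isAllowed(rules: Rule[], pathAndQuery: string): boolean {
  let best: Rule | null = null;
  for (const rule of rules) {
    if (rule.pattern === "") continue; // an empty pattern ("Disallow:") matches nothing
    if (!patternToRegExp(rule.pattern).test(pathAndQuery)) continue;
    const longer = best === null || rule.pattern.length > best.pattern.length;
    const tieWonByAllow =
      best !== null && rule.pattern.length === best.pattern.length && rule.allow && !best.allow;
    if (longer || tieWonByAllow) best = rule;
  }
  return best === null || best.allow;
}
```

For example, with Allow: /page and Disallow: /*.htm, the URL /page.htm is blocked because the Disallow pattern is longer, while Allow: /folder and Disallow: /folder tie and the URL stays allowed.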
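Group selection for the Googlebot fix can be sketched like this, assuming groups were already parsed with lowercased user-agent tokens. selectRules is a hypothetical name. A group that names the crawler wins over *, several groups for the same crawler are combined (as Google documents), and an empty rule list means nothing is disallowed. Matching here is exact on the product token, a simplification of how crawlers pick the most specific user agent.

```typescript
interface Rule {
  allow: boolean;
  pattern: string;
}

interface Group {
  agents: string[]; // lowercased tokens
  rules: Rule[];
}

// Pick the rules that apply to one crawler. Groups naming the crawler's
// product token win over "*"; several matching groups are combined.
// Returns null when no group applies at all (nothing is restricted).
function selectRules(groups: Group[], productToken: string): Rule[] | null {
  const token = productToken.toLowerCase();
  const specific = groups.filter((g) => g.agents.includes(token));
  const chosen = specific.length > 0 ? specific : groups.filter((g) => g.agents.includes("*"));
  if (chosen.length === 0) return null;
  const rules: Rule[] = [];
  for (const g of chosen) rules.push(...g.rules);
  return rules;
}
```

With "User-agent: * / Disallow: /" plus a Googlebot group that only sets Crawl-delay, Googlebot gets an empty rule list (everything crawlable), while other crawlers fall back to the blanket Disallow: /.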
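The status-code handling from the HTTP 404 fix reduces to a small lookup. verdictForStatus and its verdict names are hypothetical. Redirects are left out on the assumption that the fetch layer follows them before this point.

```typescript
// Hypothetical verdict names; the extension's UI labels may differ.
type RobotsFetchVerdict = "parse-rules" | "no-restrictions" | "unavailable";

// 2xx: parse the file.
// 4xx except 429: behave as if there is no robots.txt (crawling allowed).
// 429 and 5xx: the file is unavailable, so show a warning.
function verdictForStatus(status: number): RobotsFetchVerdict {
  if (status >= 200 && status < 300) return "parse-rules";
  if (status === 429) return "unavailable";
  if (status >= 400 && status < 500) return "no-restrictions";
  // 5xx, plus any other code this sketch does not model explicitly.
  return "unavailable";
}
```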
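One common way to build a race guard like the one in the tab-switching fix is a sequence counter: each request takes a number, and a result only reaches the UI if no newer request has started since. createLatestOnly, load and render are hypothetical names; this is an illustration, not necessarily how the extension does it.

```typescript
// Wrap a render callback so only the most recently started load may call it.
function createLatestOnly<T>(render: (value: T) => void): (load: () => Promise<T>) => Promise<boolean> {
  let latest = 0; // sequence number of the newest request
  return async (load) => {
    const id = ++latest;
    const value = await load();
    if (id !== latest) return false; // a newer request started meanwhile; discard this result
    render(value);
    return true;
  };
}
```

An alternative is cancelling the older request with an AbortController; the counter also works when the loader cannot be cancelled.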
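A sketch of separating network failures from HTTP errors. In browsers, fetch() rejects with a TypeError when the request fails at the network level, which covers both CORS blocks and DNS or connection failures, while a 5xx response still resolves with a status code. classifyFetch, FetchOutcome and ResponseLike are hypothetical, and fetchLike stands in for a call to fetch so the sketch can run without a network.

```typescript
type FetchOutcome =
  | { kind: "ok"; status: number; body: string }
  | { kind: "http-error"; status: number }
  | { kind: "network-error"; message: string };

// Minimal response shape so the sketch does not depend on DOM typings.
interface ResponseLike {
  status: number;
  text(): Promise<string>;
}

async function classifyFetch(fetchLike: () => Promise<ResponseLike>): Promise<FetchOutcome> {
  let res: ResponseLike;
  try {
    res = await fetchLike();
  } catch (err) {
    // fetch() rejects (TypeError in browsers) for CORS blocks, DNS and connection failures.
    return { kind: "network-error", message: err instanceof Error ? err.message : String(err) };
  }
  if (res.status >= 200 && res.status < 300) {
    return { kind: "ok", status: res.status, body: await res.text() };
  }
  // The server answered, but with an error status (e.g. 5xx): not a network failure.
  return { kind: "http-error", status: res.status };
}
```

Keeping the robots.txt body only in the "ok" outcome is one way to make sure a copy action cannot pick up text from a previous page after an error.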
Improvements
- Clearer status details: Blocked/allowed subtitles now show the HTTP status, which user-agent group matched (Googlebot, with fallback to *), and the specific rule that decided the verdict.
- Rules preview: The matched group's Allow and Disallow lines are both shown, so an allowed path no longer looks blocked in the preview. Values are safely escaped in the UI.
- Sitemaps: Relative or malformed Sitemap entries are no longer silently dropped. They appear with an orange warning and an RFC 9309 reference; valid absolute URLs still link through as before.
- AI bots section: Uses the same parsed robots.txt as the main checker (one parse, consistent results across the Page Info tab).
- Translations: New and corrected robots.txt status strings across all seven locales, including the previously missing “not found” / “not accessible” labels and proper BLOCKED/ALLOWED wording in Spanish and Dutch.
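A simplified version of the Sitemap check: collect every Sitemap line and flag entries that are not absolute http(s) URLs instead of dropping them. checkSitemaps and SitemapEntry are hypothetical, and the regular expression is a rough stand-in for full URL validation (a real check might use the URL constructor).

```typescript
interface SitemapEntry {
  raw: string;    // the value exactly as written after "Sitemap:"
  valid: boolean; // true only for absolute http(s) URLs with a host
}

function checkSitemaps(robotsTxt: string): SitemapEntry[] {
  const entries: SitemapEntry[] = [];
  for (const line of robotsTxt.split(/\r?\n/)) {
    const m = /^\s*sitemap\s*:\s*(.*?)\s*$/i.exec(line);
    if (!m) continue;
    const raw = m[1];
    // Keep invalid entries so the UI can show a warning instead of hiding them.
    entries.push({ raw, valid: /^https?:\/\/[^\s\/?#]+/i.test(raw) });
  }
  return entries;
}
```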
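Escaping rule values before they go into HTML can be as simple as replacing the five characters that are significant in markup. escapeHtml and renderRuleLine are hypothetical helpers; setting textContent instead of building HTML strings avoids the problem entirely.

```typescript
// Replace characters that HTML would interpret; "&" must go first.
function escapeHtml(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// Render one preview line for either an Allow or a Disallow rule.
function renderRuleLine(directive: "Allow" | "Disallow", value: string): string {
  return `<li>${directive}: ${escapeHtml(value)}</li>`;
}
```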
