Wildcards in robots.txt: Using * and $ Patterns
How to use wildcard patterns in robots.txt. The * and $ characters, path matching, and practical examples for complex blocking rules.
The basic Disallow: /private/ directive handles simple cases. But what about blocking all PDFs across your entire site? Or URLs with specific query parameters? That's where wildcard patterns come in.
Two special characters give you pattern-matching power in robots.txt: the asterisk (*) and the dollar sign ($). Together, they let you write precise rules that cover thousands of URLs in a single line.
The * Wildcard: Match Anything
The asterisk matches any sequence of characters, including an empty string. You can place it anywhere in a path.
User-agent: *
Disallow: /*.pdf
This blocks every URL containing .pdf anywhere in the path. It matches /report.pdf, /docs/annual-report.pdf, and /files/2024/q3-results.pdf.
Without the wildcard, you'd need a separate Disallow line for every directory that contains a PDF. With it, one line covers your entire site.
Here are more examples:
# Block all URLs containing "print" anywhere
Disallow: /*print
# Block all URLs with a query string
Disallow: /*?
# Block URLs containing "/temp/" at any depth
Disallow: /*/temp/
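To sanity-check patterns like these, a robots.txt rule can be approximated as a regular expression: escape every character, turn `*` into `.*`, and match from the start of the path. The sketch below is illustrative (the `rule_to_regex` and `matches` helpers are not part of any real library), but it mirrors how major crawlers expand the wildcard:

```python
import re

def rule_to_regex(rule: str) -> re.Pattern:
    """Approximate a robots.txt path rule as a regex (prefix match by default)."""
    anchored = rule.endswith("$")
    body = rule[:-1] if anchored else rule
    # '*' matches any sequence of characters; everything else is literal.
    pattern = "".join(".*" if ch == "*" else re.escape(ch) for ch in body)
    return re.compile(pattern + ("$" if anchored else ""))

def matches(rule: str, path: str) -> bool:
    # re.match anchors at the start only, so an unanchored rule
    # behaves like a prefix pattern, just as robots.txt rules do.
    return rule_to_regex(rule).match(path) is not None

print(matches("/*.pdf", "/docs/annual-report.pdf"))  # True
print(matches("/*?", "/products?page=2"))            # True
print(matches("/*/temp/", "/a/b/temp/file.txt"))     # True
print(matches("/*/temp/", "/temp/file.txt"))         # False: needs a segment before /temp/
```

The last case shows why `/*/temp/` is stricter than `/temp/`: the `*` must consume at least the first path segment before `/temp/` can match.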
The User-agent wildcard is different
The * in User-agent: * means "all crawlers." It's not the same pattern-matching wildcard used in Disallow and Allow directives. The user-agent field only supports * as a standalone value meaning "any bot."
The $ Anchor: Match End of URL
The dollar sign marks the end of a URL. Without it, patterns match as prefixes. With it, the URL must end exactly where you place the $.
# Without $: blocks /images/photo.jpg AND /images/photo.jpg?width=100
Disallow: /*.jpg
# With $: blocks /images/photo.jpg but NOT /images/photo.jpg?width=100
Disallow: /*.jpg$
This distinction matters. Many URLs get query parameters appended by analytics tools, CDNs, or application logic. The $ anchor gives you control over whether those variants are included.
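The difference is easy to see when the two rules are written as regexes (a hand-rolled sketch, not an official parser; crawlers match the `$` against the end of the full path plus query string):

```python
import re

# '/*.jpg'  -> prefix match: '.jpg' may appear anywhere, anything can follow
unanchored = re.compile(r"/.*\.jpg")
# '/*.jpg$' -> the URL must END with '.jpg'
anchored = re.compile(r"/.*\.jpg$")

for url in ["/images/photo.jpg", "/images/photo.jpg?width=100"]:
    print(url,
          "unanchored:", bool(unanchored.match(url)),
          "anchored:", bool(anchored.match(url)))
# /images/photo.jpg            -> both rules match
# /images/photo.jpg?width=100  -> only the unanchored rule matches
```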
Validate your wildcard patterns
Not sure if your patterns match what you think they match? Test them against real URLs.
Practical Pattern Recipes
Here are battle-tested patterns for common scenarios.
Block All Files of a Specific Type
# Block all PDFs
Disallow: /*.pdf$
# Block all images
Disallow: /*.jpg$
Disallow: /*.png$
Disallow: /*.gif$
Disallow: /*.webp$
# Block all JavaScript files
Disallow: /*.js$
Don't block CSS and JS from Googlebot
Google needs to render your pages to understand them. Blocking CSS and JavaScript files can hurt your rankings because Googlebot can't see your page the way users do.
Block URLs with Query Parameters
# Block all URLs with any query string
Disallow: /*?
# Block URLs with a specific parameter
Disallow: /*?sort=
Disallow: /*&sort=
# Block session ID URLs
Disallow: /*?sessionid=
Disallow: /*&sessionid=
Note the second Disallow with & in each pair. If the parameter isn't the first one in the query string, it'll be preceded by &, not ?.
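A quick check with a minimal matcher (the `matches` helper is a sketch, using the same escape-everything-then-expand-`*` translation, not a real API) confirms why the `?` rule alone misses later parameters:

```python
import re

def matches(rule: str, path: str) -> bool:
    """Minimal robots.txt wildcard matcher: '*' -> '.*', rest literal, prefix match."""
    anchored = rule.endswith("$")
    body = rule[:-1] if anchored else rule
    pattern = "".join(".*" if c == "*" else re.escape(c) for c in body)
    return re.match(pattern + ("$" if anchored else ""), path) is not None

# sort is the first parameter: the '?' rule catches it
print(matches("/*?sort=", "/products?sort=price"))         # True
# sort comes later: the '?' rule misses it...
print(matches("/*?sort=", "/products?page=2&sort=price"))  # False
# ...and the '&' rule is what catches it
print(matches("/*&sort=", "/products?page=2&sort=price"))  # True
```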
Block Faceted Navigation and Filters
E-commerce sites often generate thousands of filter combination URLs. Block them to save crawl budget:
# Block filter pages
Disallow: /products/*?filter=
Disallow: /products/*&filter=
# Block sort variations
Disallow: /*?sort=
Disallow: /*&sort=
# Block price range filters
Disallow: /*?price_min=
Disallow: /*&price_min=
Block Internal Search Results
# Block search result pages
Disallow: /search
Disallow: /*?q=
Disallow: /*?search=
Disallow: /*&q=
Combining * and $
The real power shows when you combine both characters:
# Block PHP files but not directories starting with "php"
Disallow: /*.php$
# Block URLs ending with /feed (RSS feeds)
Disallow: /*/feed$
# Block paginated pages
Disallow: /*/page/*
A combined example for a WordPress site:
User-agent: *
# Block feed URLs
Disallow: /*/feed$
Disallow: /*/feed/
# Block comment pages
Disallow: /*/comment-page-*
# Block trackback URLs
Disallow: /*/trackback$
# Block tag archive pages
Disallow: /tag/
# Block search
Disallow: /?s=
How Different Crawlers Handle Wildcards
Not every crawler supports wildcard patterns. The robots.txt specification (RFC 9309) only defines basic prefix matching. Wildcards are an extension.
| Crawler | Supports * wildcard | Supports $ anchor |
|---|---|---|
| Googlebot | Yes | Yes |
| Bingbot | Yes | Yes |
| Yandex | Yes | Yes |
| DuckDuckBot | Yes | Yes |
| GPTBot (OpenAI) | Yes | Yes |
| CCBot (Common Crawl) | Limited | No |
Major search engine crawlers all support both * and $. Some smaller or older crawlers may not. If a crawler doesn't understand wildcards, it typically treats the * and $ as literal characters, which usually results in the rule matching nothing. For critical blocking rules, consider whether your target crawlers support these patterns.
Path Matching Nuances
A few things trip people up with wildcard matching.
Wildcards don't match across protocols or domains. The pattern only applies to the path portion of the URL. Disallow: /*.pdf$ matches /file.pdf but has no opinion about https://other-domain.com/file.pdf. Each domain needs its own robots.txt.
Precedence, not file order, decides when Allow and Disallow overlap. Google ignores the order of lines and applies the most specific (longest) matching rule; if two rules match the same URL, the longer pattern wins:
User-agent: Googlebot
Disallow: /*.json$
Allow: /api/public/*.json$
Here, /api/public/data.json is allowed because the Allow rule is more specific (longer) than the Disallow rule.
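The precedence logic can be sketched in a few lines: collect every rule that matches, then let the longest pattern decide, with Allow winning a length tie (Google documents the tie-break as "least restrictive wins"; the rule-length measure below is a simplification):

```python
import re

def matches(rule: str, path: str) -> bool:
    """Minimal robots.txt wildcard matcher: '*' -> '.*', rest literal, prefix match."""
    anchored = rule.endswith("$")
    body = rule[:-1] if anchored else rule
    pattern = "".join(".*" if c == "*" else re.escape(c) for c in body)
    return re.match(pattern + ("$" if anchored else ""), path) is not None

def is_allowed(path: str, rules: list[tuple[str, str]]) -> bool:
    """rules: (directive, pattern) pairs. Longest matching pattern wins;
    on a length tie the Allow rule wins. No match at all means allowed."""
    best = max(
        (r for r in rules if matches(r[1], path)),
        key=lambda r: (len(r[1]), r[0] == "allow"),
        default=None,
    )
    return best is None or best[0] == "allow"

rules = [("disallow", "/*.json$"), ("allow", "/api/public/*.json$")]
print(is_allowed("/api/public/data.json", rules))  # True: the Allow rule is longer
print(is_allowed("/secret/config.json", rules))    # False: only Disallow matches
```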
The leading slash is implicit. All paths in robots.txt start from the root. Disallow: /*.pdf$ is equivalent to matching against URLs that start with / and end with .pdf.
Common Mistakes
Using * when a simple prefix works. You don't need Disallow: /private/*. Just Disallow: /private/ already blocks everything under /private/ because directives match as prefixes by default.
Forgetting the $ when you need an exact ending. If you write Disallow: /*.pdf without the $, it also blocks /file.pdf?page=2 and /file.pdfviewer. Add $ when the extension must be the last thing in the URL.
Over-blocking with broad patterns. Disallow: /*? blocks every URL with a query string, including ones Google might need for proper indexing. Be as specific as possible with your patterns.
Wildcards are the precision tool in your robots.txt toolkit -- use them carefully, test them thoroughly.