Wildcards in robots.txt: Using * and $ Patterns
How to use wildcard patterns in robots.txt. The * and $ characters, path matching, and practical examples for complex blocking rules.
The basic Disallow: /private/ directive handles simple cases. But what about blocking all PDFs across your entire site? Or URLs with specific query parameters? That's where wildcard patterns come in.
Two special characters give you pattern-matching power in robots.txt: the asterisk (*) and the dollar sign ($). Together, they let you write precise rules that cover thousands of URLs in a single line.
The * Wildcard: Match Anything
The asterisk matches any sequence of characters, including an empty string. You can place it anywhere in a path.
User-agent: *
Disallow: /*.pdf
This blocks every URL containing .pdf anywhere in the path. It matches /report.pdf, /docs/annual-report.pdf, and /files/2024/q3-results.pdf.
Without the wildcard, you'd need a separate Disallow line for every directory that contains a PDF. With it, one line covers your entire site.
Here are more examples:
# Block all URLs containing "print" anywhere
Disallow: /*print
# Block all URLs with a query string
Disallow: /*?
# Block URLs containing "/temp/" at any depth
Disallow: /*/temp/
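To sanity-check patterns like these, a robots.txt rule can be approximated as a regular expression: escape every character, turn `*` into `.*`, and match from the start of the path. The sketch below is illustrative (the `rule_to_regex` and `matches` helpers are not part of any real library), but it mirrors how major crawlers expand the wildcard:

```python
import re

def rule_to_regex(rule: str) -> re.Pattern:
    """Approximate a robots.txt path rule as a regex (prefix match by default)."""
    anchored = rule.endswith("$")
    body = rule[:-1] if anchored else rule
    # '*' matches any sequence of characters; everything else is literal.
    pattern = "".join(".*" if ch == "*" else re.escape(ch) for ch in body)
    return re.compile(pattern + ("$" if anchored else ""))

def matches(rule: str, path: str) -> bool:
    # re.match anchors at the start only, so an unanchored rule
    # behaves like a prefix pattern, just as robots.txt rules do.
    return rule_to_regex(rule).match(path) is not None

print(matches("/*.pdf", "/docs/annual-report.pdf"))  # True
print(matches("/*?", "/products?page=2"))            # True
print(matches("/*/temp/", "/a/b/temp/file.txt"))     # True
print(matches("/*/temp/", "/temp/file.txt"))         # False: needs a segment before /temp/
```

The last case shows why `/*/temp/` is stricter than `/temp/`: the `*` must consume at least the first path segment before `/temp/` can match.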
The User-agent wildcard is different
The * in User-agent: * means "all crawlers." It's not the same pattern-matching wildcard used in Disallow and Allow directives. The user-agent field only supports * as a standalone value meaning "any bot."
The $ Anchor: Match End of URL
The dollar sign marks the end of a URL. Without it, patterns match as prefixes. With it, the URL must end exactly where you place the $.
# Without $: blocks /images/photo.jpg AND /images/photo.jpg?width=100
Disallow: /*.jpg
# With $: blocks /images/photo.jpg but NOT /images/photo.jpg?width=100
Disallow: /*.jpg$
This distinction matters. Many URLs get query parameters appended by analytics tools, CDNs, or application logic. The $ anchor gives you control over whether those variants are included.
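The difference is easy to see when the two rules are written as regexes (a hand-rolled sketch, not an official parser; crawlers match the `$` against the end of the full path plus query string):

```python
import re

# '/*.jpg'  -> prefix match: '.jpg' may appear anywhere, anything can follow
unanchored = re.compile(r"/.*\.jpg")
# '/*.jpg$' -> the URL must END with '.jpg'
anchored = re.compile(r"/.*\.jpg$")

for url in ["/images/photo.jpg", "/images/photo.jpg?width=100"]:
    print(url,
          "unanchored:", bool(unanchored.match(url)),
          "anchored:", bool(anchored.match(url)))
# /images/photo.jpg            -> both rules match
# /images/photo.jpg?width=100  -> only the unanchored rule matches
```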
Validate your wildcard patterns
Not sure if your patterns match what you think they match? Test them against real URLs.
Practical Pattern Recipes
Here are battle-tested patterns for common scenarios.
Block All Files of a Specific Type
# Block all PDFs
Disallow: /*.pdf$
# Block all images
Disallow: /*.jpg$
Disallow: /*.png$
Disallow: /*.gif$
Disallow: /*.webp$
# Block all JavaScript files
Disallow: /*.js$
Don't block CSS and JS from Googlebot
Google needs to render your pages to understand them. Blocking CSS and JavaScript files can hurt your rankings because Googlebot can't see your page the way users do.
Block URLs with Query Parameters
# Block all URLs with any query string
Disallow: /*?
# Block URLs with a specific parameter
Disallow: /*?sort=
Disallow: /*&sort=
# Block session ID URLs
Disallow: /*?sessionid=
Disallow: /*&sessionid=
Note the second Disallow with & in each pair. If the parameter isn't the first one in the query string, it'll be preceded by &, not ?.
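A quick check with a minimal matcher (the `matches` helper is a sketch, using the same escape-everything-then-expand-`*` translation, not a real API) confirms why the `?` rule alone misses later parameters:

```python
import re

def matches(rule: str, path: str) -> bool:
    """Minimal robots.txt wildcard matcher: '*' -> '.*', rest literal, prefix match."""
    anchored = rule.endswith("$")
    body = rule[:-1] if anchored else rule
    pattern = "".join(".*" if c == "*" else re.escape(c) for c in body)
    return re.match(pattern + ("$" if anchored else ""), path) is not None

# sort is the first parameter: the '?' rule catches it
print(matches("/*?sort=", "/products?sort=price"))         # True
# sort comes later: the '?' rule misses it...
print(matches("/*?sort=", "/products?page=2&sort=price"))  # False
# ...and the '&' rule is what catches it
print(matches("/*&sort=", "/products?page=2&sort=price"))  # True
```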
Block Faceted Navigation and Filters
E-commerce sites often generate thousands of filter combination URLs. Block them to save crawl budget:
# Block filter pages
Disallow: /products/*?filter=
Disallow: /products/*&filter=
# Block sort variations
Disallow: /*?sort=
Disallow: /*&sort=
# Block price range filters
Disallow: /*?price_min=
Disallow: /*&price_min=
Block Internal Search Results
# Block search result pages
Disallow: /search
Disallow: /*?q=
Disallow: /*?search=
Disallow: /*&q=
Combining * and $
The real power shows when you combine both characters:
# Block PHP files but not directories starting with "php"
Disallow: /*.php$
# Block URLs ending with /feed (RSS feeds)
Disallow: /*/feed$
# Block paginated pages
Disallow: /*/page/*
A combined example for a WordPress site:
User-agent: *
# Block feed URLs
Disallow: /*/feed$
Disallow: /*/feed/
# Block comment pages
Disallow: /*/comment-page-*
# Block trackback URLs
Disallow: /*/trackback$
# Block tag archive pages
Disallow: /tag/
# Block search
Disallow: /?s=
How Different Crawlers Handle Wildcards
Not every crawler supports wildcard patterns. The robots.txt specification (RFC 9309) only defines basic prefix matching. Wildcards are an extension.
| Crawler | Supports * wildcard | Supports $ anchor |
|---|---|---|
| Googlebot | Yes | Yes |
| Bingbot | Yes | Yes |
| Yandex | Yes | Yes |
| DuckDuckBot | Yes | Yes |
| GPTBot (OpenAI) | Yes | Yes |
| CCBot (Common Crawl) | Limited | No |
Major search engine crawlers all support both * and $. Some smaller or older crawlers may not. If a crawler doesn't understand wildcards, it typically treats the * and $ as literal characters, which usually results in the rule matching nothing. For critical blocking rules, consider whether your target crawlers support these patterns.
Path Matching Nuances
A few things trip people up with wildcard matching.
Wildcards don't match across protocols or domains. The pattern only applies to the path portion of the URL. Disallow: /*.pdf$ matches /file.pdf but has no opinion about https://other-domain.com/file.pdf. Each domain needs its own robots.txt.
Precedence, not file order, decides when Allow and Disallow overlap. Google ignores the order of lines and applies the most specific (longest) matching rule; if two rules match the same URL, the longer pattern wins:
User-agent: Googlebot
Disallow: /*.json$
Allow: /api/public/*.json$
Here, /api/public/data.json is allowed because the Allow rule is more specific (longer) than the Disallow rule.
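The precedence logic can be sketched in a few lines: collect every rule that matches, then let the longest pattern decide, with Allow winning a length tie (Google documents the tie-break as "least restrictive wins"; the rule-length measure below is a simplification):

```python
import re

def matches(rule: str, path: str) -> bool:
    """Minimal robots.txt wildcard matcher: '*' -> '.*', rest literal, prefix match."""
    anchored = rule.endswith("$")
    body = rule[:-1] if anchored else rule
    pattern = "".join(".*" if c == "*" else re.escape(c) for c in body)
    return re.match(pattern + ("$" if anchored else ""), path) is not None

def is_allowed(path: str, rules: list[tuple[str, str]]) -> bool:
    """rules: (directive, pattern) pairs. Longest matching pattern wins;
    on a length tie the Allow rule wins. No match at all means allowed."""
    best = max(
        (r for r in rules if matches(r[1], path)),
        key=lambda r: (len(r[1]), r[0] == "allow"),
        default=None,
    )
    return best is None or best[0] == "allow"

rules = [("disallow", "/*.json$"), ("allow", "/api/public/*.json$")]
print(is_allowed("/api/public/data.json", rules))  # True: the Allow rule is longer
print(is_allowed("/secret/config.json", rules))    # False: only Disallow matches
```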
The leading slash is implicit. All paths in robots.txt start from the root. Disallow: /*.pdf$ is equivalent to matching against URLs that start with / and end with .pdf.
Common Mistakes
Using * when a simple prefix works. You don't need Disallow: /private/*. Just Disallow: /private/ already blocks everything under /private/ because directives match as prefixes by default.
Forgetting the $ when you need an exact ending. If you write Disallow: /*.pdf without the $, it also blocks /file.pdf?page=2 and /file.pdfviewer. Add $ when the extension must be the last thing in the URL.
Over-blocking with broad patterns. Disallow: /*? blocks every URL with a query string, including ones Google might need for proper indexing. Be as specific as possible with your patterns.
Wildcards are the precision tool in your robots.txt toolkit -- use them carefully, test them thoroughly.