robots.txt Disallow Directive Explained

How the Disallow directive works in robots.txt: syntax, examples, path matching, and the common mistakes that accidentally block your entire site.

What Disallow does

The Disallow directive tells crawlers not to access URLs that match a given path pattern. It's the primary mechanism in robots.txt for controlling what gets crawled.

User-agent: *
Disallow: /admin/

That single rule tells every crawler to skip any URL that starts with /admin/. Simple enough. But the details of how path matching works — and the mistakes people make with it — are where things get interesting.

Disallow syntax and path matching

A Disallow value is a path prefix. The crawler checks whether the beginning of a URL's path matches the Disallow value. No regex. No fuzzy matching. Just "does this URL path start with this string?"

User-agent: *
Disallow: /private

This blocks:

  • /private
  • /private/
  • /private/page.html
  • /private-stuff/ (yes, this too — it starts with /private)

That last one catches people off guard. If you only want to block the /private/ directory, include the trailing slash:

User-agent: *
Disallow: /private/

Now /private-stuff/ is no longer blocked because it doesn't start with /private/.
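The prefix check is simple enough to express directly. Here is a minimal Python sketch of the matching behavior described above (the function name is illustrative, not part of any robots.txt library):

```python
def is_blocked(url_path: str, disallow_path: str) -> bool:
    """Plain prefix match: no regex, no globbing, just startswith."""
    return url_path.startswith(disallow_path)

# Disallow: /private catches the lookalike directory...
print(is_blocked("/private-stuff/", "/private"))      # True
# ...while Disallow: /private/ does not.
print(is_blocked("/private-stuff/", "/private/"))     # False
print(is_blocked("/private/page.html", "/private/"))  # True
```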

Trailing slashes matter

Disallow: /private blocks anything starting with /private, including /private-data/ and /privately/. Use Disallow: /private/ to target only the directory and its contents.

Wildcard patterns

The original robots.txt spec didn't support wildcards, but Google and Bing both support * (match any sequence of characters) and $ (match end of URL) in their implementations.

User-agent: *
# Block all PDF files
Disallow: /*.pdf$

# Block all URLs containing "sort="
Disallow: /*sort=

# Block all URLs with query parameters
Disallow: /*?

These are powerful but non-standard. Most major crawlers support them, but don't assume every bot will. When in doubt, stick to basic prefix matching.

Pattern               What it blocks
Disallow: /dir/       Everything under /dir/
Disallow: /page       /page, /page.html, /pages/, /pageant
Disallow: /*.pdf$     All URLs ending in .pdf
Disallow: /dir/*.php  All .php files under /dir/
Disallow: /*?sort=    Any URL with sort= in the query string
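One way to reason about these patterns is to translate them into regular expressions, which is roughly what wildcard-aware crawlers do internally. A sketch in Python, assuming only the * and $ semantics described above:

```python
import re

def robots_pattern_to_regex(pattern: str) -> re.Pattern:
    # A trailing '$' anchors the end of the URL; '$' anywhere else is literal.
    anchored = pattern.endswith("$")
    body = pattern[:-1] if anchored else pattern
    # '*' matches any run of characters; everything else is literal.
    regex = "".join(".*" if ch == "*" else re.escape(ch) for ch in body)
    return re.compile(regex + ("$" if anchored else ""))

def matches(pattern: str, url_path: str) -> bool:
    # Patterns match from the beginning of the path, hence re.match.
    return robots_pattern_to_regex(pattern).match(url_path) is not None

print(matches("/*.pdf$", "/files/report.pdf"))      # True
print(matches("/*.pdf$", "/files/report.pdf?v=2"))  # False: '$' requires the URL to end there
print(matches("/*sort=", "/products?sort=price"))   # True
```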

Blocking a single page vs. a directory vs. everything

Here are the three most common blocking scenarios:

Block a single page:

User-agent: *
Disallow: /secret-page.html

Block an entire directory:

User-agent: *
Disallow: /admin/

Block your entire site:

User-agent: *
Disallow: /

That last one is the nuclear option. Disallow: / matches every URL on your site because every URL path starts with /. This is appropriate for staging environments and development servers. It's catastrophic on production if done accidentally.


Disallow: / vs. Disallow: (empty)

This distinction confuses nearly everyone the first time:

# Blocks EVERYTHING
User-agent: *
Disallow: /

# Blocks NOTHING (same as having no robots.txt)
User-agent: *
Disallow:

An empty Disallow: value means "disallow nothing," which effectively allows everything. It's the explicit way to say "this crawler has no restrictions." You'd use it when you want to set rules for specific crawlers but allow everything for the wildcard:

User-agent: BadBot
Disallow: /

User-agent: *
Disallow:

This blocks BadBot from everything while explicitly allowing all other crawlers full access.
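You can verify this behavior with Python's standard-library urllib.robotparser, which evaluates rules per user agent (the bot names and URL here are placeholders):

```python
from urllib.robotparser import RobotFileParser

rules = """\
User-agent: BadBot
Disallow: /

User-agent: *
Disallow:
""".splitlines()

rp = RobotFileParser()
rp.parse(rules)

# BadBot matches its own block and is denied everything.
print(rp.can_fetch("BadBot", "https://example.com/page"))   # False
# Any other bot falls through to the wildcard block with an empty Disallow.
print(rp.can_fetch("OtherBot", "https://example.com/page")) # True
```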

Multiple Disallow rules

You can stack as many Disallow lines as you need within a single User-agent block:

User-agent: *
Disallow: /admin/
Disallow: /tmp/
Disallow: /search?
Disallow: /api/
Disallow: /cart/
Disallow: /checkout/

Each rule is independent. A URL is blocked if it matches any of the Disallow patterns, unless a more specific Allow rule overrides it.

User-agent: *
Disallow: /docs/
Allow: /docs/public/

Here, everything under /docs/ is blocked except for /docs/public/ and its contents. When Allow and Disallow rules conflict, the more specific (longer) path wins. If they're the same length, Allow takes precedence.
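The longest-match rule fits in a few lines of Python. This is an illustration of the precedence logic for plain prefix rules, not a full robots.txt parser (note that Python's own urllib.robotparser applies rules in file order, first match wins, so it can disagree with Google's longest-match behavior):

```python
def is_allowed(url_path: str, rules: list) -> bool:
    """rules: (kind, prefix) pairs where kind is 'allow' or 'disallow'.

    The most specific (longest) matching prefix wins; on a tie of equal
    length, Allow takes precedence. No matching rule means allowed.
    """
    best = None  # (prefix_length, is_allow); tuple comparison breaks ties toward Allow
    for kind, prefix in rules:
        if url_path.startswith(prefix):
            candidate = (len(prefix), kind == "allow")
            if best is None or candidate > best:
                best = candidate
    return best is None or best[1]

rules = [("disallow", "/docs/"), ("allow", "/docs/public/")]
print(is_allowed("/docs/internal/spec.html", rules))  # False: only /docs/ matches
print(is_allowed("/docs/public/guide.html", rules))   # True: longer Allow wins
print(is_allowed("/blog/", rules))                    # True: no rule matches
```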

Common Disallow mistakes

These are the mistakes that show up over and over in real-world robots.txt files.

Blocking CSS, JavaScript, and images

# Don't do this
User-agent: *
Disallow: /css/
Disallow: /js/
Disallow: /images/

Years ago, people blocked asset directories to "save crawl budget." Today, this actively hurts you. Google needs to render your pages to understand them. If it can't access your CSS and JavaScript, it can't properly evaluate your content, and your rankings will suffer.

Blocking everything by accident

# A stray Disallow: / will ruin your day
User-agent: *
Disallow: /

This often happens during a staging-to-production migration. The staging robots.txt blocks everything (correct), and then someone forgets to update it when the site goes live.

Always check after deployment

After every deployment, verify that your production robots.txt is correct. It takes five seconds to check and can save weeks of lost indexing.
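That check is easy to automate. A sketch of a post-deploy smoke test, assuming your pipeline can read the deployed robots.txt body (the domain and paths are placeholders):

```python
from urllib.robotparser import RobotFileParser

def critical_paths_crawlable(robots_txt: str, paths: list) -> bool:
    """Return True only if every must-index path is fetchable by a generic bot."""
    rp = RobotFileParser()
    rp.parse(robots_txt.splitlines())
    return all(rp.can_fetch("Googlebot", f"https://example.com{p}") for p in paths)

# A leftover staging file fails the check immediately:
staging = "User-agent: *\nDisallow: /"
print(critical_paths_crawlable(staging, ["/", "/products/"]))     # False

production = "User-agent: *\nDisallow: /admin/"
print(critical_paths_crawlable(production, ["/", "/products/"]))  # True
```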

Using Disallow to hide pages from search results

# This does NOT prevent indexing
User-agent: *
Disallow: /secret-page/

If another site links to /secret-page/, Google can still index the URL. It won't crawl the content, but the URL can appear in search results with a "No information is available for this page" snippet. To prevent indexing, use a noindex meta tag on the page itself.

Forgetting the leading slash

# This is invalid — paths must start with /
User-agent: *
Disallow: admin/

Every Disallow path must begin with /. Without it, the rule may be ignored by crawlers.

Disallow vs. noindex

These solve different problems:

Disallow                                noindex
Prevents crawling                       Prevents indexing
Set in robots.txt                       Set in an HTML meta tag or HTTP header
Crawler never fetches the page          Crawler fetches the page, then drops it from the index
URL can still appear in search results  URL is removed from search results
Saves crawl budget                      Uses crawl budget (the crawler must visit the page)

The most important row: if you Disallow a page, the crawler can't see the noindex tag on it because it never fetches the page. So if you need a page removed from search results, do not block it in robots.txt. Let the crawler access the page so it can read the noindex directive.


When to use Disallow

Use Disallow for:

  • Admin and internal areas — /admin/, /dashboard/, /internal/
  • Duplicate content — Faceted navigation, sorted pages, session-based URLs
  • Temporary or staging content — /tmp/, /staging/
  • API endpoints — /api/, /graphql
  • Internal search results — /search?, /results?
  • User-specific pages — /cart/, /account/, /checkout/

Don't use Disallow for anything you need completely removed from search results. That's a job for noindex.


One wrong slash in your Disallow rule can block half your site. Test before you deploy.
