robots.txt for Drupal Sites

Drupal ships with a default robots.txt file that covers the basics. It blocks admin paths, core directories, and common non-public URLs. But Drupal's modular architecture means your site probably has paths and patterns that the default file does not account for. Understanding how to customize robots.txt for Drupal helps you control what search engines crawl without accidentally blocking important content.

For a general introduction to robots.txt, see our robots.txt guide.

Drupal's Default robots.txt

Drupal includes a robots.txt file in the root of the installation. In Drupal 10 and 11, it looks something like this (abbreviated, with an example Sitemap line appended; core's own file does not include one):

User-agent: *
# CSS, JS, and image files
Allow: /core/*.css$
Allow: /core/*.css?
Allow: /core/*.js$
Allow: /core/*.js?
Allow: /core/*.gif
Allow: /core/*.jpg
Allow: /core/*.jpeg
Allow: /core/*.png
Allow: /core/*.svg
Allow: /profiles/*.css$
Allow: /profiles/*.css?
Allow: /profiles/*.js$
Allow: /profiles/*.js?
Allow: /profiles/*.gif
Allow: /profiles/*.jpg
Allow: /profiles/*.jpeg
Allow: /profiles/*.png
Allow: /profiles/*.svg

# Directories
Disallow: /core/
Disallow: /profiles/

# Files
Disallow: /README.md
Disallow: /web.config
Disallow: /INSTALL.txt
Disallow: /CHANGELOG.txt
Disallow: /LICENSE.txt

# Paths (clean URLs)
Disallow: /admin/
Disallow: /comment/reply/
Disallow: /filter/tips
Disallow: /node/add/
Disallow: /search/
Disallow: /user/register
Disallow: /user/password
Disallow: /user/login
Disallow: /user/logout
Disallow: /media/oembed

Sitemap: https://example.com/sitemap.xml

What the defaults do

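The Allow rules for CSS, JS, and image files inside /core/ and /profiles/ let crawlers such as Googlebot load these resources even though the parent directories are disallowed. This works because, when several rules match a URL, the most specific (longest) matching rule takes precedence, so Allow: /core/*.css$ beats Disallow: /core/. It matters because Googlebot needs CSS and JavaScript to render pages correctly.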
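
To make the precedence concrete, here is a minimal Python sketch of longest-match evaluation with * and $ support. It is a simplified illustration, not a full robots.txt parser: the helper names (pattern_to_regex, is_allowed) and the sample paths are made up for this example, and user-agent grouping is ignored.

```python
import re

def pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern ('*' wildcard, optional trailing '$') to a regex."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape everything literally, then let '*' match any run of characters.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))

def is_allowed(rules: list[tuple[str, str]], path: str) -> bool:
    """Longest matching pattern wins; on a tie, Allow wins; no match means allowed."""
    best_len, allowed = -1, True
    for directive, pattern in rules:
        if pattern_to_regex(pattern).match(path):
            is_allow = directive.lower() == "allow"
            if len(pattern) > best_len or (len(pattern) == best_len and is_allow):
                best_len, allowed = len(pattern), is_allow
    return allowed

# A small subset of the default rules shown above.
RULES = [
    ("Allow", "/core/*.css$"),
    ("Allow", "/core/*.js$"),
    ("Disallow", "/core/"),
]

print(is_allowed(RULES, "/core/misc/example.css"))  # True: the longer Allow rule wins
print(is_allowed(RULES, "/core/lib/Drupal.php"))    # False: only Disallow: /core/ matches
print(is_allowed(RULES, "/about"))                  # True: no rule matches
```

Real crawlers add further details (percent-encoding, per-crawler groups), so treat this as a way to reason about rule order rather than a replacement for a testing tool.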

The Disallow rules block:

Core directories (/core/, /profiles/) -- Drupal's internal code
Admin paths (/admin/) -- The administration interface
User paths (/user/register, /user/login, /user/password, /user/logout) -- Authentication pages
Comment and search (/comment/reply/, /search/) -- Dynamic pages that should not be crawled
Documentation files -- README, INSTALL, CHANGELOG, LICENSE

These defaults are a solid starting point. But they do not cover paths created by contributed modules, custom content types, or site-specific URL patterns.

How to Edit robots.txt in Drupal

Method 1: Edit the file directly

The simplest approach is to edit the robots.txt file in your Drupal root directory with a text editor.

On a standard Drupal installation:

# Navigate to your Drupal root
cd /var/www/html
# Edit robots.txt
nano robots.txt

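Important caveat: If you manage Drupal with Composer (as most modern Drupal sites do), the robots.txt file in your web root may be overwritten by core's scaffold file whenever you run composer install or update Drupal core. To prevent this, do one of the following:

Add a post-install-cmd (and post-update-cmd) script in your composer.json that copies your custom robots.txt back into place
Exclude robots.txt from scaffolding by mapping [web-root]/robots.txt to false under extra.drupal-scaffold.file-mapping in composer.json
Serve a custom file through web server configuration (see Method 3)
Use the RobotsTxt module (see Method 2)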
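
As an illustration of the copy-into-place option, the following Python sketch restores a custom robots.txt over whatever is currently in the web root. The function name and the paths are assumptions for this example; adapt them to your project layout.

```python
import shutil
import sys
from pathlib import Path

def restore_robots(source: Path, web_root: Path) -> Path:
    """Copy the project's custom robots.txt over the one in the web root."""
    if not source.is_file():
        raise FileNotFoundError(f"custom robots.txt not found: {source}")
    target = web_root / "robots.txt"
    shutil.copyfile(source, target)
    return target

# Only runs when both paths are given on the command line, e.g.
#   python3 restore_robots.py assets/robots.txt web   (hypothetical paths)
if __name__ == "__main__" and len(sys.argv) == 3:
    print(f"restored {restore_robots(Path(sys.argv[1]), Path(sys.argv[2]))}")
```

A script like this could be listed under post-install-cmd and post-update-cmd in the scripts section of composer.json so that it runs after every install or update.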

Method 2: RobotsTxt module

The RobotsTxt module lets you manage robots.txt from the Drupal admin interface.

Install it:

composer require drupal/robotstxt
drush en robotstxt

After installation:

Go to Configuration > Search and metadata > robots.txt
Edit the content in the text area
Save

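The module serves /robots.txt through Drupal, using content stored in the site's configuration instead of the file on disk. For this to work, delete or rename the static robots.txt file in the web root; otherwise the web server serves that file directly and the request never reaches Drupal. On Composer-managed sites, also make sure the scaffold step does not put core's robots.txt back.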

The module also handles multisite configurations. If you run multiple Drupal sites from one codebase, each site can have its own robots.txt.

Method 3: Web server configuration

You can serve a custom robots.txt through your web server (Apache or Nginx) without modifying Drupal files.

Apache (.htaccess):

RewriteRule ^robots\.txt$ /sites/default/files/robots.txt [L]

Nginx:

location = /robots.txt {
    alias /var/www/html/sites/default/files/robots.txt;
}

This approach is useful if you want to keep the robots.txt outside of Drupal's codebase entirely.

Drupal-Specific Paths to Block

Beyond the defaults, Drupal sites often need to block additional paths.

Taxonomy term pages

If you use taxonomy terms primarily for content organization (not as landing pages), block them:

Disallow: /taxonomy/

Note that Disallow: /taxonomy/ only matches the system paths (such as /taxonomy/term/123). If your terms have URL aliases, block the alias prefixes of the vocabularies you do not want crawled:

Disallow: /tags/
Disallow: /categories/
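
The difference between system paths and aliases is easy to check with Python's standard-library robots.txt parser. This is a small sketch: the crawlable helper and the sample URLs are made up for illustration, and urllib.robotparser only handles plain prefix rules (no * or $ wildcards), which is enough here.

```python
from urllib.robotparser import RobotFileParser

def crawlable(robots_txt: str, path: str, agent: str = "*") -> bool:
    """Return True if the given rules let the agent fetch the path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)

SYSTEM_ONLY = "User-agent: *\nDisallow: /taxonomy/\n"
WITH_ALIASES = SYSTEM_ONLY + "Disallow: /tags/\n"

print(crawlable(SYSTEM_ONLY, "/taxonomy/term/5"))  # False: system path is blocked
print(crawlable(SYSTEM_ONLY, "/tags/drupal"))      # True: the alias is not covered
print(crawlable(WITH_ALIASES, "/tags/drupal"))     # False: alias prefix is blocked too
```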

Aggregation pages

If your site has aggregated content pages (views, panels) that duplicate content from individual nodes, consider blocking the aggregate paths if they do not provide unique value.

User profiles

The default blocks the login, registration, and password reset pages, but user profile pages (such as /user/123) remain crawlable. To block them as well:

Disallow: /user/

This blocks all user-related paths. Be careful if you have public-facing user profiles that you want indexed.

Internal paths

Drupal exposes various internal paths depending on your module configuration:

Disallow: /batch
Disallow: /system/
Disallow: /sites/default/files/private/

Contributed module paths

Modules can add their own URL paths. Check your site for paths created by:

Views with page displays
Webform submission pages
Entity reference autocomplete endpoints
JSON:API or REST endpoints

Block any paths that expose data you do not want crawled:

Disallow: /jsonapi/
Disallow: /api/

Handling Drupal Multisite

If you run Drupal multisite (multiple sites from one codebase), the default robots.txt applies to all sites. This is usually not what you want, since each site may have different URL structures and blocking needs.

Solutions:

RobotsTxt module -- Manages per-site robots.txt from the admin interface
Web server configuration -- Serve different robots.txt files based on the requested domain
Symlinks or deploy scripts -- Copy site-specific robots.txt files into place during deployment

robots.txt and Drupal Caching

Drupal's page cache and reverse proxy caching (Varnish, CDN) can affect robots.txt delivery.

Page cache

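If robots.txt is served through Drupal (for example by the RobotsTxt module) and Drupal's internal page cache is enabled, the response may be cached, so edits might not appear immediately. A static robots.txt file served directly by the web server is not affected by Drupal's page cache. Clear the Drupal cache after making changes: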

drush cr

Reverse proxy / CDN

If you use Varnish or a CDN, the robots.txt may be cached at the proxy level. Purge the cached version after making changes. Most CDN providers let you purge specific URLs.

Verify your changes

After editing robots.txt and clearing caches, verify the live version:

Visit https://yourdomain.com/robots.txt in a browser
Check that your changes appear
Use a robots.txt testing tool to verify your rules work as intended
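
If you repeat this check often, it can be scripted. The sketch below, in Python, downloads the live file and reports which expected directive lines are missing. The function names and the sample rules are assumptions for illustration; the example at the bottom runs offline against a sample string.

```python
from urllib.request import urlopen

def fetch_robots(base_url: str, timeout: float = 10.0) -> str:
    """Download robots.txt from the live site."""
    with urlopen(base_url.rstrip("/") + "/robots.txt", timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")

def missing_rules(robots_txt: str, expected: list[str]) -> list[str]:
    """Return the expected directive lines that are absent from the file."""
    present = {line.strip() for line in robots_txt.splitlines()}
    return [rule for rule in expected if rule not in present]

# Offline example; on a real site, pass fetch_robots("https://yourdomain.com") instead.
sample = "User-agent: *\nDisallow: /admin/\nDisallow: /jsonapi/\n"
print(missing_rules(sample, ["Disallow: /jsonapi/", "Disallow: /api/"]))
# ['Disallow: /api/']
```

This only confirms that the lines are present in what the server returns, which also catches stale caches. It does not tell you how a crawler interprets the rules, so a testing tool is still worth using.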

For testing guidance, see our robots.txt testing guide.

Drupal core updates and robots.txt

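When you update Drupal core via Composer, the robots.txt file in the web root may be overwritten with the new version from core. Always check your robots.txt after core updates. Serving the file through web server configuration avoids the problem. If you use the RobotsTxt module, also confirm that the update did not restore the static file, because the web server would serve it in place of the module's output.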

Common Mistakes

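Missing or outdated Sitemap line

Drupal core's stock robots.txt does not include a Sitemap line, so you have to add one yourself, and a file copied from an example may still carry a placeholder such as https://example.com/sitemap.xml. Make sure the line points to your actual sitemap URL: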

Sitemap: https://yourdomain.com/sitemap.xml

If you use the Simple XML Sitemap module, the URL is typically /sitemap.xml. If you use a different module, the URL may differ.

Blocking CSS and JavaScript

The default robots.txt carefully allows CSS and JS from /core/ while blocking the directory itself. If you modify the rules, make sure you do not accidentally block these resources. Googlebot needs them to render your pages. See our guide on how to test robots.txt for verification methods.

Forgetting to handle URL aliases vs. system paths

Drupal content can be reached at both its system path (/node/123) and its URL alias (/about). The default robots.txt blocks /node/add/ but not node view paths. If you want to keep crawlers away from the system paths (to reduce duplicate content), add:

Disallow: /node/

But be careful -- this blocks all node paths, including the canonical ones for content without URL aliases.

Over-blocking after module installations

Some documentation recommends adding Disallow rules for every module path. This can lead to over-blocking, where legitimate content pages are accidentally blocked. Only block paths that genuinely should not be crawled.

Not testing after deployment

Deployment pipelines can modify or replace robots.txt. Add a post-deployment check to verify the file contents.
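
One possible shape for such a check, sketched in Python, is to diff the deployed file against the copy you keep in version control. The function name and the sample contents are assumptions for this example.

```python
import difflib

def robots_diff(expected: str, deployed: str) -> list[str]:
    """Return unified-diff lines; an empty list means the files match."""
    return list(difflib.unified_diff(
        expected.splitlines(), deployed.splitlines(),
        fromfile="expected", tofile="deployed", lineterm="",
    ))

expected = "User-agent: *\nDisallow: /admin/\nDisallow: /jsonapi/\n"
deployed = "User-agent: *\nDisallow: /admin/\n"  # e.g. a custom rule lost during deployment

diff = robots_diff(expected, deployed)
print("\n".join(diff) if diff else "robots.txt matches")
```

In a pipeline, you would read both versions from their real locations and fail the job when the diff is not empty.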

Summary

Drupal ships with a solid default robots.txt that blocks admin, core, and authentication paths while allowing CSS and JavaScript for rendering. Customize it to block additional paths created by contributed modules, taxonomy pages, and API endpoints. Use the RobotsTxt module to manage the file from the admin interface and protect against Composer overwrites. Always verify changes after deployment and core updates.
