robots.txt for Drupal Sites
How to configure robots.txt for Drupal sites. Covers the default file, customization methods, Drupal-specific paths to block, and common issues with modules and updates.
Drupal ships with a default robots.txt file that covers the basics. It blocks admin paths, core directories, and common non-public URLs. But Drupal's modular architecture means your site probably has paths and patterns that the default file does not account for. Understanding how to customize robots.txt for Drupal helps you control what search engines crawl without accidentally blocking important content.
For a general introduction to robots.txt, see our robots.txt guide.
Drupal's Default robots.txt
Drupal includes a robots.txt file in the root of the installation. In Drupal 10 and 11, it looks something like this (abbreviated):
User-agent: *
# CSS, JS, and image files
Allow: /core/*.css$
Allow: /core/*.css?
Allow: /core/*.js$
Allow: /core/*.js?
Allow: /core/*.gif
Allow: /core/*.jpg
Allow: /core/*.jpeg
Allow: /core/*.png
Allow: /core/*.svg
Allow: /profiles/*.css$
Allow: /profiles/*.css?
Allow: /profiles/*.js$
Allow: /profiles/*.js?
Allow: /profiles/*.gif
Allow: /profiles/*.jpg
Allow: /profiles/*.jpeg
Allow: /profiles/*.png
Allow: /profiles/*.svg
# Directories
Disallow: /core/
Disallow: /profiles/
# Files
Disallow: /README.md
Disallow: /web.config
Disallow: /INSTALL.txt
Disallow: /CHANGELOG.txt
Disallow: /LICENSE.txt
# Paths (clean URLs)
Disallow: /admin/
Disallow: /comment/reply/
Disallow: /filter/tips
Disallow: /node/add/
Disallow: /search/
Disallow: /user/register
Disallow: /user/password
Disallow: /user/login
Disallow: /user/logout
Disallow: /media/oembed
Sitemap: https://example.com/sitemap.xml
What the defaults do
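The Allow rules for CSS, JS, and images inside /core/ and /profiles/ ensure that Googlebot can load these resources for rendering, even though the parent directories are blocked. This works because Google, like other crawlers that follow the Robots Exclusion Protocol (RFC 9309), applies the most specific (longest) matching rule, so Allow: /core/*.css$ overrides the shorter Disallow: /core/. It matters because Googlebot needs CSS and JavaScript to render pages correctly.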
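The precedence between Allow and Disallow can be sketched in a few lines of Python. This is a simplified illustration of longest-match evaluation as Google documents it, not a full parser: the function names are hypothetical, user-agent groups are ignored, and the rule list is a small excerpt of the defaults.

```python
import re

def pattern_to_regex(pattern):
    """Translate a robots.txt path pattern into a regex.

    '*' matches any run of characters; a trailing '$' anchors the end.
    Every other character (including '?') is treated literally.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))

def is_allowed(rules, path):
    """Return True if `path` may be crawled under `rules`.

    `rules` is a list of (directive, pattern) pairs, where directive is
    "allow" or "disallow". The longest matching pattern wins; on a tie,
    Allow wins. A path that matches no rule is allowed.
    """
    best_len = -1
    best_allow = True
    for directive, pattern in rules:
        if not pattern:
            continue
        # re.match anchors at the start of the path, as robots.txt rules do.
        if pattern_to_regex(pattern).match(path):
            allow = directive == "allow"
            if len(pattern) > best_len or (len(pattern) == best_len and allow):
                best_len = len(pattern)
                best_allow = allow
    return best_allow

# A small excerpt of the default rules shown above.
DEFAULT_EXCERPT = [
    ("allow", "/core/*.js$"),
    ("allow", "/core/*.js?"),
    ("disallow", "/core/"),
    ("disallow", "/admin/"),
]

print(is_allowed(DEFAULT_EXCERPT, "/core/misc/drupal.js"))       # allowed by the longer Allow rule
print(is_allowed(DEFAULT_EXCERPT, "/core/lib/Drupal.php"))       # blocked by Disallow: /core/
print(is_allowed(DEFAULT_EXCERPT, "/core/misc/drupal.js?v=10"))  # allowed by the *.js? rule
```

The third call shows why the default file carries both a `*.js$` and a `*.js?` rule: the `$`-anchored pattern does not match URLs that have a cache-busting query string.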
The Disallow rules block:
- Core directories (/core/, /profiles/) -- Drupal's internal code
- Admin paths (/admin/) -- The administration interface
- User paths (/user/register, /user/login, /user/password) -- Authentication pages
- Comment and search -- Dynamic pages that should not be indexed
- Documentation files -- README, INSTALL, CHANGELOG, LICENSE
These defaults are a solid starting point. But they do not cover paths created by contributed modules, custom content types, or site-specific URL patterns.
How to Edit robots.txt in Drupal
Method 1: Edit the file directly
The simplest approach is to edit the robots.txt file in your Drupal root directory with a text editor.
On a standard Drupal installation:
# Navigate to your Drupal root
cd /var/www/html
# Edit robots.txt
nano robots.txt
Important caveat: If you manage Drupal with Composer (which most modern Drupal sites do), the robots.txt file in the web root may be overwritten when you run composer install or update Drupal core, because Drupal's Composer scaffolding (the drupal/core-composer-scaffold plugin) copies core's version back into place. To prevent this, either:
- Add a post-install-cmd script in your composer.json that copies your custom robots.txt into place
- Use the .htaccess approach to serve a custom file
- Use the RobotsTxt module (see below)
Method 2: RobotsTxt module
The RobotsTxt module lets you manage robots.txt from the Drupal admin interface.
Install it:
composer require drupal/robotstxt
drush en robotstxt
After installation:
- Go to Configuration > Search and metadata > robots.txt
- Edit the content in the text area
- Save
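The module intercepts requests to /robots.txt and serves the content from the database instead of the file system. For this to work, the static robots.txt file in the web root must be deleted or renamed; otherwise the web server serves that file directly and the request never reaches Drupal. Once the static file is gone, your rules live in Drupal rather than in a file that Composer updates can overwrite, but check after each update that the scaffolded file has not reappeared.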
The module also handles multisite configurations. If you run multiple Drupal sites from one codebase, each site can have its own robots.txt.
Method 3: Web server configuration
You can serve a custom robots.txt through your web server (Apache or Nginx) without modifying Drupal files.
Apache (.htaccess):
RewriteRule ^robots\.txt$ /sites/default/files/robots.txt [L]
Nginx:
location = /robots.txt {
    alias /var/www/html/sites/default/files/robots.txt;
}
This approach is useful if you want to keep the robots.txt outside of Drupal's codebase entirely.
Drupal-Specific Paths to Block
Beyond the defaults, Drupal sites often need to block additional paths.
Taxonomy term pages
If you use taxonomy terms primarily for content organization (not as landing pages), block them:
Disallow: /taxonomy/
Or if your taxonomy terms use URL aliases (for example, generated by Pathauto), block the alias paths of the specific vocabularies you do not want crawled:
Disallow: /tags/
Disallow: /categories/
Aggregation pages
If your site has aggregated content pages (views, panels) that duplicate content from individual nodes, consider blocking the aggregate paths if they do not provide unique value.
User profiles
The default blocks login and registration, but user profile pages may still be accessible:
Disallow: /user/
This blocks all user-related paths. Be careful if you have public-facing user profiles that you want indexed.
Internal paths
Drupal exposes various internal paths depending on your module configuration:
Disallow: /batch
Disallow: /system/
Disallow: /sites/default/files/private/
Contributed module paths
Modules can add their own URL paths. Check your site for paths created by:
- Views with page displays
- Webform submission pages
- Entity reference autocomplete endpoints
- JSON:API or REST endpoints
Block any paths that expose data you do not want crawled:
Disallow: /jsonapi/
Disallow: /api/
Handling Drupal Multisite
If you run Drupal multisite (multiple sites from one codebase), the default robots.txt applies to all sites. This is usually not what you want, since each site may have different URL structures and blocking needs.
Solutions:
- RobotsTxt module -- Manages per-site robots.txt from the admin interface
- Web server configuration -- Serve different robots.txt files based on the requested domain
- Symlinks or deploy scripts -- Copy site-specific robots.txt files into place during deployment
robots.txt and Drupal Caching
Drupal's page cache and reverse proxy caching (Varnish, CDN) can affect robots.txt delivery.
Page cache
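If robots.txt is served through Drupal (for example, by the RobotsTxt module) and Drupal's internal page cache is enabled, the response may be cached, so edits might not appear immediately. A static robots.txt file served directly by the web server does not pass through Drupal's page cache. Clear the Drupal cache after making changes: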
drush cr
Reverse proxy / CDN
If you use Varnish or a CDN, the robots.txt may be cached at the proxy level. Purge the cached version after making changes. Most CDN providers let you purge specific URLs.
Verify your changes
After editing robots.txt and clearing caches, verify the live version:
- Visit https://yourdomain.com/robots.txt in a browser
- Check that your changes appear
- Use a robots.txt testing tool to verify your rules work as intended
For testing guidance, see our robots.txt testing guide.
Drupal core updates and robots.txt
When you update Drupal core via Composer, the default robots.txt file may be overwritten with the new version from core. Always check your robots.txt after core updates. Using the RobotsTxt module or a web server-level solution avoids this problem entirely.
Common Mistakes
Not updating the Sitemap line
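The Sitemap line must point to your real sitemap as an absolute URL. If your robots.txt has no Sitemap line, or still carries a placeholder such as https://example.com/sitemap.xml, add or update it to match your actual sitemap URL: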
Sitemap: https://yourdomain.com/sitemap.xml
If you use the Simple XML Sitemap module, the URL is typically /sitemap.xml. If you use a different module, the URL may differ.
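A quick way to catch a missing or placeholder Sitemap line is to parse the file with Python's standard library. The sketch below uses urllib.robotparser (its site_maps() method needs Python 3.8 or later); the function name and the list of placeholder hosts are assumptions made for illustration.

```python
from urllib.parse import urlparse
from urllib.robotparser import RobotFileParser

# Hosts treated as leftover placeholders (an assumption; extend as needed).
PLACEHOLDER_HOSTS = {"example.com", "www.example.com", "yourdomain.com"}

def sitemap_problems(robots_text):
    """Return a list of human-readable problems with the Sitemap lines."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    urls = parser.site_maps() or []  # site_maps() returns None when there are none
    if not urls:
        return ["no Sitemap line found"]
    problems = []
    for url in urls:
        parts = urlparse(url)
        if parts.scheme not in ("http", "https") or not parts.netloc:
            problems.append(f"not an absolute URL: {url}")
        elif parts.hostname in PLACEHOLDER_HOSTS:
            problems.append(f"placeholder host: {url}")
    return problems
```

An empty list means every Sitemap line is an absolute URL on a non-placeholder host. It does not confirm that the sitemap itself exists, so still request the URL once by hand.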
Blocking CSS and JavaScript
The default robots.txt carefully allows CSS and JS from /core/ while blocking the directory itself. If you modify the rules, make sure you do not accidentally block these resources. Googlebot needs them to render your pages. See our guide on how to test robots.txt for verification methods.
Forgetting to handle clean URLs vs. system paths
Drupal content can be accessed at both the system path (/node/123) and its URL alias (/about). The default robots.txt does not block node paths. If you want to block direct node access (to prevent duplicate content), add:
Disallow: /node/
Allow: /node/*.css$
Allow: /node/*.js$
But be careful -- this blocks all node paths, including the canonical ones for content without URL aliases.
Over-blocking after module installations
Some documentation recommends adding Disallow rules for every module path. This can lead to over-blocking, where legitimate content pages are accidentally blocked. Only block paths that genuinely should not be crawled.
Not testing after deployment
Deployment pipelines can modify or replace robots.txt. Add a post-deployment check to verify the file contents.
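A post-deployment check can be as small as comparing the deployed file against a list of rules you expect to find. The following is a minimal sketch, assuming you keep that list alongside your deployment scripts; the function names are hypothetical, and the comparison is an exact, case-sensitive line match after whitespace normalization.

```python
from urllib.request import urlopen

def rule_lines(robots_text):
    """Return the set of non-empty, non-comment lines, whitespace-normalized."""
    lines = set()
    for raw in robots_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if line:
            lines.add(" ".join(line.split()))
    return lines

def missing_rules(robots_text, expected):
    """Return the expected rules that do not appear in the robots.txt text."""
    present = rule_lines(robots_text)
    return [rule for rule in expected if " ".join(rule.split()) not in present]

def fetch_robots(base_url, timeout=10):
    """Download /robots.txt from a live site (performs a network request)."""
    with urlopen(base_url.rstrip("/") + "/robots.txt", timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")

# Offline example with a literal file body:
deployed = "User-agent: *\n# Paths\nDisallow: /admin/\nDisallow:  /jsonapi/\n"
expected = ["Disallow: /admin/", "Disallow: /jsonapi/", "Disallow: /user/"]
print(missing_rules(deployed, expected))
```

In a pipeline you would pass the result of fetch_robots("https://yourdomain.com") instead of the literal string and fail the job when the returned list is not empty.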
Summary
Drupal ships with a solid default robots.txt that blocks admin, core, and authentication paths while allowing CSS and JavaScript for rendering. Customize it to block additional paths created by contributed modules, taxonomy pages, and API endpoints. Use the RobotsTxt module to manage the file from the admin interface and protect against Composer overwrites. Always verify changes after deployment and core updates.