robots.txt for Webflow Sites
How to configure robots.txt on Webflow. Covers the auto-generated file, custom editing, common configurations, and Webflow-specific crawling considerations.
Webflow gives you direct control over your robots.txt file, which puts it ahead of many hosted platforms. You can edit the file through the Webflow project settings, adding custom rules for specific bots, blocking directories, or fine-tuning how crawlers interact with your site.
This guide covers how to access and edit your Webflow robots.txt, common configurations, and Webflow-specific issues to watch for. For a general introduction to robots.txt, see our robots.txt guide.
Webflow's Default robots.txt
Webflow generates a default robots.txt for every published site. You can view it at https://yourdomain.com/robots.txt.
The default file is simple:
User-agent: *
Disallow:
Sitemap: https://yourdomain.com/sitemap.xml
This allows all crawlers to access all pages and references your sitemap. It is a permissive default that works for most sites.
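One way to sanity-check this default is to parse it with Python's standard-library `urllib.robotparser`. The sketch below is illustrative only: `yourdomain.com` is the placeholder used above, and the `/blog/hello` path is a hypothetical example.

```python
from urllib.robotparser import RobotFileParser

DEFAULT_ROBOTS = """\
User-agent: *
Disallow:
Sitemap: https://yourdomain.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(DEFAULT_ROBOTS.splitlines())

# An empty Disallow value blocks nothing, so any path is fetchable.
home_allowed = parser.can_fetch("Googlebot", "https://yourdomain.com/")
blog_allowed = parser.can_fetch("Bingbot", "https://yourdomain.com/blog/hello")  # hypothetical path

# site_maps() (Python 3.8+) returns the Sitemap URLs declared in the file.
sitemaps = parser.site_maps()
print(home_allowed, blog_allowed, sitemaps)
```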
Staging subdomain
Before connecting a custom domain, your Webflow site lives at yoursitename.webflow.io. The staging subdomain has a different robots.txt:
User-agent: *
Disallow: /
This blocks all crawlers from the staging URL, which prevents the staging version from being indexed. When you connect a custom domain and publish, Webflow switches to the permissive robots.txt.
If your site is live on a custom domain but the .webflow.io version is still accessible, the staging URL should still have the blocking robots.txt. Verify this by checking yoursitename.webflow.io/robots.txt.
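A minimal sketch of that check, assuming you have already copied the text of the staging robots.txt into a string. The helper name `blocks_all_crawlers` and the probe paths are hypothetical, chosen for illustration.

```python
from urllib.robotparser import RobotFileParser

def blocks_all_crawlers(robots_txt, probe_paths=("/", "/about", "/blog/post")):
    """Return True if a generic crawler is denied every probe path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not any(parser.can_fetch("*", path) for path in probe_paths)

STAGING = "User-agent: *\nDisallow: /\n"      # what the .webflow.io file should contain
PRODUCTION = "User-agent: *\nDisallow:\n"     # the permissive default

staging_blocked = blocks_all_crawlers(STAGING)
production_blocked = blocks_all_crawlers(PRODUCTION)
print(staging_blocked, production_blocked)
```

To run this against the live staging URL, fetch yoursitename.webflow.io/robots.txt first (for example with `urllib.request.urlopen`) and pass the decoded text to the function.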
How to Edit robots.txt on Webflow
Webflow provides a text editor for robots.txt in your project settings.
Steps
- Open your Webflow project in the Designer
- Click the Webflow logo (top left) to open Project Settings
- Navigate to the "SEO" tab
- Scroll down to the "robots.txt" section
- Edit the content in the text field
- Save and publish your site
Changes take effect after you publish. The robots.txt is served from Webflow's CDN, so propagation is usually immediate.
What you can add
You have full control over the robots.txt content. You can add:
- Custom User-agent directives for specific bots
- Disallow rules for directories or URL patterns
- Allow rules to override broader Disallow directives
- Crawl-delay directives (supported by Bing and Yandex, ignored by Google)
- Multiple Sitemap references
- Comments (lines starting with #)
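As a rough illustration of several of these directives working together, the sketch below parses a custom file with Python's `urllib.robotparser`. The `/internal/` path, the delay value, and the second sitemap URL (`blog-sitemap.xml`) are assumptions made up for the example, not Webflow defaults.

```python
from urllib.robotparser import RobotFileParser

CUSTOM_ROBOTS = """\
# Slow Bing down; every other crawler uses the wildcard group.
User-agent: Bingbot
Crawl-delay: 10
Disallow: /internal/

User-agent: *
Disallow:

Sitemap: https://yourdomain.com/sitemap.xml
Sitemap: https://yourdomain.com/blog-sitemap.xml
"""

parser = RobotFileParser()
parser.parse(CUSTOM_ROBOTS.splitlines())

bing_delay = parser.crawl_delay("Bingbot")      # delay declared in the Bingbot group
other_delay = parser.crawl_delay("Googlebot")   # wildcard group declares none
bing_internal = parser.can_fetch("Bingbot", "/internal/notes")
sitemaps = parser.site_maps()                   # Python 3.8+
print(bing_delay, other_delay, bing_internal, sitemaps)
```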
Common Webflow robots.txt Configurations
Default (allow everything)
User-agent: *
Disallow:
Sitemap: https://yourdomain.com/sitemap.xml
Use this if you have no pages to block. Every published page is crawlable.
Block specific directories
User-agent: *
Disallow: /admin/
Disallow: /internal/
Disallow: /staging-pages/
Sitemap: https://yourdomain.com/sitemap.xml
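Use this to stop crawlers from fetching whole sections of the site. On Webflow, you might block utility pages, thank-you pages, or draft sections that are technically published but not meant for search results. Disallow alone does not guarantee that a URL stays out of the index; for that, use a noindex tag (see Noindex vs. Disallow on Webflow).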
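Disallow rules match by path prefix, which makes the trailing slash significant. The sketch below, using Python's `urllib.robotparser`, shows which paths the rules above actually cover. The probe paths are hypothetical.

```python
from urllib.robotparser import RobotFileParser

RULES = """\
User-agent: *
Disallow: /admin/
Disallow: /internal/
Disallow: /staging-pages/
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())

probe_paths = ["/admin/", "/admin/users", "/administrator", "/internal", "/blog/internal/"]

# True means the path is still crawlable under these rules.
results = {path: parser.can_fetch("*", path) for path in probe_paths}
for path, ok in results.items():
    print(f"{path:18} {'crawlable' if ok else 'blocked'}")
```

Only paths that begin with the exact rule text are blocked. In this example `/administrator` and the bare `/internal` (no trailing slash) remain crawlable, so drop the trailing slash from a rule if you also want to cover those.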
Block AI crawlers
User-agent: GPTBot
Disallow: /
User-agent: ChatGPT-User
Disallow: /
User-agent: ClaudeBot
Disallow: /
User-agent: anthropic-ai
Disallow: /
User-agent: Google-Extended
Disallow: /
User-agent: CCBot
Disallow: /
User-agent: PerplexityBot
Disallow: /
User-agent: *
Disallow:
Sitemap: https://yourdomain.com/sitemap.xml
This blocks major AI crawlers while keeping search engines allowed. A crawler follows the group that names it specifically and falls back to the wildcard group only when no named group matches, so the order of the groups in the file does not matter. For the full list of AI crawlers, see our search engine bots list.
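Crawlers that follow the Robots Exclusion Protocol (RFC 9309) choose the group that names them, wherever that group sits in the file. The sketch below checks this with Python's `urllib.robotparser`, which is one implementation among many, so real crawlers may differ in edge cases. The helper name `allowed` is hypothetical.

```python
from urllib.robotparser import RobotFileParser

WILDCARD_FIRST = """\
User-agent: *
Disallow:

User-agent: GPTBot
Disallow: /
"""

WILDCARD_LAST = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Disallow:
"""

def allowed(robots_txt, agent, path="/"):
    """Parse the given robots.txt text and ask whether agent may fetch path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)

# Same answers for both orderings: GPTBot is blocked, Googlebot is not.
for text in (WILDCARD_FIRST, WILDCARD_LAST):
    print(allowed(text, "GPTBot"), allowed(text, "Googlebot"))
```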
Block SEO tool crawlers
User-agent: AhrefsBot
Disallow: /
User-agent: SemrushBot
Disallow: /
User-agent: MJ12bot
Disallow: /
User-agent: *
Disallow:
Sitemap: https://yourdomain.com/sitemap.xml
Some site owners block SEO tool crawlers to prevent competitor analysis. This does not affect search rankings.
Webflow's Automatic Sitemap
Webflow generates a sitemap at /sitemap.xml that is referenced in the robots.txt. The sitemap includes all published pages and CMS collection items.
What the sitemap includes
- All static pages (Home, About, Contact, etc.)
- All published CMS collection items (Blog posts, Products, etc.)
- The lastmod date for each URL (based on when the page was last published)
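To see what a sitemap contains, you can list its URLs and lastmod values with Python's standard XML parser. This is a sketch under assumptions: the sample document follows the standard sitemaps.org schema, and its URLs and dates are invented for illustration rather than copied from a real Webflow sitemap.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

SAMPLE_SITEMAP = """<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://yourdomain.com/</loc><lastmod>2024-01-15</lastmod></url>
  <url><loc>https://yourdomain.com/blog/first-post</loc><lastmod>2024-02-01</lastmod></url>
</urlset>
"""

def list_sitemap_urls(xml_text):
    """Return (loc, lastmod) pairs; lastmod is None when the element is absent."""
    root = ET.fromstring(xml_text.encode("utf-8"))
    entries = []
    for url in root.findall("sm:url", NS):
        loc = url.findtext("sm:loc", namespaces=NS)
        lastmod = url.findtext("sm:lastmod", default=None, namespaces=NS)
        entries.append((loc, lastmod))
    return entries

for loc, lastmod in list_sitemap_urls(SAMPLE_SITEMAP):
    print(loc, lastmod)
```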
What the sitemap does not include
- Draft pages
- Pages with the "Exclude from sitemap" option enabled
- Utility pages (404, password protection)
- URLs from pagination (collection list pagination pages)
Excluding specific pages from the sitemap
In the Webflow Designer, select a page, open Page Settings, and check "Exclude this page from sitemap." This removes the page from sitemap.xml but does not add a noindex tag. If you want to prevent indexing entirely, also add a noindex meta tag in the page's custom code settings.
Webflow-Specific Crawling Considerations
CMS collection pages
Webflow's CMS generates pages from templates. If you have a Blog collection with 100 posts, Webflow creates 100 pages from the blog post template. All of these are included in the sitemap and crawlable by default.
If some collection items should not be indexed (like internal notes or draft-quality content that you have published for review), exclude them from the sitemap and add noindex tags.
Webflow's CDN
Webflow serves all sites through its CDN (powered by Fastly and Amazon CloudFront). This means:
- Robots.txt is served from the CDN, not a traditional origin server
- The file is cached and served quickly to crawlers
- Changes propagate after publishing (CDN cache invalidation)
Form submission pages
Webflow form submissions generate success states on the same page (the form is replaced with a success message). These are not separate URLs, so there is nothing to block in robots.txt. However, if you redirect to a separate thank-you page after form submission, consider blocking that page:
Disallow: /thank-you
Thank-you pages typically have thin content and no SEO value.
Password-protected pages
Pages behind Webflow's password protection are served with a login gate. Crawlers cannot get past the gate, so these pages are effectively blocked regardless of robots.txt. However, adding them to your robots.txt Disallow list is still a good practice for clarity.
Utility pages
Webflow has built-in utility pages for 404 errors and password protection. These are not included in the sitemap and are handled correctly by default.
Client billing and staging
If you are on a Webflow workspace plan and your site has both a staging URL (yoursitename.webflow.io) and a production URL, make sure you are editing robots.txt for the correct domain. The staging URL should block crawlers (and does so by default).
Test after every change
After editing your robots.txt in Webflow, publish the site and then verify the changes by visiting yourdomain.com/robots.txt in your browser. Use a robots.txt testing tool to confirm your rules work as intended before relying on them. See our robots.txt testing guide.
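If you have no testing tool to hand, a small table-driven check can serve as a rough substitute. The sketch below uses Python's `urllib.robotparser`. The function name `find_rule_failures`, the sample rules, and the `/pricing` path are hypothetical, and Python's parser may not match every crawler's behavior exactly.

```python
from urllib.robotparser import RobotFileParser

def find_rule_failures(robots_txt, expectations):
    """expectations: iterable of (user_agent, path, should_be_allowed).

    Returns a list of (user_agent, path, expected, actual) for each mismatch.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    failures = []
    for agent, path, expected in expectations:
        actual = parser.can_fetch(agent, path)
        if actual != expected:
            failures.append((agent, path, expected, actual))
    return failures

# Paste the text of your published robots.txt here.
PUBLISHED = """\
User-agent: *
Disallow: /thank-you
"""

EXPECTATIONS = [
    ("Googlebot", "/", True),
    ("Googlebot", "/thank-you", False),
    ("Googlebot", "/pricing", True),  # hypothetical page that must stay crawlable
]

failures = find_rule_failures(PUBLISHED, EXPECTATIONS)
print("all rules behave as expected" if not failures else failures)
```

An empty result means every expectation held; any tuple in the list names a path whose actual behavior differs from what you intended.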
Noindex vs. Disallow on Webflow
Webflow supports both approaches for keeping pages out of search results.
robots.txt Disallow prevents crawling. The page is not fetched by the crawler at all. But if other sites link to the page, Google may still show the URL in search results (without a snippet).
noindex meta tag allows crawling but prevents indexing. Google fetches the page, sees the noindex tag, and does not include it in search results.
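For most purposes on Webflow, noindex is the better choice for hiding specific pages. Use robots.txt Disallow for directories that should never be crawled (admin areas, API endpoints, staging content). Avoid combining the two on the same URL: a crawler blocked by Disallow never fetches the page, so it never sees the noindex tag.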
To add a noindex tag to a Webflow page:
- Select the page in the Pages panel
- Open Page Settings
- In the "Custom Code" section, add to the <head>:
<meta name="robots" content="noindex, nofollow">
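After publishing, you can confirm the tag is present in the page's HTML. The sketch below uses Python's standard `html.parser`. The class and function names are hypothetical, and it inspects only the robots meta tag, not any HTTP headers.

```python
from html.parser import HTMLParser

class RobotsMetaFinder(HTMLParser):
    """Collects the directives found in <meta name="robots"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            self.directives.update(
                part.strip().lower() for part in content.split(",") if part.strip()
            )

def has_noindex(html):
    """Return True if the HTML contains a robots meta tag with noindex."""
    finder = RobotsMetaFinder()
    finder.feed(html)
    return "noindex" in finder.directives

page = '<html><head><meta name="robots" content="noindex, nofollow"></head></html>'
print(has_noindex(page))
```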
For the complete comparison, see robots.txt vs. meta robots.
Common Mistakes
Leaving Disallow: / after connecting a custom domain
If you manually edited your robots.txt to block crawlers during development and forgot to remove the block after launch, search engines cannot crawl any page on your site. Always check robots.txt after going live.
Blocking CSS and JavaScript
Webflow relies on JavaScript and CSS for layout and interactive elements. If you accidentally block paths that serve these resources, Googlebot cannot render your pages correctly. On Webflow, this is unlikely since the platform manages resource paths, but be careful with broad wildcard rules.
Not publishing after changes
Robots.txt edits in Webflow only take effect after you publish the site. If you edit the file but do not publish, the old version remains live.
Forgetting the Sitemap line
If you replace the entire robots.txt content with custom rules, make sure to include the Sitemap directive:
Sitemap: https://yourdomain.com/sitemap.xml
Without it, crawlers lose the direct reference to your sitemap (though they can still find it via Search Console submissions).
Summary
Webflow provides editable robots.txt through its project settings, giving you full control over crawler access. The defaults are sensible (allow everything, reference the sitemap). Customize as needed to block AI crawlers, utility pages, or specific directories. Always publish after editing, verify changes by checking the live robots.txt file, and use noindex meta tags when you want to prevent indexing of specific pages rather than preventing crawling entirely.
Test your Webflow robots.txt
Verify that your robots.txt rules are working correctly and search engines can reach your content.