robots.txt for Webflow Sites

How to configure robots.txt on Webflow. Covers the auto-generated file, custom editing, common configurations, and Webflow-specific crawling considerations.

Webflow gives you direct control over your robots.txt file, which puts it ahead of many hosted platforms. You can edit the file through the Webflow project settings, adding custom rules for specific bots, blocking directories, or fine-tuning how crawlers interact with your site.

This guide covers how to access and edit your Webflow robots.txt, common configurations, and Webflow-specific issues to watch for. For a general introduction to robots.txt, see our robots.txt guide.

Webflow's Default robots.txt

Webflow generates a default robots.txt for every published site. You can view it at https://yourdomain.com/robots.txt.

The default file is simple:

User-agent: *
Disallow:

Sitemap: https://yourdomain.com/sitemap.xml

This allows all crawlers to access all pages and references your sitemap. It is a permissive default that works for most sites.
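As a quick sanity check, the default rules can be run through Python's standard-library urllib.robotparser. This is a minimal sketch: yourdomain.com is the placeholder used above, and the blog path is a made-up example.

```python
from urllib.robotparser import RobotFileParser

DEFAULT_RULES = """\
User-agent: *
Disallow:

Sitemap: https://yourdomain.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(DEFAULT_RULES.splitlines())

# An empty Disallow value blocks nothing, so every path is crawlable.
home_allowed = parser.can_fetch("Googlebot", "https://yourdomain.com/")
blog_allowed = parser.can_fetch("Googlebot", "https://yourdomain.com/blog/some-post")
print(home_allowed, blog_allowed)

# site_maps() (Python 3.8+) returns the Sitemap URLs found in the file.
print(parser.site_maps())
```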

Staging subdomain

Before connecting a custom domain, your Webflow site lives at yoursitename.webflow.io. The staging subdomain has a different robots.txt:

User-agent: *
Disallow: /

This blocks all crawlers from the staging URL, which prevents the staging version from being indexed. Once you connect a custom domain and publish, Webflow serves the permissive robots.txt on that domain.

If your site is live on a custom domain but the .webflow.io version is still accessible, the staging URL should still have the blocking robots.txt. Verify this by checking yoursitename.webflow.io/robots.txt.
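The staging check can also be scripted. The sketch below works on robots.txt text you have already downloaded (for example, the body of yoursitename.webflow.io/robots.txt); the helper name blocks_all_crawlers is hypothetical, and the two sample strings mirror the files shown above.

```python
from urllib.robotparser import RobotFileParser


def blocks_all_crawlers(robots_txt: str) -> bool:
    """Return True if a generic crawler may not fetch the site root."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch("Googlebot", "/")


STAGING_RULES = "User-agent: *\nDisallow: /\n"
PRODUCTION_RULES = "User-agent: *\nDisallow:\n"

print(blocks_all_crawlers(STAGING_RULES))     # staging should block
print(blocks_all_crawlers(PRODUCTION_RULES))  # production should not
```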

How to Edit robots.txt on Webflow

Webflow provides a text editor for robots.txt in your project settings.

Steps

  1. Open your Webflow project in the Designer
  2. Click the Webflow logo (top left) to open Project Settings
  3. Navigate to the "SEO" tab
  4. Scroll down to the "robots.txt" section
  5. Edit the content in the text field
  6. Save and publish your site

Changes take effect after you publish. The robots.txt is served from Webflow's CDN, so propagation is usually immediate.

What you can add

You have full control over the robots.txt content. You can add:

  • Custom User-agent directives for specific bots
  • Disallow rules for directories or URL patterns
  • Allow rules to override broader Disallow directives
  • Crawl-delay directives (supported by Bing and Yandex, ignored by Google)
  • Multiple Sitemap references
  • Comments (lines starting with #)
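The directives listed above can be combined in one file and inspected with Python's urllib.robotparser. This is an illustrative sketch: the /internal/press-kit/ path and sitemap-extra.xml are hypothetical. Python's parser applies the first matching rule in a group, whereas Google applies the most specific one, so the Allow line is listed before the broader Disallow to keep both interpretations in agreement.

```python
from urllib.robotparser import RobotFileParser

CUSTOM_RULES = """\
# Comments start with a hash and are ignored by crawlers.
User-agent: *
Allow: /internal/press-kit/
Disallow: /internal/

# A bot with its own group ignores the wildcard group entirely,
# so the path rules are repeated here alongside Crawl-delay.
User-agent: bingbot
Crawl-delay: 10
Allow: /internal/press-kit/
Disallow: /internal/

Sitemap: https://yourdomain.com/sitemap.xml
Sitemap: https://yourdomain.com/sitemap-extra.xml
"""

parser = RobotFileParser()
parser.parse(CUSTOM_RULES.splitlines())

print(parser.can_fetch("Googlebot", "/internal/notes"))               # blocked by Disallow
print(parser.can_fetch("Googlebot", "/internal/press-kit/logo.png"))  # re-opened by Allow
print(parser.crawl_delay("bingbot"))                                  # delay in seconds
print(parser.site_maps())                                             # both Sitemap URLs
```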

Common Webflow robots.txt Configurations

Default (allow everything)

User-agent: *
Disallow:

Sitemap: https://yourdomain.com/sitemap.xml

Use this if you have no pages to block. Every published page is crawlable.

Block specific directories

User-agent: *
Disallow: /admin/
Disallow: /internal/
Disallow: /staging-pages/

Sitemap: https://yourdomain.com/sitemap.xml

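Use this to keep crawlers out of sections that should not appear in search results. On Webflow, you might block utility pages, thank-you pages, or draft sections that are technically published but not meant for search. Keep in mind that Disallow stops crawling, not indexing; the Noindex vs. Disallow on Webflow section explains the difference.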
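One detail worth checking is the trailing slash: Disallow: /admin/ covers everything under the directory but not the bare /admin URL. The sketch below tests a few made-up paths against the rules above using urllib.robotparser.

```python
from urllib.robotparser import RobotFileParser

RULES = """\
User-agent: *
Disallow: /admin/
Disallow: /internal/
Disallow: /staging-pages/
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())

# Hypothetical paths chosen to show prefix matching on each rule.
SAMPLE_PATHS = ["/admin/", "/admin/users", "/admin", "/internal/roadmap", "/blog/launch"]
crawlable = {path: parser.can_fetch("Googlebot", path) for path in SAMPLE_PATHS}

for path, allowed in crawlable.items():
    print(f"{path:20} {'allowed' if allowed else 'blocked'}")
```

If the bare /admin URL should be blocked as well, drop the trailing slash from the rule or add a second Disallow line for it.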

Block AI crawlers

User-agent: GPTBot
Disallow: /

User-agent: ChatGPT-User
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: anthropic-ai
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: PerplexityBot
Disallow: /

User-agent: *
Disallow:

Sitemap: https://yourdomain.com/sitemap.xml

This blocks major AI crawlers while keeping search engines allowed. The position of a group in the file does not matter: each crawler follows the most specific User-agent group that matches its name and falls back to the wildcard group only when no named group matches. For the full list of AI crawlers, see our search engine bots list.
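Group selection can be checked with urllib.robotparser. In this sketch the wildcard group is deliberately placed first to show that the named groups still apply to their bots; only two of the bots above are included to keep it short, and the /pricing path is a made-up example.

```python
from urllib.robotparser import RobotFileParser

RULES = """\
User-agent: *
Disallow:

User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())

# Named bots follow their own group; everyone else uses the wildcard group.
gptbot_allowed = parser.can_fetch("GPTBot", "/")
claudebot_allowed = parser.can_fetch("ClaudeBot", "/pricing")
googlebot_allowed = parser.can_fetch("Googlebot", "/pricing")
print(gptbot_allowed, claudebot_allowed, googlebot_allowed)
```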

Block SEO tool crawlers

User-agent: AhrefsBot
Disallow: /

User-agent: SemrushBot
Disallow: /

User-agent: MJ12bot
Disallow: /

User-agent: *
Disallow:

Sitemap: https://yourdomain.com/sitemap.xml

Some site owners block SEO tool crawlers to prevent competitor analysis. This does not affect search rankings.
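When the list of blocked bots grows, it can be easier to generate the file than to maintain it by hand. The following is a minimal sketch; build_robots_txt is a hypothetical helper, and its output would still need to be pasted into the Webflow robots.txt field and published.

```python
def build_robots_txt(blocked_bots, sitemap_url):
    """Build a robots.txt that blocks the named bots and allows all others."""
    lines = []
    for bot in blocked_bots:
        # One group per bot, each blocking the whole site.
        lines += [f"User-agent: {bot}", "Disallow: /", ""]
    # Wildcard group with an empty Disallow keeps other crawlers allowed.
    lines += ["User-agent: *", "Disallow:", "", f"Sitemap: {sitemap_url}"]
    return "\n".join(lines) + "\n"


robots_txt = build_robots_txt(
    ["AhrefsBot", "SemrushBot", "MJ12bot"],
    "https://yourdomain.com/sitemap.xml",
)
print(robots_txt)
```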

Webflow's Automatic Sitemap

Webflow generates a sitemap at /sitemap.xml that is referenced in the robots.txt. The sitemap includes all published pages and CMS collection items.

What the sitemap includes

  • All static pages (Home, About, Contact, etc.)
  • All published CMS collection items (Blog posts, Products, etc.)
  • The lastmod date for each URL (based on when the page was last published)

What the sitemap does not include

  • Draft pages
  • Pages with the "Exclude from sitemap" option enabled
  • Utility pages (404, password protection)
  • URLs from pagination (collection list pagination pages)

Excluding specific pages from the sitemap

In the Webflow Designer, select a page, open Page Settings, and check "Exclude this page from sitemap." This removes the page from sitemap.xml but does not add a noindex tag. If you want to prevent indexing entirely, also add a noindex meta tag in the page's custom code settings.
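To confirm that an excluded page has actually dropped out of the sitemap, the XML can be parsed with Python's xml.etree.ElementTree. This sketch runs on a small inline sample in the standard sitemaps.org format; the URLs and dates are invented for illustration, and in practice the text would come from your own /sitemap.xml.

```python
import xml.etree.ElementTree as ET

# Namespace used by the sitemaps.org protocol.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

SAMPLE_SITEMAP = """\
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://yourdomain.com/</loc>
    <lastmod>2024-01-15</lastmod>
  </url>
  <url>
    <loc>https://yourdomain.com/blog/first-post</loc>
    <lastmod>2024-02-01</lastmod>
  </url>
</urlset>
"""


def sitemap_entries(xml_text):
    """Map each <loc> URL to its <lastmod> value (None if absent)."""
    root = ET.fromstring(xml_text)
    return {
        url.findtext("sm:loc", default="", namespaces=NS).strip():
            url.findtext("sm:lastmod", namespaces=NS)
        for url in root.findall("sm:url", NS)
    }


entries = sitemap_entries(SAMPLE_SITEMAP)
excluded_page = "https://yourdomain.com/thank-you"
print(excluded_page in entries)  # False means the page is not listed
```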

Webflow-Specific Crawling Considerations

CMS collection pages

Webflow's CMS generates pages from templates. If you have a Blog collection with 100 posts, Webflow creates 100 pages from the blog post template. All of these are included in the sitemap and crawlable by default.

If some collection items should not be indexed (like internal notes or draft-quality content that you have published for review), exclude them from the sitemap and add noindex tags.

Webflow's CDN

Webflow serves all sites through its CDN (powered by Fastly and Amazon CloudFront). This means:

  • Robots.txt is served from the CDN, not a traditional origin server
  • The file is cached and served quickly to crawlers
  • Changes propagate after publishing (CDN cache invalidation)

Form submission pages

Webflow form submissions generate success states on the same page (the form is replaced with a success message). These are not separate URLs, so there is nothing to block in robots.txt. However, if you redirect to a separate thank-you page after form submission, consider blocking that page:

Disallow: /thank-you

Thank-you pages typically have thin content and no SEO value.
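Disallow values are prefix matches, so /thank-you also covers a URL such as /thank-you-offers. Crawlers that follow RFC 9309 additionally treat * as "any characters" and a trailing $ as "end of URL", which allows an exact match. Python's urllib.robotparser does not implement those two characters, so the sketch below uses a small hypothetical matcher, pattern_matches, to illustrate the behavior. It is a simplified illustration, not a full implementation of the standard.

```python
import re


def pattern_matches(pattern: str, path: str) -> bool:
    """Match a robots.txt path pattern: prefix match, '*' wildcard, trailing '$' anchor."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape literal pieces and join them with '.*' where '*' appeared.
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    if anchored:
        regex += r"\Z"
    # re.match anchors at the start of the path, giving prefix semantics.
    return re.match(regex, path) is not None


for pattern in ("/thank-you", "/thank-you$"):
    for path in ("/thank-you", "/thank-you-offers"):
        print(f"{pattern:14} {path:20} {pattern_matches(pattern, path)}")
```

With the $ anchor, only the exact /thank-you URL is matched and /thank-you-offers stays crawlable.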

Password-protected pages

Pages behind Webflow's password protection are served with a login gate. Crawlers cannot get past the gate, so these pages are effectively blocked regardless of robots.txt. However, adding them to your robots.txt Disallow list is still a good practice for clarity.

Utility pages

Webflow has built-in utility pages for 404 errors and password protection. These are not included in the sitemap and are handled correctly by default.

Client billing and staging

If you are on a Webflow workspace plan and your site has both a staging URL (yoursitename.webflow.io) and a production URL, make sure you are editing robots.txt for the correct domain. The staging URL should block crawlers (and does so by default).

Test after every change

After editing your robots.txt in Webflow, publish the site and then verify the changes by visiting yourdomain.com/robots.txt in your browser. Use a robots.txt testing tool to confirm your rules work as intended before relying on them. See our robots.txt testing guide.
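A scripted check can complement the manual one. The sketch below compares a robots.txt against a list of expectations; fetch_robots_txt and failed_expectations are hypothetical helpers, the sample rules and paths are made up, and the demo runs on inline text so that it works without network access.

```python
from urllib.request import urlopen
from urllib.robotparser import RobotFileParser


def fetch_robots_txt(base_url: str) -> str:
    """Download robots.txt from a live site (requires network access)."""
    with urlopen(f"{base_url.rstrip('/')}/robots.txt", timeout=10) as response:
        return response.read().decode("utf-8")


def failed_expectations(robots_txt, expectations):
    """Return the (agent, path, expected) tuples the rules do not satisfy."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [
        (agent, path, expected)
        for agent, path, expected in expectations
        if parser.can_fetch(agent, path) != expected
    ]


SAMPLE_RULES = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Disallow: /thank-you
"""

EXPECTATIONS = [
    ("Googlebot", "/", True),            # home page stays crawlable
    ("Googlebot", "/thank-you", False),  # thank-you page is blocked
    ("GPTBot", "/", False),              # AI crawler is blocked everywhere
]

failures = failed_expectations(SAMPLE_RULES, EXPECTATIONS)
print(failures or "all expectations met")
```

To run the same check against the published file, pass fetch_robots_txt("https://yourdomain.com") in place of SAMPLE_RULES.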

Noindex vs. Disallow on Webflow

Webflow supports both approaches for keeping pages out of search results.

robots.txt Disallow prevents crawling. The page is not fetched by the crawler at all. But if other sites link to the page, Google may still show the URL in search results (without a snippet).

noindex meta tag allows crawling but prevents indexing. Google fetches the page, sees the noindex tag, and does not include it in search results.

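For most purposes on Webflow, noindex is the better choice for hiding specific pages. Use robots.txt Disallow for directories that should never be crawled (admin areas, API endpoints, staging content). Avoid combining the two on the same URL: a crawler that is blocked by Disallow never fetches the page, so it never sees the noindex tag.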

To add a noindex tag to a Webflow page:

  1. Select the page in the Pages panel
  2. Open Page Settings
  3. In the "Custom Code" section, add to the <head>:
<meta name="robots" content="noindex, nofollow">

For the complete comparison, see robots.txt vs. meta robots.
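To confirm that a published page carries the tag, its HTML can be scanned for a robots meta element. This is a minimal sketch using Python's html.parser; RobotsMetaFinder and robots_directives are hypothetical names, and the sample HTML is a stand-in for the source of your own page.

```python
from html.parser import HTMLParser


class RobotsMetaFinder(HTMLParser):
    """Collect directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() == "robots":
            content = attributes.get("content") or ""
            # Split "noindex, nofollow" into individual lowercase directives.
            self.directives += [d.strip().lower() for d in content.split(",") if d.strip()]


def robots_directives(html: str):
    finder = RobotsMetaFinder()
    finder.feed(html)
    return finder.directives


SAMPLE_HTML = """\
<html><head>
<title>Thanks</title>
<meta name="robots" content="noindex, nofollow">
</head><body></body></html>
"""

directives = robots_directives(SAMPLE_HTML)
print("noindex" in directives)
```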

Common Mistakes

Leaving Disallow: / after connecting a custom domain

If you manually edited your robots.txt to block crawlers during development and forgot to remove the block after launch, search engines cannot crawl any page on your site. Always check robots.txt after going live.

Blocking CSS and JavaScript

Webflow relies on JavaScript and CSS for layout and interactive elements. If you accidentally block paths that serve these resources, Googlebot cannot render your pages correctly. On Webflow, this is unlikely since the platform manages resource paths, but be careful with broad wildcard rules.

Not publishing after changes

Robots.txt edits in Webflow only take effect after you publish the site. If you edit the file but do not publish, the old version remains live.

Forgetting the Sitemap line

If you replace the entire robots.txt content with custom rules, make sure to include the Sitemap directive:

Sitemap: https://yourdomain.com/sitemap.xml

Without it, crawlers lose the direct reference to your sitemap (though they can still find it via Search Console submissions).

Summary

Webflow provides editable robots.txt through its project settings, giving you full control over crawler access. The defaults are sensible (allow everything, reference the sitemap). Customize as needed to block AI crawlers, utility pages, or specific directories. Always publish after editing, verify changes by checking the live robots.txt file, and use noindex meta tags when you want to prevent indexing of specific pages rather than preventing crawling entirely.
