robots.txt Testing for E-Commerce Sites

Ensure search engines can crawl your product pages. Test your e-commerce robots.txt for issues that block indexing and cost you sales.

You added 200 new products last month. Your team wrote descriptions, uploaded images, and optimized the titles. Three weeks later, not a single one appears in Google. The product pages exist, the sitemap is updated, but search engines are not crawling them. The culprit is buried in your robots.txt: a rule blocking faceted navigation URLs that is also catching every product page with a filter parameter in the URL.

For e-commerce sites, robots.txt mistakes do not just hurt rankings. They cost sales. Every product page that search engines cannot find is a product customers cannot discover through organic search.

E-commerce-specific robots.txt challenges

Online stores have URL structures that are uniquely difficult to manage in robots.txt. The same features that make your site easy to browse for customers create headaches for crawl management.

Faceted navigation

Color filters, size selectors, price ranges, brand filters -- each combination generates a unique URL. A store with 50 products and 10 filter options can produce thousands of filtered URLs. Without robots.txt rules to manage this, crawlers waste their entire budget on filter combinations instead of your actual product pages.
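
To see why, treat each filter as independently on or off: the URL count doubles with every filter you add. A quick illustrative calculation (the numbers are assumptions, not data from any real store):

# Illustrative numbers only: 10 on/off filters, 5 category pages.
filters = 10
categories = 5
variants_per_category = 2 ** filters          # every subset of filters is a distinct URL
print(variants_per_category)                  # 1024
print(variants_per_category * categories)     # 5120 crawlable variants of just 5 listings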

Pagination

Category pages with hundreds of products create long pagination chains. Pages 2 through 50 of a category listing are important for crawl discovery but can overwhelm your crawl budget if not managed properly.
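
One common middle ground is to leave plain pagination crawlable while blocking its sorted variants. A minimal sketch, assuming sort order is passed as a sort= query parameter (match the names to what your platform actually emits):

User-agent: *
# No rule for ?page= -- plain pagination stays crawlable by default
Disallow: /*?sort=
Disallow: /*&sort=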

Internal search results

Your site search generates URLs like /search?q=blue+shoes. These pages are useful for visitors but low-value for search engines, and because every query string is a new URL, unblocked crawlers can wander an effectively unlimited space of search results.
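
The standard fix is to disallow the search path outright. A minimal sketch, assuming search lives under /search as in the example later in this article (adjust the path to your platform):

User-agent: *
# Blocks /search and every query-string variation under it
Disallow: /search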

Session IDs and tracking parameters

URLs with session tokens, tracking parameters, or cart identifiers create duplicate content that dilutes your crawl budget. Parameters like ?sid=, ?utm_, and ?ref= multiply the number of URLs crawlers see without adding any unique content.
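
A parameter appears after ? when it leads the query string and after & otherwise, so each one needs two rules. A sketch using the parameter names above; match them to what your platform and analytics actually emit:

User-agent: *
# Block session and tracking parameters in either position
Disallow: /*?sid=
Disallow: /*&sid=
Disallow: /*?utm_
Disallow: /*&utm_
Disallow: /*?ref=
Disallow: /*&ref=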

Sorted and filtered URLs

Sort-by-price, sort-by-newest, sort-by-rating -- each sort order creates a different URL for the same product listing. Without blocking these variations, you have multiple URLs competing against each other for the same search queries.

The balancing act: crawl budget vs. product discovery

The core challenge for e-commerce robots.txt is balance. Block too little and crawlers waste their budget on low-value filter and sort URLs. Block too much and your product pages disappear from search results.

Too permissive:

  • Crawlers index thousands of filter combinations
  • Crawl budget wasted on sort variations
  • Duplicate content from parameter URLs
  • Internal search results indexed

Too restrictive:

  • Product category pages are blocked from indexing
  • New product pages never get discovered
  • Pagination blocked, deep products unreachable
  • Filtered views that drive traffic are also blocked

Getting this right requires testing. You need to know exactly which URLs your rules allow and which they block.

Test your e-commerce robots.txt rules

Paste your robots.txt and test product URLs, filter pages, and category paths to see exactly what crawlers can access.

Platform defaults: what Shopify and WooCommerce give you

If your store runs on an e-commerce platform, your robots.txt may have been generated for you. Understanding what you start with matters.

Shopify generates a robots.txt automatically; for years merchants could not edit it at all, and customizing it still requires overriding the robots.txt.liquid theme template. The default blocks internal search, checkout pages, cart pages, and admin areas. It is a reasonable starting point, but it does not account for your specific product structure, custom collections, or faceted navigation apps.

WooCommerce itself does not add robots.txt rules; WordPress serves a virtual robots.txt with minimal defaults. Many WooCommerce stores rely on SEO plugins like Yoast or Rank Math to manage the file, which means the rules depend entirely on plugin configuration.

Magento/Adobe Commerce generates a default robots.txt during installation that blocks common admin and system paths. However, it does not address layered navigation URLs (Magento's faceted filtering), which are often the biggest crawl budget issue for Magento stores.

Custom platforms have no defaults at all. If your store runs on a custom framework, you are starting from scratch.

Do not assume platform defaults are correct for your store

Platform defaults are generic. They do not know your URL structure, your faceted navigation setup, or your content strategy. Always validate the robots.txt your platform generates against your actual site URLs.

Using Robots.txt Tester for your store

1. Fetch your current robots.txt

Start by checking what you currently have. Paste your live robots.txt into the tester or fetch it from your store URL. See every directive and identify any syntax issues.

2. Test your product page URLs

Enter your actual product URLs and confirm they are allowed. Test canonical product URLs, variant URLs, and any URLs with parameters your store generates.
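
If you also want this check in a script, one option is the third-party Protego parser (pip install protego), which implements Google-style wildcard matching; the file path and URLs below are illustrative:

from protego import Protego  # third-party parser, also used by Scrapy

# Sketch: verify canonical and variant product URLs are crawlable.
robots = open("robots.txt").read()  # your saved robots.txt
rp = Protego.parse(robots)

for url in [
    "https://yourstore.com/products/blue-running-shoes",
    "https://yourstore.com/products/blue-running-shoes?variant=123",
]:
    print(url, "->", "allowed" if rp.can_fetch(url, "Googlebot") else "blocked")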

3. Test your filter and sort URLs

Enter the URLs your faceted navigation creates. Confirm that low-value filter combinations are blocked while important category paths remain accessible.

4. Verify after platform updates

Every time your e-commerce platform updates, check the robots.txt. Shopify, WooCommerce plugins, and Magento updates can all modify your file. A quick test after each update confirms nothing changed unexpectedly.
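
One way to make this routine is a small script that compares the live file with a known-good copy you keep in version control. A sketch, with yourstore.com as a placeholder domain and robots.known-good.txt as a file you maintain:

import urllib.request

# Sketch: flag any drift between the live robots.txt and a saved copy.
LIVE_URL = "https://yourstore.com/robots.txt"  # placeholder domain
live = urllib.request.urlopen(LIVE_URL).read().decode("utf-8")

with open("robots.known-good.txt") as f:  # a copy you maintain
    known_good = f.read()

if live == known_good:
    print("robots.txt unchanged")
else:
    print("robots.txt changed after the update -- re-test before relying on it")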

5. Test seasonal and campaign pages

Before a sale, product launch, or seasonal campaign, verify that the landing pages and product collections you are promoting are crawlable. Blocked campaign pages mean wasted marketing spend.

A practical example

Here is a common e-commerce robots.txt pattern and the testing you should do around it:

User-agent: *
Disallow: /cart/
Disallow: /checkout/
Disallow: /account/
Disallow: /search?
Disallow: /*?sort=
Disallow: /*&sort=
Disallow: /*?filter=
Disallow: /*&filter=
Disallow: /*?sid=
Disallow: /*&sid=

Sitemap: https://yourstore.com/sitemap.xml

After writing these rules, test the following URLs in Robots.txt Tester:

  • /products/blue-running-shoes -- should be allowed
  • /collections/mens-shoes -- should be allowed
  • /collections/mens-shoes?sort=price-asc -- should be blocked
  • /collections/mens-shoes?filter=color-blue -- should be blocked
  • /search?q=running+shoes -- should be blocked
  • /cart/ -- should be blocked

If any of these return unexpected results, you know your rules need adjustment before they go live.

Note that the file contains no Allow rules: crawling is allowed by default, so product and collection pages need no explicit rule. The default is also safer. Google resolves conflicting rules by the longest matching pattern, with Allow winning ties, so a broad Allow: /collections/ (13 characters) would beat the shorter Disallow: /*?sort= (8 characters) and quietly unblock every sorted collection URL.
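
If you prefer to script this final check, the precedence logic itself is small enough to sketch. The following is a minimal, illustrative matcher for the rules above, not a full robots.txt parser: it ignores user-agent groups and hard-codes the rule list.

import re

# The rules from the example above, as (directive, pattern) pairs.
RULES = [
    ("disallow", "/cart/"),
    ("disallow", "/checkout/"),
    ("disallow", "/account/"),
    ("disallow", "/search?"),
    ("disallow", "/*?sort="),
    ("disallow", "/*&sort="),
    ("disallow", "/*?filter="),
    ("disallow", "/*&filter="),
    ("disallow", "/*?sid="),
    ("disallow", "/*&sid="),
]

def pattern_to_regex(pattern):
    # '*' matches any run of characters; a trailing '$' anchors the URL end.
    regex = re.escape(pattern).replace(r"\*", ".*")
    if regex.endswith(r"\$"):
        regex = regex[:-2] + "$"
    return re.compile(regex)

def is_allowed(path):
    # RFC 9309 precedence: the longest matching pattern wins, and an
    # allow rule beats a disallow rule of equal length. No match: allowed.
    best_len, allowed = -1, True
    for directive, pattern in RULES:
        if pattern_to_regex(pattern).match(path):
            wins = len(pattern) > best_len or (
                len(pattern) == best_len and directive == "allow"
            )
            if wins:
                best_len, allowed = len(pattern), directive == "allow"
    return allowed

for path in [
    "/products/blue-running-shoes",
    "/collections/mens-shoes",
    "/collections/mens-shoes?sort=price-asc",
    "/collections/mens-shoes?filter=color-blue",
    "/search?q=running+shoes",
    "/cart/",
]:
    print(path, "->", "allowed" if is_allowed(path) else "blocked")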

Validate your store's robots.txt now

Test product URLs, filter pages, and category paths against your rules. See exactly what Google can and cannot crawl.

Pricing

Robots.txt Tester is free. Test your store's robots.txt as often as you need -- after platform updates, before product launches, during seasonal campaigns.

Free

$0

  • Up to 3 items
  • Email alerts
  • Basic support

Pro

$9/month

  • Unlimited items
  • Email + Slack alerts
  • Priority support
  • API access

Part of Boring Tools -- boring tools for boring jobs.

Test your robots.txt for free

Validate your robots.txt file instantly. Check directives, find crawling issues, and ensure search engines can access your site.