robots.txt Testing for E-Commerce Sites
Ensure search engines can crawl your product pages. Test your e-commerce robots.txt for issues that block indexing and cost you sales.
You added 200 new products last month. Your team wrote descriptions, uploaded images, and optimized the titles. Three weeks later, not a single one appears in Google. The product pages exist, the sitemap is updated, but search engines are not crawling them. The culprit is buried in your robots.txt: a rule blocking faceted navigation URLs that is also catching every product page with a filter parameter in the URL.
For e-commerce sites, robots.txt mistakes do not just hurt rankings. They cost sales. Every product page that search engines cannot find is a product customers cannot discover through organic search.
E-commerce-specific robots.txt challenges
Online stores have URL structures that are uniquely difficult to manage in robots.txt. The same features that make your site easy to browse for customers create headaches for crawl management.
Faceted navigation
Color filters, size selectors, price ranges, brand filters -- each combination generates a unique URL. A store with 50 products and 10 filter options can produce thousands of filtered URLs. Without robots.txt rules to manage this, crawlers waste their entire budget on filter combinations instead of your actual product pages.
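To see how fast this grows, here is a quick sketch under a simplifying assumption: a category with 10 independent on/off filter options (real facets often have many values each, so the real number is higher). Every non-empty combination of filters is a distinct URL.

```python
from itertools import combinations

# Hypothetical category with 10 independent on/off filter options.
# Each non-empty combination of filters produces a distinct URL.
filter_options = ["color", "size", "brand", "price", "material",
                  "rating", "sale", "new", "width", "style"]

distinct_urls = sum(
    len(list(combinations(filter_options, k)))
    for k in range(1, len(filter_options) + 1)
)
print(distinct_urls)  # 2^10 - 1 = 1023 filtered URLs per category
```

Over 1,000 crawlable URLs per category, from filters alone, before counting value combinations or sort orders.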
Pagination
Category pages with hundreds of products create long pagination chains. Pages 2 through 50 of a category listing are important for crawl discovery but can overwhelm your crawl budget if not managed properly.
Internal search results
Your site search generates URLs like /search?q=blue+shoes. These pages are useful for users but low-value for search engines. Left unblocked, crawlers can wander through an effectively endless space of query URLs.
Session IDs and tracking parameters
URLs with session tokens, tracking parameters, or cart identifiers create duplicate content that dilutes your crawl budget. Parameters like ?sid=, ?utm_, and ?ref= multiply the number of URLs crawlers see without adding any unique content.
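A hedged sketch of rules that block these parameter URLs. The specific parameter names are illustrative; check which parameters your platform actually emits before blocking them.

```
User-agent: *
Disallow: /*?sid=
Disallow: /*&sid=
Disallow: /*?utm_
Disallow: /*&utm_
```

One caveat: for tracking parameters like utm_, many sites prefer canonical tags over robots.txt blocking, since a blocked URL cannot pass its signals to the canonical page.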
Sorted and filtered URLs
Sort-by-price, sort-by-newest, sort-by-rating -- each sort order creates a different URL for the same product listing. Without blocking these variations, you have multiple URLs competing against each other for the same search queries.
The balancing act: crawl budget vs. product discovery
The core challenge for e-commerce robots.txt is balance. Block too little and crawlers waste their budget on low-value filter and sort URLs. Block too much and your product pages disappear from search results.
| Too permissive | Too restrictive |
|---|---|
| Crawlers index thousands of filter combinations | Product category pages are blocked from indexing |
| Crawl budget wasted on sort variations | New product pages never get discovered |
| Duplicate content from parameter URLs | Pagination blocked, deep products unreachable |
| Internal search results indexed | Filtered views that drive traffic are also blocked |
Getting this right requires testing. You need to know exactly which URLs your rules allow and which they block.
Test your e-commerce robots.txt rules
Paste your robots.txt and test product URLs, filter pages, and category paths to see exactly what crawlers can access.
Platform defaults: what Shopify and WooCommerce give you
If you run an e-commerce platform, your robots.txt may have been generated for you. Understanding what you start with matters.
Shopify generates a robots.txt automatically and until recently did not allow merchants to edit it (editing via a robots.txt.liquid template has been possible since 2021). Shopify's default blocks internal search, checkout pages, cart pages, and admin areas. It is a reasonable starting point, but it does not account for your specific product structure, custom collections, or faceted navigation apps.
WooCommerce on WordPress does not generate a robots.txt by default. WordPress creates a virtual robots.txt with minimal rules. Many WooCommerce stores rely on SEO plugins like Yoast or Rank Math to manage their robots.txt, which means the rules depend entirely on plugin configuration.
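For reference, the virtual robots.txt WordPress serves by default is roughly the following (recent versions also append a Sitemap line when sitemaps are enabled):

```
User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php
```

Nothing here addresses cart, checkout, or filtered shop URLs; for a WooCommerce store, those rules have to come from your SEO plugin or from you.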
Magento/Adobe Commerce generates a default robots.txt during installation that blocks common admin and system paths. However, it does not address faceted navigation or layered navigation URLs, which are often the biggest crawl budget issue for Magento stores.
Custom platforms have no defaults at all. If your store runs on a custom framework, you are starting from scratch.
Do not assume platform defaults are correct for your store
Platform defaults are generic. They do not know your URL structure, your faceted navigation setup, or your content strategy. Always validate the robots.txt your platform generates against your actual site URLs.
Using Robots.txt Tester for your store
Fetch your current robots.txt
Start by checking what you currently have. Paste your live robots.txt into the tester or fetch it from your store URL. See every directive and identify any syntax issues.
Test your product page URLs
Enter your actual product URLs and confirm they are allowed. Test canonical product URLs, variant URLs, and any URLs with parameters your store generates.
Test your filter and sort URLs
Enter the URLs your faceted navigation creates. Confirm that low-value filter combinations are blocked while important category paths remain accessible.
Verify after platform updates
Every time your e-commerce platform updates, check the robots.txt. Shopify, WooCommerce plugins, and Magento updates can all modify your file. A quick test after each update confirms nothing changed unexpectedly.
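One lightweight way to catch unexpected changes is to store a fingerprint of the file and compare it after every update. A minimal sketch; the store URL is a placeholder for your own domain:

```python
import hashlib
import urllib.request

def robots_fingerprint(body: bytes) -> str:
    # Short, stable hash of the robots.txt body
    return hashlib.sha256(body).hexdigest()[:12]

def fetch_robots_fingerprint(store_url: str) -> str:
    # store_url is a placeholder, e.g. "https://yourstore.com"
    with urllib.request.urlopen(store_url + "/robots.txt") as resp:
        return robots_fingerprint(resp.read())

# Record the fingerprint before an update; if it differs afterwards,
# diff the file and re-test your key URLs.
```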
Test seasonal and campaign pages
Before a sale, product launch, or seasonal campaign, verify that the landing pages and product collections you are promoting are crawlable. Blocked campaign pages mean wasted marketing spend.
A practical example
Here is a common e-commerce robots.txt pattern and the testing you should do around it:
```
User-agent: *
Disallow: /cart/
Disallow: /checkout/
Disallow: /account/
Disallow: /search?
Disallow: /*?sort=
Disallow: /*?filter=
Disallow: /*&sid=

Sitemap: https://yourstore.com/sitemap.xml
```
Note that there are no Allow: /collections/ or Allow: /products/ lines. robots.txt is allow-by-default, so they would be redundant at best -- and actively harmful here, because Google resolves conflicting rules by taking the longest matching one, meaning Allow: /collections/ (13 characters) would beat Disallow: /*?sort= (8 characters) and let sorted collection URLs be crawled.
After writing these rules, test the following URLs in Robots.txt Tester:
- /products/blue-running-shoes -- should be allowed
- /collections/mens-shoes -- should be allowed
- /collections/mens-shoes?sort=price-asc -- should be blocked
- /collections/mens-shoes?filter=color-blue -- should be blocked
- /search?q=running+shoes -- should be blocked
- /cart/ -- should be blocked
If any of these return unexpected results, you know your rules need adjustment before they go live.
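If you want to sanity-check URLs in bulk before pasting rules into a tester, you can approximate the matching locally. The sketch below implements simplified longest-match semantics (the rule with the longest pattern wins, Allow winning ties) for the Disallow rules in the example above. It is an approximation, not a substitute for testing against a real crawler; note that Python's built-in urllib.robotparser does not understand * wildcards, which is why this rolls its own matching.

```python
import re

# The Disallow rules from the example above
RULES = [
    ("disallow", "/cart/"),
    ("disallow", "/checkout/"),
    ("disallow", "/account/"),
    ("disallow", "/search?"),
    ("disallow", "/*?sort="),
    ("disallow", "/*?filter="),
    ("disallow", "/*&sid="),
]

def _to_regex(pattern: str) -> re.Pattern:
    # '*' matches any run of characters; '$' anchors the end of the URL
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")
        elif ch == "$":
            parts.append("$")
        else:
            parts.append(re.escape(ch))
    return re.compile("^" + "".join(parts))

def is_allowed(path: str) -> bool:
    # Longest matching pattern wins; Allow wins ties; default is allowed
    best_rule, best_len = "allow", -1
    for rule, pattern in RULES:
        if _to_regex(pattern).match(path):
            if len(pattern) > best_len or (len(pattern) == best_len and rule == "allow"):
                best_rule, best_len = rule, len(pattern)
    return best_rule == "allow"

print(is_allowed("/products/blue-running-shoes"))           # True
print(is_allowed("/collections/mens-shoes?sort=price-asc")) # False
print(is_allowed("/cart/"))                                 # False
```

Each URL from the checklist above should produce the expected allowed/blocked result; if one does not, adjust the rules before deploying.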
Validate your store's robots.txt now
Test product URLs, filter pages, and category paths against your rules. See exactly what Google can and cannot crawl.
Pricing
Robots.txt Tester is free. Test your store's robots.txt as often as you need -- after platform updates, before product launches, during seasonal campaigns.
Part of Boring Tools -- boring tools for boring jobs.