robots.txt vs X-Robots-Tag: The HTTP Header Approach

There are three ways to tell search engines what to do with your content: robots.txt, the meta robots tag, and the X-Robots-Tag HTTP header. Most people know the first two. The third is less well known but solves problems that the other two cannot.

robots.txt controls crawling. Meta robots controls indexing for HTML pages. X-Robots-Tag controls indexing for anything that has an HTTP response, including PDFs, images, video files, and API responses. If you need to prevent Google from indexing a PDF or an image file, X-Robots-Tag is your only option short of blocking the file with robots.txt entirely (which has its own drawbacks).

This guide covers what X-Robots-Tag is, how it compares to robots.txt and meta robots, when to use each, and how to implement it on common web servers and CDNs. For background on robots.txt itself, see the robots.txt Guide.

What Is X-Robots-Tag?

X-Robots-Tag is an HTTP response header that tells search engines how to handle a specific resource. It works the same way as the <meta name="robots"> HTML tag, but it lives in the HTTP response headers instead of the page's HTML <head>.

Here is what it looks like in an HTTP response:

HTTP/1.1 200 OK
Content-Type: application/pdf
X-Robots-Tag: noindex, nofollow

When a search engine crawler fetches this resource, it reads the X-Robots-Tag header and obeys the directives. In this example, the crawler will not index the PDF and will not follow any links within it.

X-Robots-Tag accepts the same directives as a meta robots tag. The most commonly used ones are:

noindex -- Do not add this resource to the search index.
nofollow -- Do not follow links found in this resource.
noarchive -- Do not show a cached copy of this resource in search results. Google no longer shows cached pages, but other search engines may still honor this directive.
nosnippet -- Do not show a text snippet or video preview for this resource in search results.
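noimageindex -- Do not index images embedded in this HTML page. To keep an image file itself out of search results, send noindex on the image's own response.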
max-snippet:[number] -- Limit the text snippet to this many characters.
unavailable_after:[date] -- Stop showing this resource in search results after the specified date.
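To make the syntax concrete, here is a minimal Python sketch of splitting a header value into directives. It assumes a value with no user-agent prefix and no commas inside an unavailable_after date; the function name parse_directives is hypothetical and not part of any crawler's API.

```python
from __future__ import annotations


def parse_directives(value: str) -> dict[str, str | None]:
    """Split an X-Robots-Tag value such as "noindex, max-snippet:50" into directives.

    Sketch assumptions: no user-agent prefix, and no commas inside
    unavailable_after dates.
    """
    directives: dict[str, str | None] = {}
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        # Only the first colon separates name from value, so the time in
        # "unavailable_after: 25 Jun 2026 15:00:00 EST" stays intact.
        name, sep, arg = part.partition(":")
        # Directive names are case-insensitive; bare directives map to None.
        directives[name.strip().lower()] = arg.strip() if sep else None
    return directives


print(parse_directives("noindex, max-snippet:50"))  # {'noindex': None, 'max-snippet': '50'}
```

Directives that take a value (max-snippet, unavailable_after) keep it as a string; everything else maps to None.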

How X-Robots-Tag Differs from robots.txt

robots.txt and X-Robots-Tag do fundamentally different things. Understanding this distinction prevents a common category of SEO mistakes.

robots.txt controls crawling. It tells bots whether they are allowed to fetch a URL. If you Disallow a URL in robots.txt, the crawler never requests it. It never sees the content. But here is the important part: if another page links to that URL, Google may still index the URL itself (without the content). You end up with a search result that says "No information is available for this page" -- which is often worse than no result at all.

X-Robots-Tag controls indexing. It tells bots what to do after they have already fetched the resource. The crawler visits the URL, reads the X-Robots-Tag header, and then decides whether to add the content to its index. With noindex, the content is fetched but never indexed.

This difference matters in practice. If you want a page completely absent from search results, you need noindex (via meta robots or X-Robots-Tag), not just a Disallow in robots.txt. And if you use robots.txt to block a page that has a noindex tag, the crawler will never see the noindex tag, so the page may remain indexed.
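The interaction can be sketched with Python's standard urllib.robotparser module. The function indexing_outcome and its three return labels are hypothetical. urllib.robotparser also applies rules in file order rather than using Google's longest-match rule, so treat this as an illustration of the logic, not a faithful model of Googlebot.

```python
from urllib.robotparser import RobotFileParser


def indexing_outcome(robots_txt: str, url: str, user_agent: str, serves_noindex: bool) -> str:
    """Classify what a compliant crawler can learn about `url`.

    Hypothetical labels:
      "blocked"   - robots.txt forbids the fetch, so any noindex goes unseen and
                    the bare URL may still be indexed from external links;
      "excluded"  - the fetch is allowed and the noindex is seen;
      "indexable" - the fetch is allowed and nothing forbids indexing.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    if not parser.can_fetch(user_agent, url):
        return "blocked"
    return "excluded" if serves_noindex else "indexable"


ROBOTS = "User-agent: *\nDisallow: /private/\n"
print(indexing_outcome(ROBOTS, "https://example.com/private/a.pdf", "Googlebot", True))  # blocked
```

The first case is the trap described above: the noindex exists, but no compliant crawler ever fetches the response that carries it.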

For a deeper look at the robots.txt and noindex interaction, see Noindex in robots.txt: Why It Doesn't Work.

How X-Robots-Tag Differs from Meta Robots

X-Robots-Tag and meta robots serve the same purpose but differ in scope and flexibility.

Meta robots only works in HTML. The <meta name="robots" content="noindex"> tag must be placed inside the <head> section of an HTML document. It cannot control indexing of PDFs, images, JavaScript files, CSS files, XML feeds, or any other non-HTML resource.

X-Robots-Tag works for any HTTP response. Since it is an HTTP header, it applies to anything served over HTTP. This is its primary advantage. A PDF does not have a <head> element where you could place a meta tag. An image file has no HTML at all. X-Robots-Tag is the only way to apply noindex to these resources.

For HTML pages, you can use either method. They are functionally identical. If both are present, the most restrictive combination applies. A meta robots tag with noindex and an X-Robots-Tag with nofollow would result in noindex, nofollow for that page.
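The combining rule can be sketched in a few lines of Python. This is an illustration under simplifying assumptions (comma-separated values, no user-agent prefixes). combine_directives is a hypothetical name, and none is expanded to noindex, nofollow, which is how Google documents it.

```python
from __future__ import annotations


def combine_directives(*sources: str) -> set[str]:
    """Merge directive lists from several sources (meta robots, X-Robots-Tag).

    Directives accumulate across sources, and a restrictive directive is never
    cancelled by a permissive one from another source.
    """
    merged: set[str] = set()
    for value in sources:
        merged |= {d.strip().lower() for d in value.split(",") if d.strip()}
    # "none" is shorthand for "noindex, nofollow".
    if "none" in merged:
        merged |= {"noindex", "nofollow"}
        merged.discard("none")
    # "index" and "follow" only restate the defaults, so they cannot
    # override a restriction; drop them once a restriction is present.
    if "noindex" in merged:
        merged.discard("index")
    if "nofollow" in merged:
        merged.discard("follow")
    return merged


print(sorted(combine_directives("noindex", "nofollow")))  # ['nofollow', 'noindex']
```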

The comparison with meta robots is covered in more detail in robots.txt vs meta robots.

When to Use Each

Use robots.txt when you want to prevent crawling entirely. This saves your crawl budget and keeps bots away from pages that serve no purpose for search engines (admin panels, staging areas, API endpoints). See the robots.txt and SEO guide for more on crawl budget management.

Use meta robots when you want to prevent indexing of HTML pages and you have access to the page's HTML. This is the simplest approach for most websites.

Use X-Robots-Tag when:

You need to prevent indexing of non-HTML resources (PDFs, images, feeds).
You do not have access to edit the HTML of the pages you want to control.
You want to manage indexing rules at the server or CDN level rather than in page templates.
You need to apply the same directive to hundreds or thousands of resources matching a URL pattern.
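If the pattern rules live in application code rather than in the server or CDN, the idea can be sketched as WSGI middleware in Python. The rule table, the robots_value_for_path helper, and the XRobotsTagMiddleware class are all hypothetical. The first matching rule wins, and unmatched paths get no header.

```python
from __future__ import annotations

import re

# Hypothetical rule table: (path regex, header value). First match wins.
RULES = [
    (re.compile(r"\.pdf$", re.IGNORECASE), "noindex, nofollow"),
    (re.compile(r"^/internal/"), "noindex"),
]


def robots_value_for_path(path: str, rules=RULES) -> str | None:
    """Return the X-Robots-Tag value for a request path, or None if no rule matches."""
    for pattern, value in rules:
        if pattern.search(path):
            return value
    return None


class XRobotsTagMiddleware:
    """WSGI middleware that appends X-Robots-Tag to responses on matching paths."""

    def __init__(self, app, rules=RULES):
        self.app = app
        self.rules = rules

    def __call__(self, environ, start_response):
        value = robots_value_for_path(environ.get("PATH_INFO", ""), self.rules)

        def start_with_header(status, headers, exc_info=None):
            if value is not None:
                # Appends a header line; an X-Robots-Tag set by the app is kept too.
                headers = list(headers) + [("X-Robots-Tag", value)]
            return start_response(status, headers, exc_info)

        return self.app(environ, start_with_header)
```

To use it, wrap an existing WSGI application once (app = XRobotsTagMiddleware(app)). Adding a path pattern then means editing one table rather than every template.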

In practice, many sites use all three together. robots.txt blocks paths that should never be crawled. Meta robots handles per-page indexing decisions in the HTML. X-Robots-Tag covers non-HTML resources and provides server-level overrides.

Bot-Specific Directives

Both meta robots and X-Robots-Tag support targeting specific bots. You can give different instructions to different crawlers.

In meta robots:

<meta name="googlebot" content="noindex">
<meta name="bingbot" content="noarchive">

In X-Robots-Tag:

X-Robots-Tag: googlebot: noindex
X-Robots-Tag: bingbot: noarchive

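This is useful if you want to remain indexed in Bing but not Google. You can also address AI crawlers by name, but be aware that noindex does not stop a crawler from fetching content, and the operators of crawlers such as GPTBot and ClaudeBot document robots.txt user-agent rules as the way to opt out of crawling. Support for X-Robots-Tag among AI crawlers is not guaranteed, so treat headers like these as a supplementary signal alongside robots.txt rather than as a block: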

X-Robots-Tag: GPTBot: noindex
X-Robots-Tag: ClaudeBot: noindex
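When several X-Robots-Tag lines are present, a crawler combines the lines addressed to it with the lines that name no crawler. Here is a Python sketch of that selection. directives_for is a hypothetical helper, and it assumes one user agent per line and no commas inside directive values. The set of value-taking directives prevents a line like max-snippet: 50 from being mistaken for a user-agent prefix.

```python
from __future__ import annotations

# Google's value-taking directives; a colon after one of these is a value
# separator, not a user-agent prefix.
VALUE_DIRECTIVES = {"max-snippet", "max-image-preview", "max-video-preview", "unavailable_after"}


def directives_for(bot: str, header_lines: list[str]) -> set[str]:
    """Return the directive names that apply to `bot`, given every X-Robots-Tag line.

    Lines without a user-agent prefix apply to all crawlers; lines starting with
    "<name>:" apply only to that crawler. The two are combined.
    """
    applicable: set[str] = set()
    for line in header_lines:
        target, body = None, line
        head, sep, rest = line.partition(":")
        # A prefix is a single token (no comma) that is not a directive name.
        if sep and "," not in head and head.strip().lower() not in VALUE_DIRECTIVES:
            target, body = head.strip().lower(), rest
        if target is None or target == bot.lower():
            for part in body.split(","):
                name = part.split(":", 1)[0].strip().lower()
                if name:
                    applicable.add(name)
    return applicable


HEADERS = ["googlebot: noindex", "bingbot: noarchive", "nofollow"]
print(sorted(directives_for("Googlebot", HEADERS)))  # ['nofollow', 'noindex']
```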

Implementing X-Robots-Tag on Nginx

Add the header in your server or location block:

# Noindex all PDFs
location ~* \.pdf$ {
    add_header X-Robots-Tag "noindex, nofollow" always;
}

# Noindex a specific directory
location /internal/ {
    add_header X-Robots-Tag "noindex" always;
}

# Noindex image files in a directory (noindex, not noimageindex:
# noimageindex only affects images embedded in HTML pages)
location /uploads/ {
    add_header X-Robots-Tag "noindex" always;
}

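The always parameter ensures the header is sent regardless of the response status code. Without it, Nginx adds the header only to responses with status 200, 201, 204, 206, 301, 302, 303, 304, 307, or 308, so error pages such as 404s go out without it. Also note that add_header directives are inherited from an enclosing block only when the current block defines no add_header of its own. A location that sets any other header must repeat the X-Robots-Tag line.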

Implementing X-Robots-Tag on Apache

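Use the Header directive in your virtual host configuration or in .htaccess. Note that <Directory> sections are only allowed in the server or virtual host configuration. To get the same effect from .htaccess, place an .htaccess file inside that directory containing just the Header line.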

# Noindex all PDFs
<FilesMatch "\.pdf$">
    Header set X-Robots-Tag "noindex, nofollow"
</FilesMatch>

# Noindex a specific directory
<Directory "/var/www/html/internal">
    Header set X-Robots-Tag "noindex"
</Directory>

# Noindex all image files (noindex, not noimageindex, which only
# affects images embedded in HTML pages)
<FilesMatch "\.(jpg|jpeg|png|gif|webp)$">
    Header set X-Robots-Tag "noindex"
</FilesMatch>

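Make sure mod_headers is enabled on your Apache server. If the Header directive is not recognized, run a2enmod headers and restart Apache on Debian and Ubuntu. On other distributions, check that the LoadModule line for mod_headers is present.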

Implementing X-Robots-Tag on CDNs

Most modern CDNs let you add custom response headers based on URL patterns.

Cloudflare -- Create a Response Header Transform Rule (under Rules in the dashboard; the menu layout has changed over time, so the exact path may differ). Match your desired URL pattern and set the X-Robots-Tag header.

Netlify -- Add headers in netlify.toml, as in the example below, or in a _headers file, which uses its own plain-text format instead of TOML:

[[headers]]
  for = "/documents/*"
  [headers.values]
    X-Robots-Tag = "noindex"

Vercel -- Add headers in your vercel.json:

{
  "headers": [
    {
      "source": "/internal/(.*)",
      "headers": [
        {
          "key": "X-Robots-Tag",
          "value": "noindex, nofollow"
        }
      ]
    }
  ]
}

AWS CloudFront -- Use response headers policies or Lambda@Edge functions to add the X-Robots-Tag header to specific URL patterns.

Common Patterns

Noindex All PDFs

PDFs often contain duplicate content (a PDF version of a web page) or internal documents that should not appear in search results.

X-Robots-Tag: noindex, nofollow

Apply this to all .pdf responses using a URL pattern match on your server or CDN.

Noindex Paginated Pages

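If you have paginated content (page 2, page 3, etc.) that you want to keep crawlable but not indexed, note that Nginx location blocks match only the URI path, never the query string. Match the page parameter through the $arg_page variable instead:

map $arg_page $pagination_robots {              # http context
    default                  "";                 # empty value: header is omitted
    "~^([2-9]|[1-9][0-9]+)$" "noindex, follow";  # page 2 and above
}
add_header X-Robots-Tag $pagination_robots always;  # server or location context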

The follow directive means crawlers will still follow links on the page, which helps them discover content. They just will not index the paginated page itself.
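If pagination headers are set in application code instead of the web server, the same rule can be sketched in Python. pagination_robots is a hypothetical helper. It considers only the first page parameter and treats page 2 and above, including 10 and beyond, as paginated.

```python
from __future__ import annotations

import re
from urllib.parse import parse_qs

# Page numbers 2-9, or any number from 10 up without a leading zero.
PAGE_2_PLUS = re.compile(r"^([2-9]|[1-9][0-9]+)$")


def pagination_robots(query: str) -> str | None:
    """Return an X-Robots-Tag value for a URL's query string, or None for no header.

    Only the first `page` parameter is considered.
    """
    pages = parse_qs(query).get("page")
    if pages and PAGE_2_PLUS.match(pages[0]):
        return "noindex, follow"
    return None


print(pagination_robots("sort=new&page=12"))  # noindex, follow
```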

Noindex Staging Environments

Apply X-Robots-Tag at the server level for your entire staging site:

server {
    server_name staging.example.com;
    add_header X-Robots-Tag "noindex, nofollow" always;
    # ... rest of config
}

Expire Content After a Date

The unavailable_after directive tells search engines to stop showing a resource in results after a specific date. This is useful for time-limited promotions or events:

X-Robots-Tag: unavailable_after: 25 Jun 2026 15:00:00 EST
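To check when such a header takes effect, the date can be parsed with Python's standard library. The helpers unavailable_after and is_expired are hypothetical. The sketch accepts RFC 822-style dates via email.utils.parsedate_to_datetime and ISO 8601 via datetime.fromisoformat. It reads a date without a zone as UTC, which is an assumption, and it does not handle dates that contain commas.

```python
from __future__ import annotations

import re
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

# Captures the date up to the next comma (dates containing commas are not supported).
_UNAVAILABLE_AFTER = re.compile(r"unavailable_after:\s*([^,]+)", re.IGNORECASE)


def unavailable_after(header_value: str) -> datetime | None:
    """Return the unavailable_after date in an X-Robots-Tag value, or None."""
    match = _UNAVAILABLE_AFTER.search(header_value)
    if match is None:
        return None
    raw = match.group(1).strip()
    try:
        # RFC 822 style, e.g. "25 Jun 2026 15:00:00 EST".
        when = parsedate_to_datetime(raw)
    except (TypeError, ValueError):
        # Fall back to ISO 8601, e.g. "2026-06-25T15:00:00+00:00".
        # Raises ValueError if neither format matches.
        when = datetime.fromisoformat(raw)
    if when.tzinfo is None:
        # Sketch assumption: a date without a zone is read as UTC.
        when = when.replace(tzinfo=timezone.utc)
    return when


def is_expired(header_value: str, now: datetime | None = None) -> bool:
    """True once the unavailable_after date (if any) has passed."""
    when = unavailable_after(header_value)
    if now is None:
        now = datetime.now(timezone.utc)
    return when is not None and now >= when
```

For the header above, unavailable_after returns 15:00 at UTC-5, which is 20:00 UTC on 25 June 2026.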

Debugging X-Robots-Tag

To verify your X-Robots-Tag is being sent correctly, use curl:

curl -I https://example.com/document.pdf

Look for the X-Robots-Tag header in the response. If it is not there, check your server configuration, make sure the correct module is enabled, and verify that the URL pattern matches.
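The same check can be scripted. This Python sketch uses hypothetical function names. It extracts every X-Robots-Tag value from a raw header dump such as curl -I output, matching the header name case-insensitively because HTTP/2 responses print header names in lowercase. A second helper fetches a live URL with a HEAD request using only the standard library.

```python
from __future__ import annotations

import urllib.request


def x_robots_tags_from_raw(raw_headers: str) -> list[str]:
    """Return every X-Robots-Tag value in a raw header dump such as `curl -I` output."""
    values = []
    for line in raw_headers.splitlines():
        name, sep, value = line.partition(":")
        # HTTP/2 prints header names in lowercase, so compare case-insensitively.
        # Status lines ("HTTP/2 200") contain no colon and are skipped.
        if sep and name.strip().lower() == "x-robots-tag":
            values.append(value.strip())
    return values


def fetch_x_robots_tags(url: str) -> list[str]:
    """Send a HEAD request and return all X-Robots-Tag values (empty if none).

    urlopen follows redirects, so the values come from the final response.
    It raises urllib.error.HTTPError for 4xx/5xx responses.
    """
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.headers.get_all("X-Robots-Tag") or []
```

For example, fetch_x_robots_tags("https://example.com/document.pdf") returns a list with one entry per X-Robots-Tag line, so an empty list means the header is missing.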

In Chrome DevTools, open the Network tab, select a request, and check the Response Headers section. The X-Robots-Tag header should appear alongside other response headers.

Google Search Console does not have a dedicated X-Robots-Tag report, but you can check how Google sees a URL with the URL Inspection tool. If a page is reported as excluded by noindex, check the "Indexing allowed?" field in the inspection result. It shows whether the noindex was found in the robots meta tag or in the X-Robots-Tag HTTP header.

Common Mistakes

Blocking crawling in robots.txt and expecting X-Robots-Tag to work. If robots.txt prevents the crawler from fetching a URL, the crawler never sees the X-Robots-Tag header on that response. The two must be used consistently. Do not block a URL in robots.txt if you are relying on X-Robots-Tag to control its indexing.

Setting the header on redirects instead of the destination. If a URL redirects (301 or 302), the X-Robots-Tag on the redirect response is typically ignored. The header on the final destination response is what matters, so make sure the destination URL sends the correct X-Robots-Tag.

Conflicting directives. If a meta robots tag says index and an X-Robots-Tag says noindex, the most restrictive directive wins. The page will not be indexed. Be aware of this when combining approaches. For guidance on using robots.txt syntax correctly alongside these headers, see the robots.txt syntax reference.

X-Robots-Tag requires crawling

For X-Robots-Tag to work, crawlers must be able to fetch the resource. If you block the URL in robots.txt, the crawler never sees the header. Make sure your robots.txt allows access to any URL where you rely on X-Robots-Tag directives.
