Complete List of Search Engine and AI Bot User-Agents

A comprehensive reference of search engine crawler and AI bot user-agent strings. Covers Google, Bing, Yandex, Baidu, DuckDuckGo, and major AI crawlers with their robots.txt identifiers.

When you configure your robots.txt file, you need to know the exact user-agent strings that crawlers use. A typo in the user-agent name means your rules will not apply to the intended bot. This reference lists every major search engine crawler and AI bot, their user-agent strings, and what they crawl for.
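
To illustrate how a misspelled token silently disables a rule, here is a minimal sketch using Python's standard-library `urllib.robotparser`. The `/private/` path, the `example.com` URL, and the misspelling `Googelbot` are made up for illustration, and the standard-library parser only approximates how each real crawler reads robots.txt.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Parse robots.txt text and report whether user_agent may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical rules: the same rule, once spelled correctly and once with a typo.
CORRECT = "User-agent: Googlebot\nDisallow: /private/\n"
TYPO = "User-agent: Googelbot\nDisallow: /private/\n"

URL = "https://example.com/private/report.html"

blocked_as_intended = not is_allowed(CORRECT, "Googlebot", URL)
typo_rule_ignored = is_allowed(TYPO, "Googlebot", URL)

print("correct token blocks Googlebot:", blocked_as_intended)
print("misspelled token leaves Googlebot unblocked:", typo_rule_ignored)
```

With the misspelled token, no group matches Googlebot, so the parser falls back to allowing the URL and reports no error.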

Google Crawlers

Google operates multiple crawlers, each with a specific purpose. You can target them individually or collectively in robots.txt.

Primary Web Crawlers

| User-Agent (robots.txt) | Full User-Agent String | Purpose |
|---|---|---|
| Googlebot | Mozilla/5.0 (Linux; Android 6.0.1; ...) AppleWebKit/537.36 ... Googlebot | Primary web crawler (smartphone) |
| Googlebot | Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html) | Primary web crawler (desktop) |

In robots.txt, both the mobile and desktop Googlebot are targeted with User-agent: Googlebot. You cannot separately target mobile vs. desktop Googlebot via robots.txt.

Specialized Google Crawlers

| User-Agent (robots.txt) | Purpose |
|---|---|
| Googlebot-Image | Crawls images for Google Images |
| Googlebot-News | Crawls content for Google News |
| Googlebot-Video | Crawls video content |
| Storebot-Google | Crawls product pages for Google Shopping |
| Google-InspectionTool | Used by Search Console URL Inspection (live test) |

Google Ads Crawlers

| User-Agent (robots.txt) | Purpose |
|---|---|
| AdsBot-Google | Checks desktop landing page quality for Google Ads |
| AdsBot-Google-Mobile | Checks mobile landing page quality for Google Ads |
| Mediapartners-Google | Crawls pages for AdSense content matching |

Note: AdsBot crawlers ignore rules addressed to the wildcard user-agent (User-agent: *). To restrict them, you must name AdsBot-Google or AdsBot-Google-Mobile explicitly. If you run Google Ads, be careful about blocking AdsBot, as it can affect your ad quality scores.

Google AI and Special Crawlers

| User-Agent (robots.txt) | Purpose |
|---|---|
| Google-Extended | Control token rather than a separate crawler: governs whether content fetched by Google's existing crawlers may be used for Gemini model training and grounding. Blocking this does not affect search indexing. |
| GoogleOther | Generic crawler used for one-off crawls, R&D |
| GoogleOther-Image | One-off image crawling |
| GoogleOther-Video | One-off video crawling |
| APIs-Google | Accesses content for Google APIs |
| FeedFetcher-Google | Fetches RSS/Atom feeds for Google services. As a user-triggered fetcher, it generally does not follow robots.txt rules. |

For details on blocking AI crawlers, see our guide on how to block AI crawlers with robots.txt.

Microsoft / Bing Crawlers

| User-Agent (robots.txt) | Full User-Agent String | Purpose |
|---|---|---|
| bingbot | Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm) | Primary Bing web crawler |
| msnbot | msnbot/2.0b (+http://search.msn.com/msnbot.htm) | Legacy MSN crawler (still active) |
| BingPreview | Mozilla/5.0 ... BingPreview/1.0b | Generates page previews for Bing |
| MicrosoftPreview | Various | Generates link previews for Microsoft products |
| adidxbot | Mozilla/5.0 (compatible; adidxbot/2.0; ...) | Bing advertising crawler |

Bing also powers Yahoo search results, so blocking Bingbot affects visibility in both Bing and Yahoo.

Bing respects crawl-delay

Unlike Google, Bing respects the crawl-delay directive in robots.txt:

User-agent: bingbot
Crawl-delay: 10

This tells Bingbot to wait 10 seconds between requests.
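
As a rough illustration of how a program might read this directive, the sketch below uses `crawl_delay()` from Python's `urllib.robotparser` (available since Python 3.6). It only shows how the value is parsed; whether and how a given crawler honors it is up to that crawler.

```python
from urllib.robotparser import RobotFileParser

# The same rule shown above.
ROBOTS_TXT = """\
User-agent: bingbot
Crawl-delay: 10
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

bing_delay = parser.crawl_delay("bingbot")     # seconds requested for bingbot
other_delay = parser.crawl_delay("Googlebot")  # no matching group, so None

print("bingbot:", bing_delay)
print("Googlebot:", other_delay)
```

Because the directive sits in the bingbot group, other crawlers see no delay value at all unless they have their own group or a wildcard group sets one.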

Yandex Crawlers

| User-Agent (robots.txt) | Purpose |
|---|---|
| YandexBot | Primary web crawler |
| YandexImages | Image crawler |
| YandexVideo | Video crawler |
| YandexMedia | Media content crawler |
| YandexBlogs | Blog content crawler |
| YandexNews | News crawler |
| YandexDirect | Advertising crawler |
| YandexMetrika | Analytics verification |
| YandexTurbo | Turbo pages crawler |
| YandexRenderResourcesBot | Resource rendering |

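Yandex is the dominant search engine in Russia. Yandex historically honored crawl-delay, but announced in 2018 that it would stop following the directive; crawl rate is now managed through Yandex Webmaster instead.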

Baidu Crawlers

| User-Agent (robots.txt) | Purpose |
|---|---|
| Baiduspider | Primary web crawler |
| Baiduspider-image | Image crawler |
| Baiduspider-video | Video crawler |
| Baiduspider-news | News crawler |
| Baiduspider-render | Rendering crawler |

Baidu is the dominant search engine in China. It respects robots.txt, though historically it has been less strict about compliance than Google or Bing.

Other Search Engine Crawlers

| User-Agent (robots.txt) | Search Engine | Purpose |
|---|---|---|
| DuckDuckBot | DuckDuckGo | Web crawling (DuckDuckGo also uses Bing's index) |
| Sogou web spider | Sogou | Chinese search engine |
| Sogou inst spider | Sogou | Instant search |
| SeznamBot | Seznam | Czech search engine |
| NaverBot | Naver | Korean search engine |
| Yeti | Naver | Naver's primary crawler |
| Applebot | Apple / Siri | Apple search and Siri suggestions |
| PetalBot | Huawei / Petal Search | Huawei's search engine |
| Qwantify | Qwant | European privacy-focused search engine |
| ia_archiver | Internet Archive | Web archiving (Wayback Machine) |

AI and LLM Crawlers

AI companies use web crawlers to gather training data and power retrieval-augmented generation (RAG) systems. These are a relatively new category and the landscape is changing rapidly.

| User-Agent (robots.txt) | Company | Purpose |
|---|---|---|
| GPTBot | OpenAI | Training data collection |
| ChatGPT-User | OpenAI | Real-time browsing for ChatGPT |
| OAI-SearchBot | OpenAI | SearchGPT / ChatGPT search |
| ClaudeBot | Anthropic | Training data collection |
| anthropic-ai | Anthropic | Anthropic's web crawler |
| Google-Extended | Google | Control token for use of content in Gemini training and grounding (not a separate crawler) |
| CCBot | Common Crawl | Open dataset used by many AI companies |
| cohere-ai | Cohere | AI training |
| PerplexityBot | Perplexity | AI search engine |
| Bytespider | ByteDance | TikTok / Douyin AI and search |
| Amazonbot | Amazon | Alexa answers and AI services |
| FacebookExternalHit | Meta | Link previews (not strictly AI, but used for Meta AI) |
| meta-externalagent | Meta | Meta AI training |
| Timpibot | Timpi | Decentralized search |

Blocking AI crawlers

To block the most widely used AI crawlers from the table above, add one group per user-agent token:

User-agent: GPTBot
Disallow: /

User-agent: ChatGPT-User
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: anthropic-ai
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: cohere-ai
Disallow: /

User-agent: PerplexityBot
Disallow: /

User-agent: Bytespider
Disallow: /

For a full walkthrough, see our blocking AI crawlers guide.
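
If you maintain this list in code, a block like the one above can be generated rather than hand-typed. The following is a minimal sketch, not a definitive tool: the `AI_TOKENS` list simply mirrors the tokens used above, `build_block_rules` is a hypothetical helper name, and the check uses Python's `urllib.robotparser`, which only approximates real crawler behavior.

```python
from urllib.robotparser import RobotFileParser

# Tokens copied from the block above; adjust to your own policy.
AI_TOKENS = [
    "GPTBot", "ChatGPT-User", "ClaudeBot", "anthropic-ai", "Google-Extended",
    "CCBot", "cohere-ai", "PerplexityBot", "Bytespider",
]


def build_block_rules(tokens):
    """Return robots.txt text with one 'Disallow: /' group per token."""
    groups = [f"User-agent: {token}\nDisallow: /\n" for token in tokens]
    return "\n".join(groups)


robots_txt = build_block_rules(AI_TOKENS)

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# Every listed token should be blocked; an unlisted crawler is unaffected.
blocked = {t: not parser.can_fetch(t, "https://example.com/") for t in AI_TOKENS}
googlebot_allowed = parser.can_fetch("Googlebot", "https://example.com/")

print(robots_txt)
```

Keeping the tokens in one list makes it easier to add a new crawler in a single place and re-run the check before publishing the file.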

Social Media Crawlers

These bots fetch your pages to generate link previews. They are not search engine crawlers, but they appear in server logs and can be controlled with robots.txt (though blocking them prevents link previews from working).

| User-Agent | Platform | Purpose |
|---|---|---|
| facebookexternalhit | Facebook / Meta | Link preview generation |
| Twitterbot | X (Twitter) | Link preview generation |
| LinkedInBot | LinkedIn | Link preview generation |
| Slackbot | Slack | Link preview generation |
| WhatsApp | WhatsApp | Link preview generation |
| Discordbot | Discord | Link preview generation |
| TelegramBot | Telegram | Link preview generation |

Generally, you should allow these bots access so your links display correctly when shared.

SEO Tool Crawlers

| User-Agent | Tool | Purpose |
|---|---|---|
| AhrefsBot | Ahrefs | Backlink and SEO analysis |
| SemrushBot | Semrush | SEO analysis |
| rogerbot | Moz | Link analysis |
| DotBot | Moz | Link analysis |
| MJ12bot | Majestic | Backlink analysis |
| BLEXBot | WebMeUp | SEO analysis |
| DataForSeoBot | DataForSEO | SEO data collection |

Some sites block SEO tool crawlers to prevent competitors from analyzing their backlink profiles. This is a judgment call: blocking them does not affect search rankings, but it limits third-party analysis of your site, including any analysis you run yourself with those tools.

User-agent strings change

Bot user-agent strings are updated periodically. Google has changed Googlebot's user-agent format multiple times. AI crawlers are especially new and may change their identifiers. Check this list periodically or verify against the crawler's official documentation.

How to Identify Bots in Your Server Logs

Filter your server access logs by user-agent string to see which bots visit your site. A typical one-liner for an access log in the combined log format:

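# Count requests per user-agent for lines that look like bot traffic.
# Assumes the combined log format, where the user-agent is the sixth field
# when each line is split on double quotes.
grep -iE 'bot|spider|crawl' access.log | awk -F'"' '{print $6}' | sort | uniq -c | sort -rn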
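
A similar tally can be produced in a script, which makes it easier to group raw user-agent strings under the robots.txt tokens listed in this reference. The sketch below assumes the combined log format, where the user-agent is the last double-quoted field; the sample log lines, IP addresses, and the short `KNOWN_TOKENS` list are illustrative only.

```python
import re
from collections import Counter

# A few tokens from this reference; extend as needed.
KNOWN_TOKENS = ["Googlebot", "bingbot", "GPTBot", "ClaudeBot", "AhrefsBot"]

# In the combined log format the user-agent is the final quoted field.
UA_PATTERN = re.compile(r'"([^"]*)"\s*$')


def classify(log_line):
    """Return the first known token found in the line's user-agent, else None."""
    match = UA_PATTERN.search(log_line)
    if not match:
        return None
    user_agent = match.group(1).lower()
    for token in KNOWN_TOKENS:
        if token.lower() in user_agent:
            return token
    return None


def count_bots(lines):
    """Count requests per known bot token, skipping unrecognized lines."""
    return Counter(token for token in map(classify, lines) if token)


# Made-up sample lines (documentation IP ranges, hypothetical paths).
SAMPLE = [
    '192.0.2.1 - - [01/Jan/2025:00:00:01 +0000] "GET / HTTP/1.1" 200 512 "-" '
    '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '192.0.2.2 - - [01/Jan/2025:00:00:02 +0000] "GET /a HTTP/1.1" 200 128 "-" '
    '"Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"',
    '192.0.2.3 - - [01/Jan/2025:00:00:03 +0000] "GET /b HTTP/1.1" 200 256 "-" '
    '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '198.51.100.7 - - [01/Jan/2025:00:00:04 +0000] "GET /c HTTP/1.1" 200 64 "-" '
    '"Mozilla/5.0 (X11; Linux x86_64) ExampleBrowser/1.0"',
]

counts = count_bots(SAMPLE)
for token, n in counts.most_common():
    print(n, token)
```

Matching on the token as a case-insensitive substring means the smartphone and desktop Googlebot strings are counted together, which mirrors how both are addressed by a single robots.txt token.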

For verifying that a bot is genuine (not a scraper impersonating a known bot), use reverse DNS verification. See our Googlebot explained guide for the verification process.
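
In outline, reverse DNS verification looks up the hostname for the requesting IP, checks that it belongs to the crawler operator's domain, and then resolves that hostname forward to confirm it maps back to the same IP. The sketch below follows that outline for Googlebot, assuming the `googlebot.com` and `google.com` domains that Google documents for its crawlers. The function names are hypothetical, the forward lookup shown here handles IPv4 only, and the DNS calls require network access.

```python
import socket

# Assumption: domains Google documents for Googlebot reverse DNS results.
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")


def hostname_matches(hostname, suffixes=GOOGLE_SUFFIXES):
    """True if hostname ends with one of the trusted domain suffixes."""
    return hostname.lower().rstrip(".").endswith(suffixes)


def verify_crawler_ip(ip, suffixes=GOOGLE_SUFFIXES):
    """Reverse-resolve ip, check the domain, then confirm with a forward lookup."""
    try:
        hostname, _, _ = socket.gethostbyaddr(ip)           # reverse DNS
        if not hostname_matches(hostname, suffixes):
            return False
        forward_ips = socket.gethostbyname_ex(hostname)[2]  # forward DNS (IPv4)
    except OSError:
        # Covers socket.herror and socket.gaierror (lookup failures).
        return False
    return ip in forward_ips
```

Call `verify_crawler_ip(ip)` with an address taken from your logs. The leading dot in each suffix matters: it prevents a hostname such as `fakegooglebot.com` from passing the domain check.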

robots.txt Best Practices for Bot Management

  1. Allow legitimate search engine crawlers. Do not block Googlebot, bingbot, or other search crawlers unless you have a specific reason.
  2. Block unwanted bots explicitly. Target them by their user-agent name.
  3. Test your rules. Use a robots.txt testing tool to verify your rules affect the intended bots. See our robots.txt testing guide.
  4. Monitor bot traffic. Check server logs regularly for new or unexpected bot activity.
  5. Keep your robots.txt updated. As new AI crawlers emerge, decide whether to allow or block them and update your rules.
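
The first three practices can be combined into a small regression check that runs whenever robots.txt changes. This is a sketch under assumptions: the rules, paths, and expectations below are hypothetical, and Python's `urllib.robotparser` is a stand-in for a dedicated testing tool rather than a model of any specific crawler.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical policy: everyone stays out of /admin/, AhrefsBot is blocked fully.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/

User-agent: AhrefsBot
Disallow: /
"""

# (user-agent token, path, expected to be allowed?)
EXPECTATIONS = [
    ("Googlebot", "/blog/post", True),
    ("Googlebot", "/admin/login", False),
    ("bingbot", "/blog/post", True),
    ("AhrefsBot", "/blog/post", False),
]


def check_rules(robots_txt, expectations, base="https://example.com"):
    """Return the expectations that the parsed rules do not satisfy."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [
        (agent, path, expected)
        for agent, path, expected in expectations
        if parser.can_fetch(agent, base + path) != expected
    ]


failures = check_rules(ROBOTS_TXT, EXPECTATIONS)
print("all rules behave as expected" if not failures else failures)
```

Listing expectations as data keeps the check readable: when a new crawler appears, you add one row stating whether it should be allowed and let the check confirm the file agrees.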

Summary

Search engine crawlers, AI bots, social media crawlers, and SEO tools all identify themselves through user-agent strings. Knowing the correct string for each bot is essential for writing accurate robots.txt rules. Focus on allowing the search engines that matter for your traffic, make deliberate decisions about AI crawlers, and monitor your server logs for unexpected bot activity.
