Major AI Crawler User Agent Headers
Company | Bot Name | User Agent String | Purpose |
---|---|---|---|
OpenAI | GPTBot | Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; GPTBot/1.1; +https://openai.com/gptbot) | Used by OpenAI to train and refine generative AI models |
OpenAI | ChatGPT-User | Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; ChatGPT-User/1.0; +https://openai.com/bot) | Fetches pages on behalf of user requests in ChatGPT and custom GPTs; not used for automatic web crawling |
OpenAI | OAI-SearchBot | Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; OAI-SearchBot/1.0; +https://openai.com/searchbot) | Builds an index of websites that can be surfaced as results in OpenAI's SearchGPT product (now ChatGPT search) |
Anthropic | anthropic-ai | Mozilla/5.0 (compatible; anthropic-ai/1.0; +http://www.anthropic.com/bot.html) | Collects information for Anthropic's AI development |
Anthropic | ClaudeBot | Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; ClaudeBot/1.0; +claudebot@anthropic.com) | Anthropic's web crawler, which downloads training data for the large language models (LLMs) that power AI products such as Claude |
Anthropic | Claude-Web | Mozilla/5.0 (compatible; claude-web/1.0; +http://www.anthropic.com/bot.html) | Acquires site data to refine Anthropic's web-focused models |
Google | Google-Extended | No separate string: a robots.txt product token only; requests use existing Google crawler user agents | Controls whether content crawled by Google may be used to improve Bard (now Gemini) and Vertex AI generative APIs, including future generations of models |
Google | GoogleOther | GoogleOther | Used by Google for internal research and development |
Apple | Applebot | Mozilla/5.0 (compatible; Applebot/1.0; +http://www.apple.com/bot.html) | Crawls webpages to improve results for Siri and Spotlight |
Apple | Applebot-Extended | No separate string: a robots.txt product token only; pages are crawled by Applebot | Controls whether data crawled by Applebot may be used to train Apple's foundation models |
Microsoft | BingBot | Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm) Chrome/W.X.Y.Z Safari/537.36 | Microsoft's web crawler for the Bing search engine |
Meta | FacebookBot | Mozilla/5.0 (compatible; FacebookBot/1.0; +http://www.facebook.com/bot.html) | Fetches content for Facebook and other Meta services |
Meta | Meta-ExternalAgent | Mozilla/5.0 (compatible; meta-externalagent/1.1; +https://developers.facebook.com/docs/sharing/webmasters/crawler) | Crawls the web for use cases such as training AI models or improving products by indexing content directly |
ByteDance | Bytespider | Mozilla/5.0 (compatible; Bytespider/1.0; +http://www.bytedance.com/bot.html) | Surveys webpages to support TikTok's content discovery |
Cohere | cohere-ai | Mozilla/5.0 (compatible; cohere-ai/1.0; +http://www.cohere.ai/bot.html) | Collects text samples to refine Cohere's language models |
Perplexity | PerplexityBot | Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; PerplexityBot/1.0; +https://perplexity.ai/perplexitybot) | Perplexity's crawler, designed to help the platform build and maintain its own index |
Mistral AI | MistralAI-User | Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; MistralAI-User/1.0; +https://docs.mistral.ai/robots) | Used by Mistral to fetch citations for Le Chat; it does not crawl the web automatically or collect training data |
Common Crawl | CCBot | Mozilla/5.0 (compatible; CCBot/1.0; +http://www.commoncrawl.org/bot.html) | User agent of the non-profit Common Crawl, which is devoted to cataloging the internet |
Diffbot | Diffbot | Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.1.2) Gecko/20090729 Firefox/3.5.2 (.NET CLR 3.5.30729; Diffbot/0.1; +http://www.diffbot.com) | Scrapes webpages to produce structured data for AI systems |
DuckDuckGo | DuckAssistBot | Mozilla/5.0 (compatible; DuckAssistBot/1.0; +http://www.duckduckgo.com/bot.html) | Collects data to deliver AI-backed answers on DuckDuckGo |
This table includes the major AI crawlers from companies like OpenAI, Anthropic, Google, Apple, Microsoft, Meta, and others. Each entry includes the company name, bot name, full user agent string, and the primary purpose of the crawler.
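For the crawlers that send their own user agent, the bot name also appears (ignoring case) as a product token inside the string, so a server can label incoming requests by searching the `User-Agent` header for those tokens. The Python sketch below assumes a simple case-insensitive token match; `Crawler`, `KNOWN_CRAWLERS`, and `identify_crawler` are hypothetical names introduced for illustration, not part of any vendor's tooling. Google-Extended and Applebot-Extended are left out because Google and Apple document them as robots.txt control tokens rather than separate fetchers. A match only shows what the client claims to be: the header is self-reported and easy to spoof.

```python
import re
from typing import NamedTuple, Optional


class Crawler(NamedTuple):
    company: str
    bot: str


# Hypothetical lookup list built from the "Company" and "Bot Name" columns.
# Google-Extended and Applebot-Extended are omitted: they are robots.txt
# tokens, not separate fetchers, so they are not expected in request headers.
KNOWN_CRAWLERS = [
    Crawler("OpenAI", "GPTBot"),
    Crawler("OpenAI", "ChatGPT-User"),
    Crawler("OpenAI", "OAI-SearchBot"),
    Crawler("Anthropic", "anthropic-ai"),
    Crawler("Anthropic", "ClaudeBot"),
    Crawler("Anthropic", "Claude-Web"),
    Crawler("Google", "GoogleOther"),
    Crawler("Apple", "Applebot"),
    Crawler("Microsoft", "BingBot"),
    Crawler("Meta", "FacebookBot"),
    Crawler("Meta", "Meta-ExternalAgent"),
    Crawler("ByteDance", "Bytespider"),
    Crawler("Cohere", "cohere-ai"),
    Crawler("Perplexity", "PerplexityBot"),
    Crawler("Mistral AI", "MistralAI-User"),
    Crawler("Common Crawl", "CCBot"),
    Crawler("Diffbot", "Diffbot"),
    Crawler("DuckDuckGo", "DuckAssistBot"),
]

# One compiled pattern per bot. The lookarounds stop a token from matching
# inside a longer name (e.g. "GPTBot" inside "MyGPTBotClone", or "Applebot"
# inside "Applebot-Extended"); IGNORECASE covers "bingbot" and "claude-web".
_PATTERNS = [
    (crawler, re.compile(r"(?<![\w-])" + re.escape(crawler.bot) + r"(?![\w-])", re.IGNORECASE))
    for crawler in KNOWN_CRAWLERS
]


def identify_crawler(user_agent: str) -> Optional[Crawler]:
    """Return the first listed crawler whose token appears in user_agent, else None.

    The User-Agent header is self-reported and can be spoofed, so a match
    identifies what the client claims to be, not who actually sent the request.
    """
    for crawler, pattern in _PATTERNS:
        if pattern.search(user_agent):
            return crawler
    return None


if __name__ == "__main__":
    ua = ("Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; "
          "GPTBot/1.1; +https://openai.com/gptbot)")
    print(identify_crawler(ua))  # Crawler(company='OpenAI', bot='GPTBot')
    print(identify_crawler("Mozilla/5.0 (Windows NT 10.0; Win64; x64)"))  # None
```

Matching on the token rather than the full string keeps the lookup working when a vendor bumps a version number (for example `GPTBot/1.0` to `GPTBot/1.1`). Where verification matters, several vendors publish IP ranges or reverse-DNS guidance that can be checked alongside the header.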