Meta, which operates the Instagram, WhatsApp, and Facebook brands, uses crawlers for many different functions, including training Artificial Intelligence (AI).
A blanket approach of disallowing all Meta crawler user-agents might mean missing out on being discovered within some of their services. A more nuanced approach is explained in this blog.
Meta describes the product tokens, user-agents, and usage purposes in their documentation. We consult this documentation alongside other research sources when cataloging the usages of Meta's crawlers.
AI Crawlers
Meta-ExternalAgent
Meta's primary crawler for gathering data for AI uses is Meta-ExternalAgent. We've identified that this crawler is used for AI Indexing and Training.
The full primary User-Agent is:
meta-externalagent/1.1 (+https://developers.facebook.com/docs/sharing/webmasters/crawler)
Check out the full results in the User-Agent tester.
You'll notice the IsArtificialIntelligence property result is True, indicating that one or more AI-related uses are associated with this crawler.
Meta-WebIndexer
This crawler is used purely for indexing Meta AI search results and uses the product token Meta-WebIndexer.
Meta notes that allowing this crawler to access your content supports Meta AI citing and linking to that content in its AI responses.
The main User-Agent we've seen for this crawler is:
meta-webindexer/1.1 (+https://developers.facebook.com/docs/sharing/webmasters/crawler)
See the User-Agent tester for all the details including IsArtificialIntelligence which returns True.
The CrawlerUsage property returns finer-grained values so you can decide whether or not to allow crawlers to index your content. You can see a full list of usages in the developer documentation.
Non-AI
Meta operates User-Agents for uses other than AI.
Meta-ExternalFetcher
The Meta-ExternalFetcher token refers only to user-prompted actions, so it doesn't qualify as a crawler: it's only ever used when a user is directly viewing the results. It will still show up frequently, as the original user's User-Agent is replaced by Meta's.
The most popular User-Agent we've seen from Meta-ExternalFetcher is:
meta-externalfetcher/1.1 (+https://developers.facebook.com/docs/sharing/webmasters/crawler)
We indicate that this shouldn't be treated as a crawler using the IsCrawler property, which in this case returns False.
Meta-ExternalAds
The Meta-ExternalAds product token is categorized as an Analytics crawler. Meta states that it is used to improve advertising and products.
We see this popular User-Agent for the crawler:
meta-externalads/1.1 (+https://developers.facebook.com/docs/sharing/webmasters/crawler)
FacebookExternalHit
The FacebookExternalHit crawler reviews the content that was shared on one of Meta's apps. This allows a preview to be shown based on content including title, description and thumbnail image. You'll see we classify this as usage type Preview.
The common User-Agent for this crawler is:
facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)
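Taken together, the classifications described in this post can be sketched as a simple lookup on the product token at the start of the User-Agent. This is only an illustration in Python, not the 51Degrees API: the MetaAgent class, the KNOWN_AGENTS table, and the classify function are hypothetical names, and the usage labels are transcribed from the wording above rather than from actual CrawlerUsage values.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class MetaAgent:
    """Hypothetical record mirroring the properties discussed in this post."""
    is_crawler: bool
    is_artificial_intelligence: bool
    usages: Tuple[str, ...]


# Assumption: labels are taken from this post's wording, not from
# the values a real detection service would return.
KNOWN_AGENTS = {
    "meta-externalagent": MetaAgent(True, True, ("AI Indexing", "AI Training")),
    "meta-webindexer": MetaAgent(True, True, ("AI Indexing",)),
    "meta-externalfetcher": MetaAgent(False, False, ()),
    "meta-externalads": MetaAgent(True, False, ("Analytics",)),
    "facebookexternalhit": MetaAgent(True, False, ("Preview",)),
}


def classify(user_agent: str) -> Optional[MetaAgent]:
    """Return the record for a Meta User-Agent, or None if unrecognized."""
    # The product token is the text before the first "/".
    token = user_agent.split("/", 1)[0].strip().lower()
    return KNOWN_AGENTS.get(token)


fetcher = classify(
    "meta-externalfetcher/1.1 "
    "(+https://developers.facebook.com/docs/sharing/webmasters/crawler)"
)
print(fetcher)
```

A plain token lookup like this ignores version numbers and will miss variants that do not start with the product token, which is one reason to rely on a maintained detection service instead.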
Robots.txt
Configure your robots.txt for 368 crawlers, including Meta's, automatically using the free 51Degrees service.
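If you want to see what a nuanced policy looks like by hand, here is a minimal Python sketch. It builds per-token robots.txt rules and checks them with the standard library's urllib.robotparser. The POLICY choices (block the AI training crawler, allow AI search indexing and link previews) are just one assumed example, the helper names build_robots_txt and is_allowed are hypothetical, and the sketch assumes each crawler honors robots.txt.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical policy: product token -> allow crawling?
POLICY = {
    "meta-externalagent": False,   # AI indexing and training
    "meta-webindexer": True,       # Meta AI search indexing
    "facebookexternalhit": True,   # link previews
}


def build_robots_txt(policy):
    """Render one User-agent group per product token."""
    groups = []
    for token, allowed in policy.items():
        rule = "Allow: /" if allowed else "Disallow: /"
        groups.append(f"User-agent: {token}\n{rule}")
    return "\n\n".join(groups) + "\n"


def is_allowed(robots_txt, user_agent, url="https://example.com/"):
    """Check a User-Agent against robots.txt text using the stdlib parser."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


robots_txt = build_robots_txt(POLICY)
print(robots_txt)
```

Tokens that are not listed, and any crawler that matches no group, remain allowed by default, so the policy only needs to name the crawlers you want to treat differently.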
AI Treatment
Want to know more about how to handle AIs including options that go beyond robots.txt? Check out our guide to the good, the bad, and the ugly.