User Agent Client Hints

An introduction to User Agent Client Hints by Bruce Lawson

A web developer with a love for music, art & literature, and of course, the web! Bruce was one of the editors of the HTML5 specification at W3C as well as one of the committee that drafted the British Standard for commissioning accessible websites. A true web whiz that has written and presented at over 120 events in 22 countries! When he’s not blogging or tweeting nonsense, you can catch him relishing a pint of Guinness in Birmingham UK.

Every time a web browser requests information from a server — whether that is for a web page, an image, a video, or a download — the browser (“client”) and the server exchange meta-information in headers. The header sent from the browser is called a request header; those sent from the server are response headers.

These can easily be viewed in browser developer tools: in Chrome, open devtools, and choose the Network tab. Reload the page (if necessary) and click any of the resources on the left; the associated headers will be shown in the right-hand pane.

Chrome Response Headers

Along with the web page that you actually see, there's a lot of invisible information going on behind the scenes (and that’s not even counting cookies!). The purpose of this article isn’t to look at all possible permutations of data sent in HTTP headers (see MDN's HTTP headers article for that). In this article, we’ll concentrate on learning about Client Hints.

Client Hints explained

Client Hints are a relatively new addition to the HTTP header zoo. They were proposed in July 2018 by Ilya Grigorik, a performance engineer at Google.

Client Hints are a way of giving the server some information about the requesting device so that the server may do content negotiation — that is, use the information given in a hint to tailor what is returned.

Content negotiation isn’t a new concept. Since dinosaurs roamed the earth, we’ve been able to say to the server:

accept-encoding: gzip, deflate

This tells the server that we’ll accept data compressed by an algorithm called gzip, and the server can then reply with compressed content and the response HTTP header:

Content-Encoding: gzip

This tells the client “you said you would accept gzipped content, so here it is, gzipped for you!”. Note that this is a hint, not a command. We might send the following header to indicate to the server that we’d prefer the more modern (and more efficient) Brotli compression method:

accept-encoding: br, gzip, deflate

If the server isn’t set up for Brotli compression, it can see that we will nevertheless accept gzip, and send the content compressed using this method, with the appropriate response content-encoding HTTP header set to indicate the type of data sent.
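The negotiation described above can be sketched in a few lines of Python. This is a minimal illustration, not how any particular web server implements it: the function name choose_encoding and the idea of a server-side preference list are my own assumptions, and the q-value handling is deliberately simple.

```python
from typing import List, Optional


def choose_encoding(accept_encoding: str, server_preference: List[str]) -> Optional[str]:
    """Pick the server's favourite encoding among those the client accepts.

    Hypothetical helper: entries with q=0 are treated as "not acceptable",
    and any other q-value is treated as acceptable.
    """
    accepted = set()
    for item in accept_encoding.split(","):
        parts = [p.strip() for p in item.split(";")]
        name = parts[0].lower()
        if not name:
            continue
        q = 1.0
        for param in parts[1:]:
            if param.lower().startswith("q="):
                try:
                    q = float(param[2:])
                except ValueError:
                    q = 0.0  # unreadable weight: play safe and skip it
        if q > 0:
            accepted.add(name)
    for encoding in server_preference:
        if encoding in accepted:
            return encoding
    return None  # nothing in common: send the content uncompressed


# A server without Brotli falls back to gzip for a Brotli-capable client.
print(choose_encoding("br, gzip, deflate", ["gzip"]))
```

Whatever the function returns is what the server would announce in its Content-Encoding response header.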

A more recent example of content negotiation through headers is serving next-generation image formats to browsers that understand them, without having to recode a site to use <picture> elements. In order to achieve that, Chrome’s 'accept' HTTP header is:

accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9

The image/avif part tells the server that the browser is able to decode and display the AVIF format. You could then configure the server to respond to every image request with an AVIF version, regardless of what the browser actually requested. So lovely-kangaroo.avif is sent when the source code specified lovely-kangaroo.png, and every image request for delightful-halibut.jpg gets delightful-halibut.avif instead, without touching any of the site’s HTML.
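As a rough sketch of that server-side swap, assuming an AVIF copy of every image already sits next to the original (the function name negotiated_image_path is hypothetical, and a real deployment would more likely do this in web server configuration):

```python
from pathlib import PurePosixPath

# Extensions we are willing to swap for AVIF (an assumption for this sketch).
SWAPPABLE = {".png", ".jpg", ".jpeg"}


def negotiated_image_path(requested_path: str, accept_header: str) -> str:
    """Return the .avif sibling of an image if the client says it accepts AVIF."""
    media_types = {
        part.split(";")[0].strip().lower() for part in accept_header.split(",")
    }
    path = PurePosixPath(requested_path)
    if "image/avif" in media_types and path.suffix.lower() in SWAPPABLE:
        return str(path.with_suffix(".avif"))
    return requested_path


chrome_accept = (
    "text/html,application/xhtml+xml,application/xml;q=0.9,"
    "image/avif,image/webp,image/apng,*/*;q=0.8"
)
print(negotiated_image_path("lovely-kangaroo.png", chrome_accept))
```

Because the same URL now returns different bytes depending on the request, a server doing this would normally also send a Vary: Accept response header so that caches keep the variants apart.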

The User Agent string

Another field sent in request headers is the User Agent (UA) string. This is a long, freeform string with a sad and sordid history.

The User Agent header is supposed to let a server know which browser and operating system is accessing it. Why? Because in the early days of the web, browsers were rubbish, so the User Agent was defined:

  • For statistical purposes, the tracing of protocol violations, and automated recognition of user agents for the sake of tailoring responses to avoid particular user agent limitations.

That it is freeform presents problems. For example, I worked for the browser vendor Opera. Opera was the first browser to reach version 10, but when that happened, badly written User Agent detection routines couldn’t deal with double figures, so they assumed the visitor’s browser was version 1. At the time, we wrote:

It appears that a considerable amount of browser sniffing scripts are not quite ready for this change to double digits, as they detect only the first digit of the user agent string: in such a scenario, Opera 10 is interpreted as Opera 1. This results in sites mistakenly identifying Opera 10 as an unsupported browser, thereby breaking server, as well as client-side scripts. So, after a few months of careful analysis of the impact this Y2K-like versioning problem may have on site compatibility, we’ve decided to freeze the first part of the string as Opera/9.80 for now, and add the version number in the end — hence the Version/10.00 appendix. This construct allows us to somehow serve the real version number, while sidestepping the various double digit-related issues described above. Opera
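To make the double-digit problem concrete, here is a small sketch of the kind of sniffing that broke, next to a more careful version. Both functions are my own illustrations rather than code from any real site, and the first UA string is a hypothetical one showing what an unfrozen Opera 10 might have sent.

```python
import re

# Hypothetical: what Opera 10 might have sent without the freeze.
UNFROZEN_UA = "Opera/10.00 (Windows NT 5.1; U; en) Presto/2.2.15"
# The shape Opera actually adopted: frozen prefix, real version at the end.
FROZEN_UA = "Opera/9.80 (Windows NT 5.1; U; en) Presto/2.2.15 Version/10.00"


def naive_major_version(ua):
    """Buggy sniffing: only ever reads a single digit after 'Opera/'."""
    match = re.search(r"Opera/(\d)", ua)
    return int(match.group(1)) if match else None


def major_version(ua):
    """Reads every digit, preferring the trailing Version/ token when present."""
    match = re.search(r"Version/(\d+)", ua) or re.search(r"Opera/(\d+)", ua)
    return int(match.group(1)) if match else None


print(naive_major_version(UNFROZEN_UA))  # the single-digit bug in action
print(major_version(FROZEN_UA))
```

The naive version sees Opera 1 in the unfrozen string and Opera 9 in the frozen one, which is exactly why freezing the prefix kept old scripts working.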

A properly written User Agent detection program, such as that from 51Degrees, can parse the string and give correct information:

51Degrees User Agent Tester

However, many User Agent detection sites and services give incorrect or useless information. This site believes that Opera desktop uses WebKit version 537.36 (it doesn’t; it uses Blink):

Browser Information

This service believes I’m using Opera version 0:

System Analysis

Why would you want to use the User Agent string? Well, like the Iron Throne from Game of Thrones, the User Agent string can be used for good, or for evil.

What you shouldn’t do is use it to deny access to your site or remove a feature because browser X doesn’t support it. This misuse of the UA string is why it is so cumbersome now; every browser has the word “Mozilla” in there, or some sites would appear broken because they were designed when only Mozilla supported frames.

But there are many legitimate uses of device detection using the UA string, too: analytics software that helps you understand your customers better; sending smaller videos to devices with smaller screens to save bandwidth (HTML can do responsive images, but not responsive video); nudging an iPhone user to install your Progressive Web App because iOS lags behind Chrome and Firefox on Android.

Companies like 51Degrees can match the UA string and other header information to one of 1,538,454 device combinations in their cloud-based database which is updated daily.

However, it is likely that the User Agent string will be frozen, at least in Chromium-based browsers.

It will still be there, because millions of websites would break if the User Agent string was simply removed. It won’t change and may be replaced in time with something generic. Apple already experimented with this (but had to revert).

When this will happen, we don’t know. The announcement was in January 2020, with a target of September 2020 to “unify desktop OS string as a common value for desktop browsers” and “unify mobile OS/device strings as a similarly common value”.

On 19 May, Google announced “no User-Agent string changes will be coming to the stable channel of Chrome in 2021” and a plan "to roll out these changes slowly and incrementally in 7 Phases—pending Origin Trial feedback — and plan to publish an update soon on the proposed timing and milestones beyond Phase 1.”

Fingerprinting

The stated aim of removing the UA string is to safeguard the web visitor’s privacy: “The User-Agent string is an abundant source of passive fingerprinting information about our users. It contains many details about the user’s browser and device. The above abuse makes it desirable to freeze the UA string and replace it with a better mechanism.”

Fingerprinting is the practice of gathering pieces of data about your browser and device, none of which is unique in itself, but when aggregated together may very well be unique. This is similar to the study that shows 87% of the US population are uniquely identified by {date of birth, gender, ZIP}.
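A toy example shows how aggregation does the damage. The six "visitors" below are entirely made up for illustration, as is the helper name unique_count; the point is only that no single attribute singles anyone out, while the combination singles out everyone.

```python
from collections import Counter

# Invented data: six visitors, three low-detail attributes each.
visitors = [
    {"browser": "Chrome", "os": "Windows", "timezone": "UTC+0"},
    {"browser": "Chrome", "os": "Windows", "timezone": "UTC+1"},
    {"browser": "Chrome", "os": "macOS", "timezone": "UTC+0"},
    {"browser": "Firefox", "os": "Windows", "timezone": "UTC+0"},
    {"browser": "Firefox", "os": "macOS", "timezone": "UTC+1"},
    {"browser": "Chrome", "os": "macOS", "timezone": "UTC+1"},
]


def unique_count(records, keys):
    """Count records whose combination of the given attributes is one of a kind."""
    combos = Counter(tuple(record[k] for k in keys) for record in records)
    return sum(1 for record in records if combos[tuple(record[k] for k in keys)] == 1)


for keys in (["browser"], ["os"], ["timezone"], ["browser", "os", "timezone"]):
    print(keys, unique_count(visitors, keys))
```

Each attribute on its own identifies nobody, yet all six visitors are unique once the three are combined.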

Mozilla has a good description of why fingerprinting is problematic:

  • Fingerprinting is a type of online tracking that’s more invasive than ordinary cookie-based tracking. A digital fingerprint is created when a company makes a unique profile of you based on your computer hardware, software, add-ons, and even preferences. Your settings like the screen you use, the fonts installed on your computer, and even your choice of a web browser can all be used to create a fingerprint.
  • If you have a commonly used laptop, PC or smartphone, it may be harder to uniquely identify your device through fingerprinting. However, the more unique add-ons, fonts, and settings you have, the easier you’ll be likely to find. Companies can use this unique combination of information to create your fingerprint.

My first visit to amiunique.org using Chrome (the most widespread browser available) shows me that my fingerprint is “unique among the 3259895 fingerprints in our entire dataset.”

My Browser Fingerprint

This is only fingerprinting me from eight different data points. The long-running Browser Fingerprinting study at Friedrich-Alexander University Erlangen-Nürnberg, Germany, collects hundreds of attributes, both from headers and using JavaScript.

User Agent Client Hints: “a better mechanism”?

In order to give sites some idea about the browsers visiting them, the Chrome team came up with an extension to the Client Hints that we met above. These are called User Agent Client Hints and were shipped in Chrome and Chromium-derived browsers on 11 February 2021.

Now, when a Chrome desktop goes to an HTTPS site, it will send the following new User Agent Client Hints:

Sec-CH-UA-Mobile: ?0

Unlike the current UA string, this is structured data. It’s a key-value pair with Boolean data (?0 = false, ?1 = true). So, this hint tells the server that this is not a mobile browser. (What is the definition of ‘mobile’? There isn’t one. A mobile browser is not a desktop browser. What is a ‘desktop browser’? Not a mobile browser.)
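Reading that Boolean on the server is about as simple as header parsing gets. A minimal sketch, with hypothetical function names, that treats anything other than ?0 or ?1 as "don't know" rather than guessing:

```python
def parse_sf_boolean(value):
    """Map the structured-field Booleans '?1' and '?0' to True and False."""
    value = value.strip()
    if value == "?1":
        return True
    if value == "?0":
        return False
    return None  # missing or malformed: make no assumption


def is_mobile(headers):
    """Look up the mobile hint case-insensitively in a dict of request headers."""
    lowered = {name.lower(): value for name, value in headers.items()}
    return parse_sf_boolean(lowered.get("sec-ch-ua-mobile", ""))


print(is_mobile({"Sec-CH-UA-Mobile": "?0"}))
```

Returning None for an absent header matters, because a browser that sends no hint at all is not thereby a desktop browser.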

Sec-CH-UA: "Google Chrome";v="89", "Chromium";v="89", ";Not A Brand";v="99"

The first bits of this are pretty self-explanatory; this is Google Chrome version 89, based on Chromium 89. But what’s that weird ";Not A Brand";v="99" entry?

There’s a word for that. GREASE is the word. It's got groove, but it's got no meaning (by design). The Client Hints specification explains:

  • User agents' brands containing more than a single entry could encourage standardized processing of the brands list. By randomly including additional, intentionally incorrect, comma-separated entries with arbitrary ordering, they would reduce the chance that we ossify on a few required strings.

Note that in this example (";Not A Brand";v="99") the semicolon is included in the name of the “brand”; this is to catch people who try to parse the CH UA strings by splitting on semicolons without caring about proper parsing.
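Here is a sketch of the trap and one way around it. The naive function splits on commas and semicolons; the second uses a regular expression that respects quoted strings. Neither is a complete Structured Field Values parser (escape sequences are matched but not unescaped, and only the v parameter is read), and the function names are my own.

```python
import re

HEADER = '"Google Chrome";v="89", "Chromium";v="89", ";Not A Brand";v="99"'

# A quoted brand name, then ;v= and a quoted version. Optional spaces allowed.
_BRAND = re.compile(r'"((?:[^"\\]|\\.)*)"\s*;\s*v="((?:[^"\\]|\\.)*)"')


def naive_brands(header):
    """What not to do: assume the first semicolon always ends the brand name."""
    return [entry.split(";")[0].strip(' "') for entry in header.split(",")]


def parse_brands(header):
    """Return (brand, version) pairs, keeping semicolons inside quoted names."""
    return [(name, version) for name, version in _BRAND.findall(header)]


def brand_version(header, brand):
    """Find a brand by name, ignoring its position and any unknown entries."""
    return dict(parse_brands(header)).get(brand)


print(naive_brands(HEADER))
print(parse_brands(HEADER))
print(brand_version(HEADER, "Chromium"))
```

The naive version returns an empty brand name for the GREASE entry, while the lookup by name neither knows nor cares where in the list "Chromium" appears.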

The idea is borrowed from IETF:

  • GREASE (Generate Random Extensions And Sustain Extensibility), a mechanism to prevent extensibility failures in the TLS ecosystem. It reserves a set of TLS protocol values that may be advertised to ensure peers correctly handle unknown values.
  • TLS follows a model where one side, usually the client, advertises capabilities and the peer, usually the server, selects them. The responding side must ignore unknown values so that new capabilities may be introduced to the ecosystem while maintaining Interoperability.
  • However, bugs may cause an implementation to reject unknown values. It will interoperate with existing peers, so the mistake may spread through the ecosystem unnoticed. Later, when new values are defined, updated peers will discover that the metaphorical joint in the protocol has rusted shut and that the new values cannot be deployed without interoperability failures.
  • To avoid this problem, this document reserves some currently unused values for TLS implementations to advertise at random. Correctly implemented peers will ignore these values and interoperate. Peers that do not tolerate unknown values will fail to interoperate, revealing the mistake before it is widespread.

Therefore, “user agents MUST include more than a single value in brands, where one of these values is an arbitrary value. The value order in brands MUST change over time to prevent receivers of the header from relying on certain values being in certain locations in the list.”
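Seen from the browser's side, that requirement might look something like the sketch below. It is an illustration of the idea only, not how Chrome builds its header: the function name is hypothetical, and the GREASE entry here is a fixed string, where a real user agent would be expected to vary it as well.

```python
import random


def build_sec_ch_ua(brands, rng):
    """Join real brands with one deliberately meaningless entry, in shuffled order."""
    entries = list(brands) + [(";Not A Brand", "99")]  # fixed GREASE value: an assumption
    rng.shuffle(entries)
    return ", ".join(f'"{name}";v="{version}"' for name, version in entries)


real_brands = [("Google Chrome", "89"), ("Chromium", "89")]
print(build_sec_ch_ua(real_brands, random.Random()))
```

Run it a few times and the order changes, so any receiver that relies on "the second entry is always Chromium" will break early and visibly, which is the whole point.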

So far, so good. We now get the browser name and version number, in a structured format in a way that is at least designed not to break when new browsers come to market, and we can tell whether the browser prefers “mobile” or non-mobile content. Or, at least, we can if the site is served over HTTPS. If your site is served over HTTP, you’ll only receive the UA string which may at some point in the future be frozen and provide you with little or no useful data. But in 2021, there’s no real excuse for serving insecure sites.
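In practice a server has to cope with both worlds for a while: hints where they arrive, the UA string where they don't. A minimal sketch of that fallback, with a hypothetical function name and a deliberately crude result shape:

```python
def client_summary(headers):
    """Prefer User Agent Client Hints; fall back to the raw User-Agent string."""
    lowered = {name.lower(): value for name, value in headers.items()}
    if "sec-ch-ua" in lowered:
        return {
            "source": "client-hints",
            "brands": lowered["sec-ch-ua"],
            "mobile": lowered.get("sec-ch-ua-mobile") == "?1",
        }
    # No hints: an HTTP request, or a browser that does not send them.
    return {"source": "user-agent", "raw": lowered.get("user-agent", "")}


print(client_summary({"User-Agent": "Mozilla/5.0"}))
```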

Please, may I have some more?

If a site needs more information, it must indicate it with a header to the client, as with the other Client Hints we met before. From the specification:

A user agent's initial request to https://example.com will include the following request headers:

User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/71.1.2222.33 Safari/537.36
Sec-CH-UA: "Chrome"; v="74", ";Not)Your=Browser"; v="13"
Sec-CH-Mobile: ?0

If a server delivers the following response header:

Accept-CH: Sec-CH-UA-Full-Version, Sec-CH-UA-Platform, Sec-CH-UA-Arch

Then subsequent requests to https://example.com will include the following request headers:

User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/71.1.2222.33 Safari/537.36
Sec-CH-UA: "Chrome"; v="74", ";Not)Your=Browser"; v="13"
Sec-CH-UA-Full-Version: "74.0.3424.124"
Sec-CH-UA-Platform: "macOS"
Sec-CH-UA-Arch: "arm"

The secure Client Hint headers will be sent until the server sends an Accept-CH response header for a different set of Client Hints, or until the browser session is terminated.
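On the server, that exchange boils down to two small jobs: advertise the hints you want, and notice which ones have not arrived yet. A framework-free sketch, using plain dicts for headers and hypothetical function names:

```python
# The hints the example server asks for.
REQUESTED_HINTS = ["Sec-CH-UA-Full-Version", "Sec-CH-UA-Platform", "Sec-CH-UA-Arch"]


def accept_ch_header(hints):
    """Build the (name, value) pair for the Accept-CH response header."""
    return ("Accept-CH", ", ".join(hints))


def missing_hints(request_headers, hints):
    """List the requested hints that this request did not carry."""
    present = {name.lower() for name in request_headers}
    return [hint for hint in hints if hint.lower() not in present]


first_request = {
    "Sec-CH-UA": '"Chrome"; v="74", ";Not)Your=Browser"; v="13"',
    "Sec-CH-Mobile": "?0",
}
print(accept_ch_header(REQUESTED_HINTS))
print(missing_hints(first_request, REQUESTED_HINTS))
```

On the very first request every high-entropy hint is missing, so anything that depends on them has to wait for a later request or do without.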

These are the extra Client Hints that the User Agent may send if it decides to send the requested hints:

  • full version - The User Agent's build version (for example: "72.0.3245.12", "3.14159", or "297.70E04154A").
  • platform brand - The User Agent's operating system’s commercial name (for example: "Windows", "iOS", or "AmazingOS").
  • platform version - The User Agent's operating system’s version (for example: "NT 6.0", "15", or "17G").
  • platform architecture - The User Agent's underlying CPU architecture (for example: "ARM", or "x86").
  • model - The User Agent's device model (for example: "", or "Pixel 2 XL").

There’s no guarantee that all of this information will be accurate. The User Agent Client Hint specification says:

User agents ought to exercise judgement before granting access to this information, and MAY impose restrictions above and beyond the secure transport and delegation requirements noted above. For instance, user agents could choose to reveal platform architecture only on requests it intends to download, giving the server the opportunity to serve the right binary. Likewise, they could offer users control over the values revealed to servers, or gate access on explicit user interaction via a permission prompt or via a settings interface. User Agent Client Hints specification

These user agent identifiers are considered to be "high-entropy" and need to be specifically requested because they're the most useful for fingerprinting.

The JavaScript APIs associated with “high-entropy portions of the user agent information” (that is, those that have to be requested by the server) are retrieved through a Promise, in order to give User Agents the opportunity to gate their exposure behind potentially time-consuming checks (e.g. by asking the user for their permission).

Neither is there any guarantee that any information sent will be especially useful. The specification also says:

User agents SHOULD keep these strings short and to the point, but servers MUST accept arbitrary values for each, as they are all values constructed at the user agent's whim. User Agent Client Hints specification
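Accepting "arbitrary values" in practice means never letting an unexpected string crash or mislead your code. One hedged way to do that is to map what you recognise and bucket everything else. The list of known platforms below is my own assumption for the sketch, not something taken from the specification.

```python
# Platforms this hypothetical server knows how to tailor content for.
KNOWN_PLATFORMS = {
    "windows": "Windows",
    "macos": "macOS",
    "ios": "iOS",
    "android": "Android",
    "linux": "Linux",
}


def normalise_platform(raw):
    """Turn a platform hint into a known label, 'other', or 'unknown'."""
    if raw is None:
        return "unknown"  # hint not sent at all
    cleaned = raw.strip().strip('"').strip()
    if not cleaned:
        return "unknown"  # sent, but empty
    return KNOWN_PLATFORMS.get(cleaned.lower(), "other")


print(normalise_platform('"AmazingOS"'))
```

Keeping "other" separate from "unknown" lets analytics distinguish a browser that declined to answer from one that answered with something new.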

Not every device manufacturer uses consistent naming conventions to adequately distinguish between devices. Even the iPhone only announces that it’s an iPhone in the current User Agent string, although there is great variety across the range of models.

It would be much more useful if manufacturers put the Unique Product Code in the field, but that would rely on their accurately doing so, and would probably be deemed too fingerprintable. (It’s also worth reminding ourselves here that the User Agent string contains many lies for legacy reasons. It’s the best example of historical fiction since Tolstoy wrote War and Peace, and only marginally shorter.)

Perhaps the biggest technical change if the UA string is frozen and replaced by User Agent Client Hints is that less information is available on the first visit to a domain.

One interested party wrote:

As a small ad network, we use ip and user agent data to combat ad fraud. These same bits of entropy are used when we detect ad fraud. This is virtually impossible to do if all we are getting is a rotating VPN-based IP address and "Chrome 74". At best, we have to wait for another request to get the rest of the UA data (significantly reducing our ad serving speed). TL;DR: We need the full OS and browser version to survive as a small ad network. And more importantly, we need this data as part of every first http request, without being discriminated against for being a less frequently visited / smaller website. One interested party

Which browsers and when?

As we’ve noted, although the move to freeze the UA string was originally Apple’s, the impetus to replace it with User Agent Client Hints is entirely Google’s.

Apple said:

The recent thinking about client hints in general is that it adds new fingerprinting surface to tracking pixels. Those 1px images typically don't get to run JS so the amount of information it can gather is limited. With this feature, it can gather extra bit of information, and a lot more with other client hints. For these things, client hints including but not limited to Content-DPR header is not something Apple's WebKit is interested in implementing at this point in time. Apple

Mozilla says:

We're very interested in the freezing UA string stuff and NavigatorUAData JS APIs, and think those are worth prototyping. Personally, the GREASE-like stuff is interesting but slightly scary … But I don't think the positive points change our position on Client Hints (which is currently "non-harmful"). Mozilla

It’s good to have structured access to User Agent data for those that need it.

51Degrees wrote:

From an engineering perspective, the User-Agent string is inefficient as it contains many characters and is unstructured. Client Hints could remove much of the complexity associated with identifying the browser, operating system, and device. Client Hints do not remove the need to maintain an accurate catalog of the associated device, browser, and operating system data. 51Degrees

It’s also worth wondering whether Google’s motives are entirely altruistic here, or whether it is trying to make life a little harder for its adtech rivals; Google is, after all, one of the largest advertising firms on the planet, one that just happens to run a search engine, an excellent browser, and the most widespread operating system in the world. (I don’t mean to cast aspersions on my many friends who work for Google, but humanity already invented an artificial intelligence indifferent to human well-being: it's the Corporation.)

Although User Agent Client Hints are already shipped in Chrome, the “User Agent Client Hints specification” was last updated after shipping, and still reads “Unofficial Draft”. (Edit: 7 September 2021. The specification has since been updated, but it still remains as an "Unofficial Draft". It's promising to see they are keeping it updated, but when will it become official?) At time of writing, there are more than 30 open issues, and unanswered questions in the specification. Few software vendors want to implement against a moving target, whether they be browser vendors or service suppliers.

It feels to me like it’s too early to know how this will play out. The problems that User Agent Client Hints is trying to solve are the structure of the User Agent string and reducing the passive fingerprinting surface of the web. Eliminating the User Agent string is one method of solving the problem, but it is not a problem statement. This may seem pedantic, but it’s very important to separate the problem from the solution, and not conflate the two.

Will Google freeze the UA string? Will other browsers follow suit? Will this change attract the wrath of regulators if it disadvantages Google’s small adtech competitors?

We’ll keep you updated.

Further reading

Now that we have learned why Client Hints were conceived, let's see how 51Degrees has upgraded its real-time data services to accommodate the industry change. You may also wish to explore how they work, using live examples to see what information is returned.

  1. We delve deeper into your frequently asked questions with our blog User Agent Client Hints - Your questions answered.
  2. Test User Agent Client Hints for yourself with our Client Hints page.
  3. Support for UA-CH was made available in version 4.3 of our API packages.
  4. If you're hungry for more, we have multiple blogs on the topic of Client Hints.