How to set User-Agent header with Puppeteer JS and not fail

Don’t just blindly change your User-Agent header.

When scraping, we want to do a few things: scrape slowly (but fast enough), use a proxy, rotate IP addresses, and rotate the User-Agent header. With these in place, we may go unnoticed.

The User-Agent header is one of the headers scrapers abuse the most, and it’s all because of UA sniffing.

It’s all great if we just fetch HTML and parse it with cheerio. No script runs on our side, so there is no one to check us. But now we need to scrape a website with JS enabled, because the website is doing black magic client-side rendering.

We set up Puppeteer, open the browser, load the webpage, and get the data. Everything looks good. At some point, we decide to rotate the user agent. Why not? It’s simple, and Puppeteer has a great API for that: page.setUserAgent().
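Here is a minimal sketch of rotating the User-Agent with page.setUserAgent(). The pool of strings and the helper names (USER_AGENTS, pickUserAgent, openWithUserAgent) are made up for illustration; `browser` is assumed to be a Browser returned by puppeteer.launch().

```javascript
// Hypothetical pool of User-Agent strings; swap in whatever fits your targets.
const USER_AGENTS = [
  'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:109.0) Gecko/20100101 Firefox/115.0',
  'Mozilla/5.0 (Windows NT 10.0; WOW64; Trident/7.0; rv:11.0) like Gecko',
];

// Pick a User-Agent in round-robin order.
function pickUserAgent(index, pool = USER_AGENTS) {
  return pool[index % pool.length];
}

// Open a new page with the chosen User-Agent and navigate to `url`.
// page.setUserAgent() must be called before page.goto() to affect the request.
async function openWithUserAgent(browser, url, index) {
  const page = await browser.newPage();
  await page.setUserAgent(pickUserAgent(index));
  await page.goto(url);
  return page;
}
```

Calling openWithUserAgent(browser, url, i) with an increasing i cycles through the pool.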

Most of the time this will be enough, but sometimes websites will try to check if you are doing something you shouldn’t.

Let’s see how websites can check if we are faking the User-Agent header and how we can bypass some of those checks.

Navigator object

A website can check our navigator object. Navigator properties are so fucked up, but here they are:

  • platform: MacIntel, Win32, Linux x86_64, iPhone, Linux armv8l
  • productSub:
    · Safari, Chrome, Opera, Edge … gives 20030107
    · Firefox gives 20100101
  • vendor:
    · Chrome, Opera … gives Google Inc.
    · Safari gives Apple Computer, Inc.
    · Firefox, Internet Explorer, Edge … gives empty string
  • oscpu: only Firefox
  • cpuClass: only Internet Explorer

To see this in action, we don’t even need to start Puppeteer. We can open DevTools, go to the “Network conditions” tab, and set the User-Agent to Internet Explorer 11.
Try it on i-know-you-faked-user-agent.glitch.me

There are a lot of devices and a lot of User-Agent strings. But if we have an iPhone in User-Agent and Linux as our platform property, we are busted.
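To make the mismatch concrete, here is a rough sketch of the kind of check a website could run. The rules and function names are invented for illustration and are far from complete; a real check would likely cover more properties.

```javascript
// Hypothetical rules: which navigator.platform prefixes are plausible
// for a given User-Agent string. iPhone is tested first because its
// User-Agent also mentions "Mac OS X".
function expectedPlatformPrefixes(userAgent) {
  if (/iPhone/.test(userAgent)) return ['iPhone'];
  if (/Windows/.test(userAgent)) return ['Win'];
  if (/Macintosh/.test(userAgent)) return ['Mac'];
  if (/Android|Linux/.test(userAgent)) return ['Linux'];
  return []; // unknown User-Agent: no opinion
}

// True when the platform does not fit the User-Agent.
function looksFaked(userAgent, platform) {
  const prefixes = expectedPlatformPrefixes(userAgent);
  return prefixes.length > 0 && !prefixes.some((p) => platform.startsWith(p));
}
```

In a page, the site would call looksFaked(navigator.userAgent, navigator.platform).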

If a website is checking the navigator object, we need to override those properties so that they match the User-Agent we are sending.

Fake Firefox / Windows
Fake Internet Explorer / Windows
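One way to sketch both profiles is with page.evaluateOnNewDocument(), which runs a function in every new document before the site's own scripts. The profile values below follow the property list above; the oscpu and cpuClass strings, the PROFILES object, and the helper names are assumptions for illustration.

```javascript
// Hypothetical profiles. The oscpu and cpuClass values are assumptions.
const PROFILES = {
  firefoxWindows: {
    userAgent: 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:109.0) Gecko/20100101 Firefox/115.0',
    overrides: { platform: 'Win32', productSub: '20100101', vendor: '', oscpu: 'Windows NT 10.0; Win64; x64' },
  },
  ie11Windows: {
    userAgent: 'Mozilla/5.0 (Windows NT 10.0; WOW64; Trident/7.0; rv:11.0) like Gecko',
    overrides: { platform: 'Win32', vendor: '', cpuClass: 'x86' },
  },
};

// Runs inside the page. `target` is only passed in tests;
// in the browser it falls back to the global navigator.
function applyNavigatorOverrides(overrides, target) {
  const nav = target || navigator;
  for (const [name, value] of Object.entries(overrides)) {
    Object.defineProperty(nav, name, { get: () => value, configurable: true });
  }
}

// `page` is a Puppeteer Page. Call this before page.goto().
async function fakeBrowserProfile(page, profile) {
  await page.setUserAgent(profile.userAgent);
  await page.evaluateOnNewDocument(applyNavigatorOverrides, profile.overrides);
}
```

For example, await fakeBrowserProfile(page, PROFILES.firefoxWindows) before navigating.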

Window.open

A website can open a new window and get navigator.userAgent from that new window.

You can check this on glitch.

Waiting for the 'targetcreated' or 'popup' event to get the new page is slow. By the time we get hold of it, the website may already have read navigator.userAgent.
It’s just not enough. We can’t bypass the check like that.

We need to be creative. If a website is using window.open(), we must change the open function.
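A rough sketch of that idea: wrap window.open() so that every window it returns already reports the fake User-Agent. The function names are made up for illustration, and the wrapper only covers same-origin popups, where the opener is allowed to touch the popup's navigator.

```javascript
// Runs inside the page before any site script. `target` is only passed
// in tests; in the browser it falls back to the global window.
function patchWindowOpen(fakeUserAgent, target) {
  const win = target || window;
  const nativeOpen = win.open;

  win.open = function open(...args) {
    const popup = nativeOpen.apply(win, args);
    if (popup) {
      try {
        // Shadow navigator.userAgent on the freshly opened window.
        Object.defineProperty(popup.navigator, 'userAgent', {
          get: () => fakeUserAgent,
          configurable: true,
        });
      } catch (err) {
        // Cross-origin popup: its navigator is not accessible from here.
      }
    }
    return popup;
  };
}

// `page` is a Puppeteer Page. Call this before page.goto().
async function installWindowOpenPatch(page, fakeUserAgent) {
  await page.evaluateOnNewDocument(patchWindowOpen, fakeUserAgent);
}
```

The wrapper keeps the name open and returns the original popup object, so code that only uses the return value should not notice the difference.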

A website can check whether we are messing around with the window.open() function.
To cover ourselves, we must add:

window.open.toString = () => 'function open() { [native code] }'

Service Worker

One thing we can’t fake is communication with a service worker. After the service worker is registered, the main thread can communicate with it via postMessage.

main.js
service-worker.js
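The two files could look roughly like this. This is a sketch of the check, not any particular site's code: the file path, message shape, and function names are assumptions, and both files are shown in one block for brevity.

```javascript
// --- service-worker.js ---
// Reply to every message with the User-Agent the worker itself sees.
function onWorkerMessage(event, workerUserAgent) {
  event.source.postMessage({ userAgent: workerUserAgent });
}

// Only attach the listener when actually running as a service worker.
if (typeof ServiceWorkerGlobalScope !== 'undefined' && self instanceof ServiceWorkerGlobalScope) {
  self.addEventListener('message', (event) => onWorkerMessage(event, self.navigator.userAgent));
}

// --- main.js ---
// True when the page and the worker disagree about the User-Agent.
function isUserAgentFaked(pageUserAgent, workerUserAgent) {
  return pageUserAgent !== workerUserAgent;
}

// Register the worker, ask it for its User-Agent, and compare.
// '/service-worker.js' is an assumed path.
async function checkUserAgent() {
  await navigator.serviceWorker.register('/service-worker.js');
  const registration = await navigator.serviceWorker.ready;
  navigator.serviceWorker.addEventListener('message', (event) => {
    console.log('faked:', isUserAgentFaked(navigator.userAgent, event.data.userAgent));
  });
  registration.active.postMessage('ping');
}
```

The page would call checkUserAgent() once it has loaded.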

The service worker will return the original User-Agent, and it’s game over.

The only way we can bypass this is to start a browser with a predefined User-Agent header. With that setup, we are stuck with that one User-Agent.
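A minimal sketch of that setup, assuming the puppeteer package is installed and using Chromium's --user-agent command-line switch. The helper names are made up for illustration.

```javascript
// Build launch options that set the User-Agent for the whole browser process.
function buildLaunchOptions(userAgent) {
  return { args: [`--user-agent=${userAgent}`] };
}

// Launch a browser whose pages, popups, and workers all share `userAgent`.
// To use a different User-Agent, launch another browser.
async function launchWithUserAgent(userAgent) {
  const puppeteer = require('puppeteer'); // assumes puppeteer is installed
  return puppeteer.launch(buildLaunchOptions(userAgent));
}
```

Rotation then happens per browser instead of per page: close the browser and launch a new one with the next User-Agent.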

Conclusion

User-Agent sniffing is one small piece of the browser fingerprinting puzzle. Sometimes you have to make an extra effort rather than blindly faking your User-Agent header. Make sure that the website is not double-checking you.

This is for educational purposes; use Puppeteer responsibly.


Coffee Driven Software Developer @SBGenomics