How to scrape virtual scrolling with Puppeteer

Filip Vitas
Feb 21, 2022

What is Virtual Scrolling?

Virtual scrolling (windowing) is a technique that renders only part of a long list in order to improve performance and reduce the number of DOM elements. Instead of the entire content, we render what is in view plus a little extra above and below it. That extra acts as a buffer, so the list looks as if it were fully rendered. On scroll, the virtual scroller removes the content that has left the view and adds the content that is entering it. The container element still has to take up the full height of the list to fake the presence of the entire content.
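To make the idea concrete, here is a rough sketch of the arithmetic a virtual scroller might do. It assumes fixed-height rows, which real libraries do not always require; `visibleRange` and its parameters are hypothetical names used only for illustration.

```javascript
// Sketch only: assumes every row has the same height (rowHeight, in pixels).
// Returns the index range to render and the height the container must fake.
function visibleRange(scrollTop, viewportHeight, rowHeight, totalRows, overscan) {
  const first = Math.floor(scrollTop / rowHeight);
  const last = Math.ceil((scrollTop + viewportHeight) / rowHeight) - 1;
  return {
    start: Math.max(0, first - overscan),           // buffer above the view
    end: Math.min(totalRows - 1, last + overscan),  // buffer below the view
    totalHeight: totalRows * rowHeight,             // full height of the list
  };
}
```

Everything outside `start..end` is simply not in the DOM, which is exactly what makes scraping such a list tricky.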

There are libraries for all major frameworks: virtual-window, react-window, react-virtualized, preact-virtual-list, vue-virtual-scroller, vue-virtual-scroll-list, vue-virtual-infinite-scroll, ngx-virtual-scroller, svelte-virtual-list.

Scrape it

Now that we know what virtual scrolling is, and that a virtual scroller keeps adding and removing data, we have to find a way to scrape that data.
The problem with virtual scrolling is that items are removed from the DOM once they leave the viewport. That's why we can't wait for every item to load and then read them all at once. We need to scrape each item as soon as it enters the DOM.

The MutationObserver API provides a very elegant way of detecting changes in the DOM. We can listen for added and removed nodes. As soon as a node is added to the DOM, we can scrape it.
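A minimal sketch of that idea could look like the following. It rests on a few assumptions that will differ from page to page: rows are direct children of the container, each row carries a `data-id` attribute, and the scroller inserts fresh nodes rather than recycling existing ones. `extractItem`, `handleMutations` and `observeList` are hypothetical helper names.

```javascript
// Hypothetical extractor: assumes each row has a data-id attribute.
function extractItem(node) {
  return { id: node.getAttribute('data-id'), text: node.textContent.trim() };
}

// Walks the mutation records and stores every added element once, keyed by id.
function handleMutations(mutations, seen) {
  for (const mutation of mutations) {
    for (const node of mutation.addedNodes) {
      if (node.nodeType !== 1) continue; // 1 === Node.ELEMENT_NODE, skip text nodes
      const item = extractItem(node);
      if (item.id !== null && !seen.has(item.id)) seen.set(item.id, item);
    }
  }
  return seen;
}

// Browser-only wiring: scrape the rows already rendered, then watch for new ones.
function observeList(container, seen) {
  handleMutations([{ addedNodes: [...container.children] }], seen);
  const observer = new MutationObserver((mutations) => handleMutations(mutations, seen));
  observer.observe(container, { childList: true }); // direct children only
  return observer;
}
```

Keying the results by id matters because the same row can enter the DOM more than once when the list is scrolled back and forth.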

There are multiple ways to implement scrolling. Since this is not an infinite scroll, the list has a fixed end, so we can move slowly toward the bottom until we reach it.
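One possible way to do that is sketched below, under stated assumptions: the scrollable element exposes the standard `scrollTop`, `clientHeight` and `scrollHeight` properties, and the step size and delay (200 px and 100 ms here) are arbitrary values that would need tuning for a real page. `scrollToBottom` and `atBottom` are hypothetical helper names.

```javascript
// Resolves after ms milliseconds so the scroller has time to render new rows.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// True once the visible window has reached the end of the scrollable content.
function atBottom(el) {
  return el.scrollTop + el.clientHeight >= el.scrollHeight - 1;
}

// Scrolls el down in steps of `step` pixels until the bottom is reached.
async function scrollToBottom(el, step = 200, delayMs = 100) {
  while (!atBottom(el)) {
    const before = el.scrollTop;
    el.scrollTop = before + step;
    await sleep(delayMs);
    if (el.scrollTop === before) break; // position did not move, stop trying
  }
}
```

Keeping the step smaller than the viewport height is the safer choice, since a larger jump could skip rows that are never rendered at all.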

Move data into the Node.js context

All of the scraping so far happens in the browser, but we need that data in the Node.js context. Since the scraping runs inside a browser event handler, we can only transfer that data through an exposed function.
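As a rough sketch, Puppeteer's `page.exposeFunction` can put a Node.js callback on the page's `window`, and the observer can call it for every new row. The function name `onItem`, the `data-id` attribute and the helpers `createCollector` and `attachCollector` are assumptions made for this example, not something the page or Puppeteer provides.

```javascript
// Node.js side: stores items pushed from the browser, ignoring repeated ids.
function createCollector() {
  const items = new Map();
  return {
    add(item) {
      if (item.id !== null && !items.has(item.id)) items.set(item.id, item);
    },
    values() {
      return [...items.values()];
    },
  };
}

// Exposes window.onItem to the page and installs an observer that calls it.
// `page` is a Puppeteer Page; `containerSelector` is the list's CSS selector.
async function attachCollector(page, containerSelector) {
  const collector = createCollector();
  await page.exposeFunction('onItem', (item) => collector.add(item));
  await page.evaluate((selector) => {
    // Everything in this callback runs in the browser, not in Node.js.
    const container = document.querySelector(selector);
    const send = (node) => {
      if (node.nodeType !== 1) return; // elements only
      window.onItem({ id: node.getAttribute('data-id'), text: node.textContent.trim() });
    };
    [...container.children].forEach(send); // rows that are already rendered
    new MutationObserver((mutations) => {
      mutations.forEach((mutation) => mutation.addedNodes.forEach(send));
    }).observe(container, { childList: true });
  }, containerSelector);
  return collector;
}
```

After the list has been scrolled to the end, `collector.values()` should hold every item that passed through the DOM, already on the Node.js side.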

Thanks for reading! If you like the article, give a clap, or two, or 50👏.
Leave a comment below if you have any questions or say hi on Twitter.
