
Puppeteer: an API to programmatically control Chrome


Puppeteer is a Node library that provides a high-level API to control Chrome or Chromium through the DevTools Protocol, typically in headless mode, which means interacting with Chrome without its graphical interface, that is, without a visible browser window. Puppeteer is available in JavaScript and Python, and also as a web service.
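
For instance, a minimal script that launches headless Chromium, visits a page and saves a screenshot could look like this (the URL and file name are placeholders):

```js
const puppeteer = require('puppeteer');

(async () => {
  // Launch a headless Chromium instance (no graphical interface).
  const browser = await puppeteer.launch({ headless: true });
  const page = await browser.newPage();

  // Navigate to a page and capture it as an image.
  await page.goto('https://example.com');
  await page.screenshot({ path: 'example.png' });

  await browser.close();
})();
```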

Here are some notes on the library and its main functionalities.

For JavaScript, the library is supported from Node 6 onwards. When you install Puppeteer from npm, a compatible version of Chromium is downloaded automatically. Chromium is the open-source browser that Chrome is built on; Chrome adds additional proprietary features on top of it.

One of Puppeteer’s most interesting features is the ability to pre-render pages and help with server-side rendering (SSR). The goal of pre-rendering is to take a client-side single-page application and convert it into a static version.

To complete the SSR process, after Puppeteer returns the static page, the result is handed to a web server that is responsible for serving it. This can help SEO and also provide metadata to social media channels.
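
A rough sketch of that idea, using a hypothetical prerender() helper that loads a URL, waits for the client-side code to finish and returns the resulting HTML for the server to cache and serve:

```js
const puppeteer = require('puppeteer');

// Renders a client-side page and returns its static HTML,
// which a web server can then serve to crawlers.
async function prerender(url) {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();

  // Wait until the network is idle so client-side rendering has finished.
  await page.goto(url, { waitUntil: 'networkidle0' });

  // Serialized HTML of the fully rendered page.
  const html = await page.content();

  await browser.close();
  return html;
}
```

A server-side route could call prerender() for requests coming from crawlers and cache the result instead of rendering on every hit.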

Pre-rendering with Puppeteer can also be optimized in several ways; one common example is sketched below.
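
For instance, one optimization (shown here only as an illustration) is to intercept requests and skip resources that do not contribute to the markup, such as images, stylesheets and fonts:

```js
const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();

  // Skip resources that do not affect the serialized HTML
  // to speed up pre-rendering.
  await page.setRequestInterception(true);
  page.on('request', (request) => {
    const skipped = ['image', 'stylesheet', 'font'];
    if (skipped.includes(request.resourceType())) {
      request.abort();
    } else {
      request.continue();
    }
  });

  await page.goto('https://example.com', { waitUntil: 'networkidle0' }); // placeholder URL
  console.log((await page.content()).length);
  await browser.close();
})();
```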

Another functionality is the ability to analyze which features of your website are not available to Googlebot. This is a warning sign that helps you identify what will not render correctly in Google Search.
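
One rough way to approximate this kind of check, purely as an illustration, is to render the same page with and without JavaScript and compare the two screenshots, since content that only appears after client-side execution is the most likely to cause indexing problems:

```js
const puppeteer = require('puppeteer');

// Renders the same page with and without JavaScript so the two
// screenshots can be compared to spot client-side-only content.
(async () => {
  const url = 'https://example.com'; // placeholder URL
  const browser = await puppeteer.launch();
  const page = await browser.newPage();

  await page.goto(url, { waitUntil: 'networkidle0' });
  await page.screenshot({ path: 'with-js.png', fullPage: true });

  // Disabling JavaScript takes effect on the next navigation.
  await page.setJavaScriptEnabled(false);
  await page.reload({ waitUntil: 'networkidle0' });
  await page.screenshot({ path: 'without-js.png', fullPage: true });

  await browser.close();
})();
```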

Puppeteer can also be used for web scraping, a technique for extracting information from a website for different purposes, such as collecting data about people, email addresses, phone numbers, or bulk information about a specific industry for marketing, among others.
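
As a minimal illustration, the following sketch collects the text and destination of every link on a page (the URL and selector are placeholders):

```js
const puppeteer = require('puppeteer');

// Collects the text and destination of every link on a page.
(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto('https://example.com', { waitUntil: 'networkidle2' }); // placeholder URL

  // Runs in the page context and returns plain data back to Node.
  const links = await page.$$eval('a', (anchors) =>
    anchors.map((a) => ({ text: a.innerText.trim(), href: a.href }))
  );

  console.log(links);
  await browser.close();
})();
```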

It should be noted that with Puppeteer you can automate tasks that would require a lot of effort and time if done manually. For example, one of my tasks in the maintenance area is to periodically update customer sites and then check that the styles and design remain intact, without any visual inconsistency.

To avoid visiting each site manually and reduce the time invested in this process, a script was implemented to automate it (a simplified sketch is shown after the list):

  1. Visit n internal links on a page.
  2. Take a screenshot of each link.
  3. Run the corresponding update.
  4. Capture again and compare before and after the update.

The script raises a warning when a site has changed; only in that case do I check it.
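
The original script is not shown here, but a simplified sketch of the idea, with placeholder URLs and a naive byte-for-byte comparison (a real setup would typically use an image-diff library such as pixelmatch), could look like this:

```js
const puppeteer = require('puppeteer');

// Captures a full-page screenshot of each URL and returns the image buffers.
async function captureAll(links) {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  const shots = {};
  for (const url of links) {
    await page.goto(url, { waitUntil: 'networkidle0' });
    shots[url] = await page.screenshot({ fullPage: true });
  }
  await browser.close();
  return shots;
}

(async () => {
  // Placeholder list of internal links on a customer site.
  const links = ['https://example.com/', 'https://example.com/about'];

  const before = await captureAll(links);
  // ... run the update here (site-specific, outside this sketch) ...
  const after = await captureAll(links);

  for (const url of links) {
    // Naive comparison; an image-diff library would be more robust.
    if (!Buffer.from(before[url]).equals(Buffer.from(after[url]))) {
      console.warn(`Warning: ${url} changed after the update`);
    }
  }
})();
```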

These are just some notes on Puppeteer’s functionalities, an interesting tool that you can start using in your work environment, since it allows you to save time on various tasks. If you have a manual process involving the browser, take a look at Puppeteer and automate it.


Tags: automation, javascript, web scraping