How to convert web pages into PDFs with Puppeteer and Node.js

As a web developer, you might want to create a PDF file of a web page to share with your customers, use in presentations, or add a new feature to your web app. Regardless of your reason, Puppeteer, Google’s Node API for headless Chrome and Chromium, makes the task easy for you.

In this tutorial, you will learn how to convert web pages to PDF using Puppeteer and Node.js. Let’s start with a brief introduction to what Puppeteer is.

What is Puppeteer and why is it great?

In Google’s own words, Puppeteer is “a Node library which provides a high-level API to control headless Chrome or Chromium over the DevTools Protocol.”

What is a headless browser?

If you are not familiar with the term headless browser, it is simply a browser with no GUI. In every other respect it behaves like a regular browser: it renders HTML pages and executes JavaScript. Because there is no GUI, you interact with a headless browser through a command line or a programmatic API.

Although Puppeteer runs the browser in headless mode by default, you can configure it to drive full (non-headless) Chrome or Chromium.
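As a small illustration of that configuration, the sketch below switches between headless and visible mode through the headless launch option. The helper names launchOptions and launchBrowser and the showWindow parameter are made up for this example, and puppeteer is assumed to be installed.

```javascript
// Build the options object for puppeteer.launch().
// `showWindow` is a hypothetical parameter name used only in this sketch.
function launchOptions(showWindow) {
  // headless: false opens a visible browser window, which helps when debugging.
  return { headless: !showWindow };
}

// Start a browser in headless mode by default, or with a window on request.
async function launchBrowser(showWindow = false) {
  const puppeteer = require('puppeteer'); // loaded only when the function is called
  return puppeteer.launch(launchOptions(showWindow));
}

module.exports = { launchOptions, launchBrowser };
```

Calling launchBrowser(true) should open a regular Chrome or Chromium window that you can watch while the script drives it.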

What can you do with Puppeteer?

Puppeteer’s powerful browser capabilities make it a perfect candidate for web app testing and web scraping.

Here are just a few use cases where Puppeteer is a good fit for web developers:

  • Generate PDFs and screenshots of web pages
  • Automate form submission
  • Scrape websites
  • Run automated UI tests while keeping the test environment up to date
  • Generate pre-rendered content for single-page applications (SPAs)

Set up the project environment

You can use Puppeteer in the backend and frontend to generate PDFs. In this tutorial, we will use a Node backend for the task.

Initialize NPM and set up the usual Express server to start the tutorial.

Before you begin, make sure to install the Puppeteer NPM package by running npm install puppeteer in the project directory.

Convert web pages to PDF

Now we come to the exciting part of the tutorial. With Puppeteer, we only need a few lines of code to convert web pages to PDF.

First, create a browser instance with Puppeteer’s launch function.

Then we create a new page instance and visit the specified page URL with Puppeteer.

We set the waitUntil option to networkidle0. With networkidle0, Puppeteer considers navigation finished once there have been no network connections for at least 500 ms. That is one way to tell whether the site has loaded completely. It is not exact, and Puppeteer offers other values such as load, domcontentloaded, and networkidle2, but it is one of the most reliable in most cases.

Finally, we create the PDF from the loaded page content and save it on our device.

The page.pdf function is quite flexible and allows a lot of customization, which is fantastic. Here are some of the options we used:

  • printBackground: If this option is set to true, Puppeteer will print any background colors or images used on the webpage in the PDF.
  • path: Indicates where the generated PDF file should be saved. You can also omit it and keep the PDF in memory to avoid writing to disk.
  • format: You can set the PDF format to one of the following options: Letter, A4, A3, A2, etc.
  • margin: With this option you can set a margin for the generated PDF.

When the PDF creation is complete, close the browser connection with browser.close().
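Putting the steps above together, a minimal sketch might look like the following. The function names savePageAsPdf and buildPdfOptions, the A4 format, and the 20px margins are illustrative choices, not the original listing, and puppeteer is assumed to be installed.

```javascript
// Options passed to page.pdf(). The format and margin values are
// illustrative assumptions; adjust them to your needs.
function buildPdfOptions(outputPath) {
  return {
    path: outputPath,          // where the PDF file is written
    format: 'A4',              // paper size
    printBackground: true,     // include background colors and images
    margin: { top: '20px', right: '20px', bottom: '20px', left: '20px' },
  };
}

// Load `url` in headless Chrome and save it as a PDF at `outputPath`.
async function savePageAsPdf(url, outputPath) {
  const puppeteer = require('puppeteer'); // loaded only when the function is called
  const browser = await puppeteer.launch();
  try {
    const page = await browser.newPage();
    // Wait until there have been no network connections for at least 500 ms.
    await page.goto(url, { waitUntil: 'networkidle0' });
    await page.pdf(buildPdfOptions(outputPath));
  } finally {
    // Close the browser even if navigation or PDF generation fails.
    await browser.close();
  }
}

module.exports = { buildPdfOptions, savePageAsPdf };
```

With this in place, calling savePageAsPdf('https://example.com', 'example.pdf') should write example.pdf to the current working directory.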

Build an API that generates PDFs from URLs

With the knowledge we have gathered so far, we can now create a new endpoint that receives a URL as a query string parameter and sends the generated PDF back to the client.

Here is the code:
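A minimal sketch of such an endpoint, assuming express and puppeteer are installed. The helper names parseTarget, renderPdf, and startServer are made up for this example, and the check that only http and https URLs are accepted is an added assumption, not part of the original tutorial.

```javascript
// Validate the ?target= value. Accepting only http(s) URLs is an
// assumption of this sketch; it keeps file:// and similar schemes out.
function parseTarget(value) {
  if (typeof value !== 'string') return null;
  try {
    const url = new URL(value);
    return url.protocol === 'http:' || url.protocol === 'https:' ? url.href : null;
  } catch {
    return null; // not a parseable absolute URL
  }
}

// Render `url` to a PDF held in memory.
async function renderPdf(url) {
  const puppeteer = require('puppeteer'); // loaded only when the function is called
  const browser = await puppeteer.launch();
  try {
    const page = await browser.newPage();
    await page.goto(url, { waitUntil: 'networkidle0' });
    // No `path` option, so nothing is written to disk.
    return await page.pdf({ format: 'A4', printBackground: true });
  } finally {
    await browser.close();
  }
}

// Start an Express server exposing GET /pdf?target=<url>.
function startServer(port = 3000) {
  const express = require('express'); // loaded only when the function is called
  const app = express();

  app.get('/pdf', async (req, res) => {
    const target = parseTarget(req.query.target);
    if (!target) {
      res.status(400).send('Query parameter "target" must be an http(s) URL.');
      return;
    }
    try {
      const pdf = await renderPdf(target);
      res.set('Content-Type', 'application/pdf');
      res.send(Buffer.from(pdf));
    } catch (err) {
      res.status(500).send('Could not generate the PDF.');
    }
  });

  return app.listen(port);
}

module.exports = { parseTarget, renderPdf, startServer };
```

Launching a browser per request keeps the sketch short; a busier service would probably reuse one browser instance across requests.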

Start the server and visit the /pdf route with a target query parameter containing the URL to convert. The server returns the generated PDF directly, without ever saving it to disk.

URL example: http://localhost:3000/pdf?target=https://google.com

The response is a PDF rendering of the target page:

[Image: Example of a PDF capture]

That’s it! You have finished converting a webpage to PDF. Wasn’t that easy?

As mentioned earlier, Puppeteer offers a lot of customization options. So play around with the possibilities to get different results.

Next, we can resize the viewport to capture sites of different resolutions.

Capture websites with different viewports

In the PDF file we created earlier, we did not specify a viewport size for the page Puppeteer visits, so it used the default viewport of 800 × 600 px.

However, we can set the size of the page’s viewport before navigating to the page.
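A sketch of how that could look, using page.setViewport before navigation. The preset names and dimensions below, and the helpers pickViewport and savePdfAtViewport, are illustrative assumptions; puppeteer is assumed to be installed.

```javascript
// Hypothetical viewport presets for this sketch.
const VIEWPORTS = new Map([
  ['mobile', { width: 375, height: 667 }],
  ['desktop', { width: 1920, height: 1080 }],
]);

// Puppeteer's default viewport, used when the preset name is unknown.
const DEFAULT_VIEWPORT = { width: 800, height: 600 };

function pickViewport(name) {
  return VIEWPORTS.get(name) || DEFAULT_VIEWPORT;
}

// Load `url` at the chosen viewport size and save it as a PDF.
async function savePdfAtViewport(url, outputPath, viewportName) {
  const puppeteer = require('puppeteer'); // loaded only when the function is called
  const browser = await puppeteer.launch();
  try {
    const page = await browser.newPage();
    // Set the viewport before navigating so the page lays out at this size.
    await page.setViewport(pickViewport(viewportName));
    await page.goto(url, { waitUntil: 'networkidle0' });
    await page.pdf({ path: outputPath, format: 'A4', printBackground: true });
  } finally {
    await browser.close();
  }
}

module.exports = { pickViewport, savePdfAtViewport };
```

For example, savePdfAtViewport('https://example.com', 'desktop.pdf', 'desktop') should capture the page as laid out for a 1920 × 1080 viewport.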

Conclusion

In today’s tutorial, we used Puppeteer, a Node API for headless Chrome, to generate a PDF of a specific webpage. Now that you are familiar with the basics of Puppeteer, you can use this knowledge to create PDFs or for other purposes such as web scraping and UI testing.

This article was originally published on Live Code Stream by Juan Cruz Martinez (Twitter: @bajcmartinez), founder and publisher of Live Code Stream, entrepreneur, developer, author, speaker, and maker of things.
