Tianjian Qin

Convert HTML to PDF with Node.js and Puppeteer

Published on 15 Feb 2024 · 5 min read
Image Credit: leonardo.ai

Converting an HTML page to a PDF document is useful in many scenarios. Many tools exist for this task, but they often fall short: they do not produce a true PDF with selectable elements, or they fail to respect the page's CSS styles, which leads to incorrect or unattractive output.

pdfgen.js generates high-quality PDFs directly from a URL and offers a range of command line arguments to tailor the PDF to your needs.

Below, I provide an overview of the script, its features, and how to use it.

Setting Up Puppeteer and Command Line Arguments

First, we need to import Puppeteer and set up command line argument parsing with yargs. Puppeteer is used to control a headless Chrome or Chromium browser, and yargs helps in parsing command line arguments.

const puppeteer = require('puppeteer');
const yargs = require('yargs/yargs');
const { hideBin } = require('yargs/helpers');

Defining Standard Page Sizes

Next, we define standard page sizes in inches. These sizes will be used to set the dimensions of the PDF pages.

const pageSizes = {
    A4: { width: 8.27, height: 11.69 },
    Letter: { width: 8.5, height: 11 },
    Legal: { width: 8.5, height: 14 },
    Tabloid: { width: 11, height: 17 },
    Executive: { width: 7.25, height: 10.5 },
    A5: { width: 5.83, height: 8.27 },
    A3: { width: 11.69, height: 16.54 }
};
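
The inch values for the ISO sizes follow from their millimetre definitions (A4 is 210 × 297 mm, for example). As a quick sanity check, the sketch below derives them; the helper name `mmToInches` and the `isoSizesMm` table are illustrative assumptions and not part of pdfgen.js.

```javascript
// Hypothetical helper (not part of pdfgen.js): convert millimetres to inches,
// rounded to two decimals to match the precision used in pageSizes.
const MM_PER_INCH = 25.4;

function mmToInches(mm) {
    return Math.round((mm / MM_PER_INCH) * 100) / 100;
}

// ISO 216 paper sizes in millimetres.
const isoSizesMm = {
    A3: { width: 297, height: 420 },
    A4: { width: 210, height: 297 },
    A5: { width: 148, height: 210 }
};

// Convert every entry to inches.
const isoSizesInches = Object.fromEntries(
    Object.entries(isoSizesMm).map(([name, { width, height }]) => [
        name,
        { width: mmToInches(width), height: mmToInches(height) }
    ])
);

console.log(isoSizesInches);
```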

Configuring Command Line Arguments

We use yargs to configure the command line arguments for our script. These arguments allow the user to specify various parameters such as the URL, output file path, DPI, page size, and margins.

const argv = yargs(hideBin(process.argv))
    .usage('Usage: $0 --url <string> --output <string> [--dpi <number>] [--scale <number>] [--pageSize <string> | --width <number> --height <number>] [--top <number>] [--right <number>] [--bottom <number>] [--left <number>] [--no-background] [--no-margin]')
    .demandOption(['url', 'output'])
    .describe('dpi', 'The DPI (dots per inch) for the PDF')
    .describe('scale', 'The scale factor for the PDF')
    .describe('url', 'The URL of the HTML file to convert to PDF')
    .describe('output', 'The file path to write the generated PDF')
    .describe('pageSize', 'The standard page size (A4, Letter, Legal, Tabloid, Executive, A5, A3)')
    .describe('width', 'The custom width for the PDF (in inches)')
    .describe('height', 'The custom height for the PDF (in inches)')
    .describe('top', 'Top margin in mm')
    .describe('right', 'Right margin in mm')
    .describe('bottom', 'Bottom margin in mm')
    .describe('left', 'Left margin in mm')
    // yargs negates a boolean flag when it is prefixed with "no-", so
    // --no-background sets argv.background to false. We therefore declare
    // the positive flags and default them to true.
    .describe('background', 'Print the background (disable with --no-background)')
    .describe('margin', 'Apply margins (--no-margin sets all margins to 0)')
    .default('dpi', 300)
    .default('scale', 1.0)
    .default('pageSize', 'A4')
    .default('top', 10)
    .default('right', 10)
    .default('bottom', 10)
    .default('left', 10)
    .boolean('background')
    .boolean('margin')
    .default('background', true)
    .default('margin', true)
    // --no-margin simply overrides the individual margin values,
    // so no .conflicts() rule is needed.
    .help('h')
    .alias('h', 'help')
    .argv;

const dpi = parseInt(argv.dpi, 10);
const scale = parseFloat(argv.scale);
const url = argv.url;
const outputPath = argv.output;
const noBackground = !argv.background;
const noMargin = !argv.margin;

let width, height;

// Custom dimensions take precedence. pageSize has a default (A4), so testing
// it first would always shadow --width and --height.
if (argv.width !== undefined && argv.height !== undefined) {
    width = parseFloat(argv.width);
    height = parseFloat(argv.height);
} else if (argv.width !== undefined || argv.height !== undefined) {
    console.error('Please provide both a custom width and a custom height.');
    process.exit(1);
} else {
    const pageSize = pageSizes[argv.pageSize];
    if (!pageSize) {
        console.error('Invalid page size. Valid options are: A4, Letter, Legal, Tabloid, Executive, A5, A3.');
        process.exit(1);
    }
    width = pageSize.width;
    height = pageSize.height;
}

const widthPx = Math.round(width * dpi);
const heightPx = Math.round(height * dpi);

let top, right, bottom, left;
if (noMargin) {
    top = right = bottom = left = 0;
} else {
    top = parseFloat(argv.top);
    right = parseFloat(argv.right);
    bottom = parseFloat(argv.bottom);
    left = parseFloat(argv.left);
}

const topInches = top / 25.4;
const rightInches = right / 25.4;
const bottomInches = bottom / 25.4;
const leftInches = left / 25.4;
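
Note that `parseInt` and `parseFloat` return `NaN` for input they cannot parse, and the conversions above would silently carry that value through. One possible guard, sketched here under the assumption that width and height are in inches and margins in millimetres as in the script, is to validate the numbers before converting them. The names `requireNumber` and `resolveLayout` are hypothetical and not part of pdfgen.js.

```javascript
// Hypothetical helpers (not part of pdfgen.js): validate the numeric options,
// then apply the same unit conversions as the script.
const MM_PER_INCH = 25.4;

function requireNumber(name, value, { allowZero }) {
    const ok = Number.isFinite(value) && (allowZero ? value >= 0 : value > 0);
    if (!ok) {
        throw new RangeError(`Invalid value for ${name}: ${value}`);
    }
    return value;
}

function resolveLayout({ width, height, dpi, marginsMm }) {
    // Page dimensions and DPI must be strictly positive.
    requireNumber('width', width, { allowZero: false });
    requireNumber('height', height, { allowZero: false });
    requireNumber('dpi', dpi, { allowZero: false });

    // Margins may be zero; convert each one from millimetres to inches.
    const marginsInches = {};
    for (const side of ['top', 'right', 'bottom', 'left']) {
        const mm = requireNumber(side, marginsMm[side], { allowZero: true });
        marginsInches[side] = mm / MM_PER_INCH;
    }

    return {
        widthPx: Math.round(width * dpi),
        heightPx: Math.round(height * dpi),
        marginsInches
    };
}

// A4 at 300 DPI with 10 mm margins on every side.
const layout = resolveLayout({
    width: 8.27,
    height: 11.69,
    dpi: 300,
    marginsMm: { top: 10, right: 10, bottom: 10, left: 10 }
});
console.log(layout.widthPx, layout.heightPx); // 2481 3507
```

With this shape, a bad value such as `--dpi abc` fails early with a readable message instead of producing a page of `NaN` pixels.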

Generating the PDF

Now, we create an async function to generate the PDF using Puppeteer. This function launches a headless browser, navigates to the specified URL, sets the viewport, and generates the PDF with the specified settings.

async function generatePDF(url, outputPath) {
    const browser = await puppeteer.launch({
        headless: true,
        args: ['--no-sandbox', '--disable-setuid-sandbox']
    });
    try {
        const page = await browser.newPage();
        await page.goto(url, { waitUntil: 'networkidle0', timeout: 60000 });

        await page.setViewport({
            width: widthPx,
            height: heightPx,
            deviceScaleFactor: 1,
        });

        await page.pdf({
            path: outputPath,
            width: `${widthPx}px`,
            height: `${heightPx}px`,
            printBackground: !noBackground,
            // Puppeteer treats unlabeled numbers as pixels, so the margins
            // must carry an explicit unit.
            margin: {
                top: `${topInches}in`,
                right: `${rightInches}in`,
                bottom: `${bottomInches}in`,
                left: `${leftInches}in`
            },
            scale: scale
        });
    } finally {
        // Close the browser even if navigation or rendering fails.
        await browser.close();
    }
}

generatePDF(url, outputPath)
    .then(() => console.log('PDF generated successfully'))
    .catch(err => {
        console.error('Error generating PDF:', err);
        // Signal the failure to the calling shell.
        process.exitCode = 1;
    });
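
Puppeteer's `page.pdf()` accepts lengths either as numbers, which it treats as pixels, or as strings labelled with a unit (`px`, `in`, `cm`, `mm`). Building the options object in a separate function makes the units explicit and easy to check without launching a browser. The following is only a sketch: `buildPdfOptions` is a hypothetical helper, not part of pdfgen.js, and it passes margins in millimetres directly rather than converting them to inches first.

```javascript
// Hypothetical helper (not part of pdfgen.js): build the options object for
// Puppeteer's page.pdf() with every length labelled by its unit.
function buildPdfOptions({ outputPath, widthPx, heightPx, marginsMm, printBackground = true, scale = 1 }) {
    // Puppeteer documents a valid scale range of 0.1 to 2.
    if (!(scale >= 0.1 && scale <= 2)) {
        throw new RangeError(`scale must be between 0.1 and 2, got ${scale}`);
    }

    const mm = (value) => `${value}mm`;

    return {
        path: outputPath,
        width: `${widthPx}px`,
        height: `${heightPx}px`,
        printBackground,
        margin: {
            top: mm(marginsMm.top),
            right: mm(marginsMm.right),
            bottom: mm(marginsMm.bottom),
            left: mm(marginsMm.left)
        },
        scale
    };
}

const options = buildPdfOptions({
    outputPath: 'example.pdf',
    widthPx: 2481,
    heightPx: 3507,
    marginsMm: { top: 10, right: 10, bottom: 10, left: 10 }
});
console.log(options);
```

The returned object can be passed straight to `page.pdf(options)`.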

Flexibility

This script allows you to customize various parameters such as:

  • DPI: The dots per inch for the PDF, with a default value of 300 DPI.
  • Scale: The scale factor for the PDF, defaulting to 1.0.
  • Page Size: Standard page sizes like A4, Letter, Legal, etc.
  • Custom Dimensions: Specify custom width and height in inches if needed.
  • Margins: Customize top, right, bottom, and left margins in millimeters, or set all margins to zero.
  • Background: Option to disable printing the background.

Getting Started

To use this script, you need Node.js with the puppeteer and yargs packages installed. You can find the script and detailed instructions in my GitHub repository.

Example Usage

Here is an example command to generate a PDF:

node pdfgen.js --url https://example.com --output example.pdf --dpi 300 --scale 1.0 --pageSize A4 --top 10 --right 10 --bottom 10 --left 10

This command converts the HTML page at https://example.com to a PDF named example.pdf with 300 DPI, a scale factor of 1.0, A4 page size, and 10mm margins on all sides.
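
To convert several pages in one go, the same command can be driven from a small Node.js script. The sketch below assumes pdfgen.js sits in the current working directory; `buildArgs` and `convertAll` are hypothetical names, and only flags from the example command are used.

```javascript
const { spawnSync } = require('node:child_process');

// Build the argument list for one pdfgen.js run.
function buildArgs({ url, output, pageSize = 'A4', dpi = 300 }) {
    return [
        'pdfgen.js',
        '--url', url,
        '--output', output,
        '--pageSize', pageSize,
        '--dpi', String(dpi)
    ];
}

// Run pdfgen.js once per job with the current Node.js binary and stop at the
// first run that reports a non-zero exit status.
function convertAll(jobs) {
    for (const job of jobs) {
        const result = spawnSync(process.execPath, buildArgs(job), { stdio: 'inherit' });
        if (result.status !== 0) {
            throw new Error(`pdfgen.js failed for ${job.url}`);
        }
    }
}

// Example call (left commented out so the snippet has no side effects):
// convertAll([
//     { url: 'https://example.com', output: 'example.pdf' },
//     { url: 'https://example.com/about', output: 'about.pdf', pageSize: 'Letter' }
// ]);
```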
