PDF scripts

· 2min · Dmitry Scherbakov
Optimize PDFs

PDF "optimizations" (for example, stripping unnecessary data or downsampling images when full quality is not needed) can be done with Ghostscript. Here is the command you can use:

gs -q -dNOPAUSE -dBATCH -sDEVICE=pdfwrite -dCompatibilityLevel=1.3 -dPDFSETTINGS=/screen -sOutputFile=out.pdf in.pdf

Arguments:

  • -q. Run in quiet mode. Without this flag, the Ghostscript executable prints startup messages that are rarely useful. Omitting the flag can still help when debugging.
  • -dNOPAUSE. Disable the prompt at the end of each page. By default, Ghostscript pauses and waits for input after every page, so this flag matters most for large files with many pages.
  • -dBATCH. Quit Ghostscript after processing the last file. Without this flag, Ghostscript opens an interactive "console" after processing.
  • -sDEVICE=pdfwrite. Select the pdfwrite output device, which produces a PDF file.
  • -dCompatibilityLevel=1.3. Specify PDF compatibility level. If you are having issues with 1.3, you can try other variants such as 1.5, 1.4, 1.2 or 1.1.
  • -dPDFSETTINGS=/screen. Specify "optimization" level. Here are the supported values (source):
    Value      Description
    /screen    Screen-view-only quality, 72 dpi images
    /ebook     Low quality, 150 dpi images
    /printer   High quality, 300 dpi images
    /prepress  High quality, color preserving, 300 dpi images
    /default   General-purpose output, possibly at the cost of a larger file
  • -sOutputFile=out.pdf. Specify the path to the output file. Since the Ghostscript executable accepts multiple input files, this argument must come before the input files.
  • in.pdf. Input file.
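
If you need to run the same command from a script, here is a minimal Python sketch that builds the argument list described above and passes it to Ghostscript. The function names `build_gs_command` and `optimize_pdf` are made up for this example, and it assumes the `gs` executable is on your PATH (on Windows the executable usually has a different name, so adjust it there).

```python
import subprocess
from typing import List


def build_gs_command(
    src: str,
    dest: str,
    preset: str = "screen",
    compatibility: str = "1.3",
) -> List[str]:
    """Build the ghostscript argument list for the command shown above.

    `preset` is one of the -dPDFSETTINGS values without the leading slash.
    """
    return [
        "gs",  # assumption: ghostscript is installed and named `gs`
        "-q",
        "-dNOPAUSE",
        "-dBATCH",
        "-sDEVICE=pdfwrite",
        f"-dCompatibilityLevel={compatibility}",
        f"-dPDFSETTINGS=/{preset}",
        # The output file must come before the input file.
        f"-sOutputFile={dest}",
        src,
    ]


def optimize_pdf(src: str, dest: str, preset: str = "screen") -> None:
    """Run ghostscript; raises CalledProcessError on a non-zero exit code."""
    subprocess.run(build_gs_command(src, dest, preset), check=True)
```

Passing the arguments as a list (instead of one shell string) avoids quoting problems with file names that contain spaces. Usage would look like `optimize_pdf("in.pdf", "out.pdf", preset="ebook")`.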

Extracting pages

Page extraction can be implemented with Python 3:

  • Dependencies:
    pip install 'PyPDF2>=3.0.0'
    
  • The script (function only):
    from typing import Iterable
    from PyPDF2 import PdfReader, PdfWriter
    
    
    def extract_pages(src: str, dest: str, pages: Iterable[int]) -> None:
        """Extract pages from the `src` file and write them to the `dest` file.
    
        :param src: Path to the input file.
        :param dest: Path to the output file.
        :param pages: Numbers (not indices!) of pages to extract.
            The first page (with index 0) has number 1.
        """
        reader, writer = PdfReader(src), PdfWriter()
    
        for number in pages:
            writer.add_page(reader.pages[number - 1])
    
        with open(dest, "wb") as file:
            writer.write(file)
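
Since `pages` is any iterable of page numbers, it is handy to build it from a string such as "1-3,5". Below is a small sketch of such a helper. The name `parse_page_spec` and the spec syntax are my own assumptions for illustration, not part of PyPDF2. It rejects numbers below 1, because page number 0 would turn into index -1 inside `extract_pages`.

```python
from typing import List


def parse_page_spec(spec: str) -> List[int]:
    """Turn a spec such as "1-3,5" into [1, 2, 3, 5] (page numbers, not indices)."""
    numbers: List[int] = []
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray commas such as "1,,2"
        first, sep, last = part.partition("-")
        start = int(first)  # raises ValueError on non-numeric input
        end = int(last) if sep else start
        if start < 1 or end < start:
            raise ValueError(f"invalid page range: {part!r}")
        numbers.extend(range(start, end + 1))
    return numbers
```

The result can be passed straight to the function above, for example `extract_pages("in.pdf", "out.pdf", parse_page_spec("1-3,5"))`.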