PDF scripts

2025-02-26 · 2min · Dmitry Scherbakov


Optimize PDFs

PDF "optimizations" (for example, stripping unnecessary data or downsampling images when full quality is not needed) can be done with ghostscript. Here is the command you can use:

```
gs -q -dNOPAUSE -dBATCH -sDEVICE=pdfwrite -dCompatibilityLevel=1.3 -dPDFSETTINGS=/screen -sOutputFile=out.pdf in.pdf
```

Arguments:

-q. Run in quiet mode. Without this flag, the ghostscript executable prints startup messages and routine informational output that is rarely useful. Omitting the flag can still help when debugging.
-dNOPAUSE. Disable the prompt and pause at the end of each page. By default, ghostscript pauses after every page and waits for input, so this flag matters most for large files with many pages.
-dBATCH. Quit ghostscript after processing the last file. Without this flag, ghostscript opens an interactive "console" after processing.
-sDEVICE=pdfwrite. Select the output device. The pdfwrite device produces PDF output.
-dCompatibilityLevel=1.3. Specify the PDF compatibility level of the output. If you are having issues with 1.3, you can try other values such as 1.5, 1.4, 1.2 or 1.1.

-dPDFSETTINGS=/screen. Specify the "optimization" preset. Here are the supported values, each written with a leading slash on the command line (source):

Value	Description
screen	Screen-view-only quality, 72 dpi images
ebook	Low quality, 150 dpi images
printer	High quality, 300 dpi images
prepress	High quality, color preserving, 300 dpi images
default	General-purpose output intended for a wide variety of uses, possibly at the expense of a larger file

-sOutputFile=out.pdf. Specify the path to the output file. Because the ghostscript executable accepts multiple input files, this argument must come before the input files.
in.pdf. The input file.
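
The same command can be driven from Python, for instance to process several files in a loop. The following is a minimal sketch rather than part of the original recipe: the helper names `build_gs_command` and `optimize_pdf` are hypothetical, and it assumes the `gs` executable is available on `PATH` (pass a different name through the `gs` parameter if your installation uses one).

```python
import subprocess
from typing import List

# Presets accepted by -dPDFSETTINGS, as listed in the table above.
PRESETS = ("screen", "ebook", "printer", "prepress", "default")


def build_gs_command(src: str, dest: str, preset: str = "screen",
                     compat: str = "1.3", gs: str = "gs") -> List[str]:
    """Build the ghostscript argument list (hypothetical helper)."""
    if preset not in PRESETS:
        raise ValueError(f"unknown preset: {preset!r}")
    return [
        gs, "-q", "-dNOPAUSE", "-dBATCH",
        "-sDEVICE=pdfwrite",
        f"-dCompatibilityLevel={compat}",
        f"-dPDFSETTINGS=/{preset}",
        f"-sOutputFile={dest}",  # must come before the input file
        src,
    ]


def optimize_pdf(src: str, dest: str, preset: str = "screen") -> None:
    """Run ghostscript; raises CalledProcessError on a non-zero exit code."""
    subprocess.run(build_gs_command(src, dest, preset), check=True)
```

Passing the arguments as a list avoids shell quoting problems with paths that contain spaces. Calling `optimize_pdf("in.pdf", "out.pdf", "ebook")` would then run the command shown earlier with the /ebook preset.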

Extracting pages

Page extraction can be implemented in Python 3.

Dependencies:
```
pip install 'PyPDF2>=3.0.0'
```

The script (the function only):

```
from typing import Iterable
from PyPDF2 import PdfReader, PdfWriter


def extract_pages(src: str, dest: str, pages: Iterable[int]) -> None:
    """Extract pages from `src` file and write them to `dest` file.

    :param src: Path to the input file.
    :param dest: Path to the output file.
    :param pages: Numbers (not indices!) of pages to extract.
        The first page (with index 0) has number 1.
    """
    reader, writer = PdfReader(src), PdfWriter()

    # Page numbers are 1-based, while `reader.pages` is 0-based.
    for number in pages:
        writer.add_page(reader.pages[number - 1])

    with open(dest, "wb") as file:
        writer.write(file)
```
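
To call the function with a compact page specification such as `1-3,7`, a small parser can expand it into page numbers first. This is only a sketch: `parse_page_spec` is a hypothetical helper, not part of PyPDF2, and the specification syntax is an assumption made for illustration.

```python
from typing import List


def parse_page_spec(spec: str) -> List[int]:
    """Expand a spec like "1-3,7" into [1, 2, 3, 7] (1-based page numbers)."""
    numbers: List[int] = []
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray commas
        if "-" in part:
            first, last = (int(x) for x in part.split("-", 1))
        else:
            first = last = int(part)
        if first < 1 or last < first:
            raise ValueError(f"invalid page range: {part!r}")
        numbers.extend(range(first, last + 1))
    return numbers
```

With `extract_pages` defined as above, `extract_pages("in.pdf", "out.pdf", parse_page_spec("1-3,7"))` would write pages 1, 2, 3 and 7 to `out.pdf`. The parser does not know how many pages the document has, so numbers past the end of the file are not caught here.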