PDF scripts
Dmitry Scherbakov
Optimize PDFs
PDF "optimizations" (for example, stripping unnecessary data or downsampling images when higher quality is not needed) can be done with Ghostscript. Here is the command you can use:
gs -q -dNOPAUSE -dBATCH -sDEVICE=pdfwrite -dCompatibilityLevel=1.3 -dPDFSETTINGS=/screen -sOutputFile=out.pdf in.pdf
Arguments:
- -q: Run in quiet mode. Without this flag, the Ghostscript executable prints a lot of startup information that is rarely useful. Omitting the flag can help with debugging, though.
- -dNOPAUSE: Do not pause after each page. By default, Ghostscript prompts you at the end of every page, so this flag is useful when dealing with large files containing a lot of pages.
- -dBATCH: Quit Ghostscript after processing the last file. Without this flag, Ghostscript opens its interactive "console" after processing.
- -sDEVICE=pdfwrite: Select the PDF output device.
- -dCompatibilityLevel=1.3: Specify the PDF compatibility level. If you are having issues with 1.3, you can try other values such as 1.5, 1.4, 1.2 or 1.1.
- -dPDFSETTINGS=/screen: Specify the "optimization" level. Here are the supported values (source):

  | Value    | Description                                           |
  |----------|-------------------------------------------------------|
  | screen   | Screen-view-only quality, 72 dpi images               |
  | ebook    | Low quality, 150 dpi images                           |
  | printer  | High quality, 300 dpi images                          |
  | prepress | High quality, color preserving, 300 dpi images        |
  | default  | General-purpose output, possibly with a larger file   |

- -sOutputFile=out.pdf: Specify the path to the output file. Since the Ghostscript executable accepts multiple input files, you must provide this argument before specifying the input files.
- in.pdf: Input file.
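If you want to call this command from a script, the argument list above could be wrapped roughly as follows. This is only a sketch: the names `build_gs_command`, `optimize_pdf` and `PDF_SETTINGS` are made up for illustration, and it assumes the Ghostscript executable is available on your PATH as `gs`.

```python
import subprocess
from typing import List

# Values accepted by -dPDFSETTINGS, as listed in the table above.
PDF_SETTINGS = ("screen", "ebook", "printer", "prepress", "default")


def build_gs_command(src: str, dest: str, setting: str = "screen",
                     compatibility: str = "1.3") -> List[str]:
    """Build the Ghostscript argument list for optimizing `src` into `dest`."""
    if setting not in PDF_SETTINGS:
        raise ValueError(f"unknown PDFSETTINGS value: {setting!r}")
    return [
        "gs",  # assumption: the executable is called `gs` on this system
        "-q", "-dNOPAUSE", "-dBATCH",
        "-sDEVICE=pdfwrite",
        f"-dCompatibilityLevel={compatibility}",
        f"-dPDFSETTINGS=/{setting}",
        f"-sOutputFile={dest}",  # must come before the input file
        src,
    ]


def optimize_pdf(src: str, dest: str, setting: str = "screen") -> None:
    """Run Ghostscript; raises CalledProcessError on a non-zero exit code."""
    subprocess.run(build_gs_command(src, dest, setting), check=True)
```

Building the command as a list (instead of a single string) avoids shell quoting problems with paths that contain spaces. Keeping `build_gs_command` separate from `optimize_pdf` also lets you inspect the exact command before running it.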
Extracting pages
Page extraction can be implemented with Python 3:
- Dependencies:
pip install 'PyPDF2>=3.0.0'
- The script (only function):
from typing import Iterable

from PyPDF2 import PdfReader, PdfWriter


def extract_pages(src: str, dest: str, pages: Iterable[int]) -> None:
    """Extract pages from `src` file and write them to `dest` file.

    :param src: Path to the input file.
    :param dest: Path to the output file.
    :param pages: Numbers (not indices!) of pages to extract. The first
        page (with index 0) has number 1.
    """
    reader, writer = PdfReader(src), PdfWriter()
    for number in pages:
        # Page numbers are 1-based, while reader.pages is 0-based.
        writer.add_page(reader.pages[number - 1])
    with open(dest, "wb") as file:
        writer.write(file)
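Since `extract_pages` takes page numbers rather than indices, it pairs naturally with a small parser for specs like "1-3,7". The helper below is a hypothetical addition (the name `parse_page_ranges` and the spec format are assumptions, not part of PyPDF2) and uses only the standard library:

```python
from typing import List


def parse_page_ranges(spec: str) -> List[int]:
    """Expand a spec such as "1-3,7" into page numbers [1, 2, 3, 7]."""
    pages: List[int] = []
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray commas
        first, sep, last = part.partition("-")
        start = int(first)
        end = int(last) if sep else start  # a single number is a one-page range
        if start < 1 or end < start:
            raise ValueError(f"invalid page range: {part!r}")
        pages.extend(range(start, end + 1))
    return pages


# Possible usage together with the function above:
# extract_pages("in.pdf", "out.pdf", parse_page_ranges("1-3,7"))
```

The parser rejects page number 0 and reversed ranges early, so a malformed spec fails before any file is opened. It does not check the numbers against the actual page count; an out-of-range number would still surface as an `IndexError` when `reader.pages` is indexed.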