Commit 55fbe5b by algorembrant · verified · 1 Parent(s): ded0ea7

Upload 5 files

Files changed (5)
  1. .gitignore +77 -0
  2. LICENSE +21 -0
  3. README.md +388 -0
  4. pdf_manipulator.py +1065 -0
  5. requirements.txt +4 -0
.gitignore ADDED
@@ -0,0 +1,77 @@
+ # Python
+ __pycache__/
+ *.py[cod]
+ *$py.class
+ *.so
+ *.egg
+ *.egg-info/
+ dist/
+ build/
+ eggs/
+ parts/
+ var/
+ sdist/
+ wheels/
+ pip-wheel-metadata/
+ share/python-wheels/
+ .installed.cfg
+ lib/
+ lib64/
+ MANIFEST
+
+ # Virtual environments
+ .env
+ .venv
+ env/
+ venv/
+ ENV/
+ env.bak/
+ venv.bak/
+
+ # Distribution / packaging
+ .Python
+ develop-eggs/
+ downloads/
+ htmlcov/
+ .tox/
+ .nox/
+ .coverage
+ .coverage.*
+ .cache
+ nosetests.xml
+ coverage.xml
+ *.cover
+ *.py,cover
+ .hypothesis/
+ .pytest_cache/
+
+ # IDE
+ .idea/
+ .vscode/
+ *.sublime-project
+ *.sublime-workspace
+ *.swp
+ *.swo
+ *~
+
+ # OS
+ .DS_Store
+ .DS_Store?
+ ._*
+ .Spotlight-V100
+ .Trashes
+ ehthumbs.db
+ Thumbs.db
+ desktop.ini
+
+ # Project-specific
+ output/
+ outputs/
+ temp/
+ tmp/
+ *.log
+ test_input/
+ test_output/
+ sample_pdfs/
+ *.bak
+ *.orig
LICENSE ADDED
@@ -0,0 +1,21 @@
+ MIT License
+
+ Copyright (c) 2024 algorembrant
+
+ Permission is hereby granted, free of charge, to any person obtaining a copy
+ of this software and associated documentation files (the "Software"), to deal
+ in the Software without restriction, including without limitation the rights
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ copies of the Software, and to permit persons to whom the Software is
+ furnished to do so, subject to the following conditions:
+
+ The above copyright notice and this permission notice shall be included in all
+ copies or substantial portions of the Software.
+
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ SOFTWARE.
README.md ADDED
@@ -0,0 +1,388 @@
+ ---
+ tags:
+ - pdf
+ - document-processing
+ - pdf-manipulation
+ - python
+ - cli
+ - automation
+ language:
+ - en
+ license: mit
+ library_name: pdf-manipulator
+ pipeline_tag: other
+ ---
+
+ # PDF Manipulator
+
+ ![Python](https://img.shields.io/badge/Python-3.9%2B-blue?style=flat-square&logo=python)
+ ![License](https://img.shields.io/badge/License-MIT-green?style=flat-square)
+ ![Version](https://img.shields.io/badge/Version-1.0.0-orange?style=flat-square)
+ ![Maintained](https://img.shields.io/badge/Maintained-Yes-brightgreen?style=flat-square)
+ ![PRs Welcome](https://img.shields.io/badge/PRs-Welcome-blueviolet?style=flat-square)
+ ![Platform](https://img.shields.io/badge/Platform-Linux%20%7C%20macOS%20%7C%20Windows-lightgrey?style=flat-square)
+
+ A comprehensive, single-file command-line toolkit for all PDF page manipulation operations. Merge, split, remove, rotate, crop, watermark, encrypt, number, reorder, and batch-process PDF files with a clean and intuitive CLI.
+
+ **Author:** algorembrant
+
+ ---
+
+ ## Features
+
+ | Feature | Command | Description |
+ |----------------------|------------------|-----------------------------------------------------|
+ | Merge PDFs | `merge` | Combine multiple PDFs into one, or interleave pages |
+ | Split PDF | `split` | Split into individual pages or page ranges |
+ | Remove Pages | `remove` | Remove one or more pages by number or range |
+ | Extract Pages | `extract` | Extract specific pages into a new PDF |
+ | Reorder Pages | `reorder` | Rearrange pages in any custom order |
+ | Rotate Pages | `rotate` | Rotate pages by 90, 180, or 270 degrees |
+ | Reverse Pages | `reverse` | Reverse the page order |
+ | Duplicate Pages | `duplicate` | Duplicate specific pages N times |
+ | Insert Blank Page | `insert-blank` | Insert blank page before or after a position |
+ | Insert PDF Pages | `insert` | Insert pages from another PDF at a position |
+ | Replace Pages | `replace` | Replace pages with pages from another PDF |
+ | Crop Pages | `crop` | Crop pages to a custom bounding box |
+ | Scale / Resize | `scale` | Scale pages by factor or resize to A4/letter |
+ | Watermark | `watermark` | Add text or PDF watermark to all pages |
+ | Stamp / Overlay | `stamp` | Overlay a stamp PDF on pages |
+ | Page Numbers | `number` | Add page numbers at any position |
+ | Encrypt | `encrypt` | Password-protect a PDF |
+ | Decrypt | `decrypt` | Remove password from a PDF |
+ | Metadata | `metadata` | View or edit PDF title, author, subject, keywords |
+ | Bookmarks | `bookmarks` | List or add bookmark/outline entries |
+ | Extract Text | `text` | Extract plain text from pages |
+ | Info / Inspect | `info` | Display page count, dimensions, and metadata |
+ | N-Up Layout | `nup` | Arrange multiple pages per sheet (2x1, 2x2, etc.) |
+ | Compress | `compress` | Losslessly compress PDF streams |
+ | Batch Remove | `batch-remove` | Remove pages from all PDFs in a directory |
+ | Batch Merge | `batch-merge` | Merge all PDFs in a directory into one |
+ | Batch Split | `batch-split` | Split all PDFs in a directory into pages |
+
+ ---
+
+ ## Requirements
+
+ - Python 3.9 or newer
+ - System dependency: **Poppler** (required for `nup` command only)
+
+ ---
+
+ ## Installation
+
+ ### 1. Clone the Repository
+
+ ```bash
+ git clone https://github.com/algorembrant/pdf-manipulator.git
+ cd pdf-manipulator
+ ```
+
+ ### 2. Install Python Dependencies
+
+ ```bash
+ pip install -r requirements.txt
+ ```
+
+ ### 3. Install Poppler (Required for N-Up Layout)
+
+ The `nup` command uses `pdf2image`, which requires Poppler to be installed on your system.
+
+ | Platform | Install Command |
+ |--------------|-------------------------------------------------------|
+ | Ubuntu/Debian| `sudo apt-get install -y poppler-utils` |
+ | macOS | `brew install poppler` |
+ | Windows | Download from https://github.com/oschwartz10612/poppler-windows/releases and add `bin/` to your PATH |
+
+ If you do not need the `nup` command, Poppler is not required.
+
+ ---
+
+ ## Usage
+
+ ### Page Range Syntax
+
+ | Syntax | Meaning |
+ |----------|-------------------------------------|
+ | `3` | Page 3 only |
+ | `1,3,5` | Pages 1, 3, and 5 |
+ | `2-5` | Pages 2 through 5 inclusive |
+ | `1,3-5,7`| Pages 1, 3, 4, 5, and 7 |
+
+ Pages are always 1-indexed (first page = 1).
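Editor's note: the page-range syntax in the table above can be sketched as a small parser. This is a minimal, self-contained illustration rather than the repository's code verbatim; the name `expand_page_range` is hypothetical, and it returns 1-based page numbers for readability (the `parse_page_range` helper in `pdf_manipulator.py` returns 0-based indices instead).

```python
def expand_page_range(spec: str, total: int) -> list[int]:
    """Expand a spec such as "1,3-5,7" into sorted, de-duplicated 1-based page numbers."""
    pages: set[int] = set()
    for part in spec.split(","):
        part = part.strip()
        if "-" in part:
            # "2-5" means pages 2 through 5 inclusive.
            start_s, end_s = part.split("-", 1)
            start, end = int(start_s), int(end_s)
        else:
            start = end = int(part)
        if start < 1 or end > total or start > end:
            raise ValueError(f"{part!r} is out of bounds for a {total}-page document")
        pages.update(range(start, end + 1))
    return sorted(pages)


print(expand_page_range("1,3-5,7", total=10))  # [1, 3, 4, 5, 7]
```

Collecting into a set first means overlapping selections such as `1-3,2` yield each page only once.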
+
+ ---
+
+ ### Step-by-Step Guide
+
+ #### Merge PDFs
+
+ ```bash
+ # Merge two or more PDFs in order
+ python pdf_manipulator.py merge -i file1.pdf file2.pdf file3.pdf -o merged.pdf
+
+ # Interleave pages (page 1 from file1, page 1 from file2, page 2 from file1, ...)
+ python pdf_manipulator.py merge -i file1.pdf file2.pdf -o interleaved.pdf --interleave
+ ```
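Editor's note: the interleave order described above is a round-robin over the input files. The sketch below uses plain lists of labels in place of PDF pages; the function name `interleave` is hypothetical. It follows the behaviour of `cmd_merge` in this commit, where a shorter file simply stops contributing once its pages run out.

```python
from itertools import zip_longest


def interleave(*documents: list) -> list:
    """Round-robin pages from each document; shorter documents simply run out."""
    missing = object()  # sentinel marking "no page left in this document"
    merged = []
    for group in zip_longest(*documents, fillvalue=missing):
        merged.extend(page for page in group if page is not missing)
    return merged


file1 = ["A1", "A2", "A3"]
file2 = ["B1", "B2"]
print(interleave(file1, file2))  # ['A1', 'B1', 'A2', 'B2', 'A3']
```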
+
+ #### Split PDF
+
+ ```bash
+ # Split into individual pages (saved to a directory)
+ python pdf_manipulator.py split -i input.pdf -o ./split_pages
+
+ # Extract a range of pages into a single file
+ python pdf_manipulator.py split -i input.pdf -o ./split_pages --range 1-5
+ python pdf_manipulator.py split -i input.pdf -o ./split_pages --range 2,4,6
+ ```
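Editor's note: the output file names produced by `split` follow the f-strings in `cmd_split` from this commit. The sketch below reproduces only that naming scheme without touching any PDF; the helper names `per_page_names` and `range_name` are hypothetical.

```python
from pathlib import Path


def per_page_names(input_path: str, out_dir: str, total_pages: int) -> list[Path]:
    """One output file per page, numbered from 1 and zero-padded to four digits."""
    stem = Path(input_path).stem
    return [Path(out_dir) / f"{stem}_page_{n:04d}.pdf" for n in range(1, total_pages + 1)]


def range_name(input_path: str, out_dir: str, page_range: str) -> Path:
    """A single output file for --range; commas in the spec become underscores."""
    stem = Path(input_path).stem
    return Path(out_dir) / f"{stem}_pages_{page_range.replace(',', '_')}.pdf"


print([p.name for p in per_page_names("input.pdf", "./split_pages", 2)])
# ['input_page_0001.pdf', 'input_page_0002.pdf']
print(range_name("input.pdf", "./split_pages", "2,4,6").name)
# input_pages_2_4_6.pdf
```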
+
+ #### Remove Pages
+
+ ```bash
+ # Remove page 3
+ python pdf_manipulator.py remove -i input.pdf -o output.pdf --pages 3
+
+ # Remove multiple pages
+ python pdf_manipulator.py remove -i input.pdf -o output.pdf --pages 1,3,5
+
+ # Remove a range of pages
+ python pdf_manipulator.py remove -i input.pdf -o output.pdf --pages 2-5
+
+ # Remove mixed selection
+ python pdf_manipulator.py remove -i input.pdf -o output.pdf --pages 1,3-5,7
+ ```
+
+ #### Extract Pages
+
+ ```bash
+ # Extract pages 1-3 into a new PDF
+ python pdf_manipulator.py extract -i input.pdf -o output.pdf --pages 1-3
+
+ # Extract specific pages
+ python pdf_manipulator.py extract -i input.pdf -o output.pdf --pages 2,4,6
+ ```
+
+ #### Reorder Pages
+
+ ```bash
+ # Place page 3 first, then page 1, then page 2, then page 4
+ python pdf_manipulator.py reorder -i input.pdf -o output.pdf --order 3,1,2,4
+ ```
+
+ #### Rotate Pages
+
+ ```bash
+ # Rotate all pages 90 degrees clockwise
+ python pdf_manipulator.py rotate -i input.pdf -o output.pdf --angle 90
+
+ # Rotate only pages 1 and 3 by 180 degrees
+ python pdf_manipulator.py rotate -i input.pdf -o output.pdf --angle 180 --pages 1,3
+
+ # Rotate pages 2-4 by 270 degrees
+ python pdf_manipulator.py rotate -i input.pdf -o output.pdf --angle 270 --pages 2-4
+ ```
+
+ #### Reverse Page Order
+
+ ```bash
+ python pdf_manipulator.py reverse -i input.pdf -o output.pdf
+ ```
+
+ #### Duplicate Pages
+
+ ```bash
+ # Duplicate page 2 so it appears 3 times in a row
+ python pdf_manipulator.py duplicate -i input.pdf -o output.pdf --pages 2 --times 3
+ ```
+
+ #### Insert Blank Pages
+
+ ```bash
+ # Insert a blank page after page 2
+ python pdf_manipulator.py insert-blank -i input.pdf -o output.pdf --after 2
+
+ # Insert a blank page before page 1
+ python pdf_manipulator.py insert-blank -i input.pdf -o output.pdf --before 1
+ ```
+
+ #### Insert Pages from Another PDF
+
+ ```bash
+ # Insert all pages from extra.pdf after page 3 of base.pdf
+ python pdf_manipulator.py insert -i base.pdf --insert-file extra.pdf -o output.pdf --after 3
+
+ # Insert before page 2
+ python pdf_manipulator.py insert -i base.pdf --insert-file extra.pdf -o output.pdf --before 2
+ ```
+
+ #### Replace Pages
+
+ ```bash
+ # Replace page 2 of base.pdf with page 1 of new.pdf
+ python pdf_manipulator.py replace -i base.pdf --replace-file new.pdf -o output.pdf --pages 2 --replace-pages 1
+ ```
+
+ #### Crop Pages
+
+ ```bash
+ # Crop all pages (coordinates in PDF points: left,bottom,right,top)
+ python pdf_manipulator.py crop -i input.pdf -o output.pdf --box "50,50,500,700"
+
+ # Crop only pages 1-3
+ python pdf_manipulator.py crop -i input.pdf -o output.pdf --box "50,50,500,700" --pages 1-3
+ ```
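Editor's note: the crop box is given in PDF points, and one point is 1/72 of an inch. When the desired margins are known in millimetres, a conversion along these lines may help. It is only a sketch: the helper names are hypothetical, and whether `--box` accepts decimal values is an assumption not confirmed by this commit (round to integers if it does not).

```python
POINTS_PER_INCH = 72.0
MM_PER_INCH = 25.4


def mm_to_points(mm: float) -> float:
    """Convert millimetres to PDF points (1 pt = 1/72 inch)."""
    return mm * POINTS_PER_INCH / MM_PER_INCH


def crop_box_from_mm(left: float, bottom: float, right: float, top: float) -> str:
    """Build a 'left,bottom,right,top' string in points from millimetre coordinates."""
    values = (mm_to_points(v) for v in (left, bottom, right, top))
    return ",".join(f"{v:.1f}" for v in values)


# Keep the region from (20 mm, 20 mm) to (190 mm, 277 mm), i.e. 20 mm margins on A4.
print(crop_box_from_mm(20, 20, 190, 277))  # 56.7,56.7,538.6,785.2
```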
+
+ #### Scale / Resize Pages
+
+ ```bash
+ # Scale all pages to 50% of original size
+ python pdf_manipulator.py scale -i input.pdf -o output.pdf --factor 0.5
+
+ # Resize all pages to A4
+ python pdf_manipulator.py scale -i input.pdf -o output.pdf --to-size A4
+
+ # Resize to US Letter
+ python pdf_manipulator.py scale -i input.pdf -o output.pdf --to-size letter
+ ```
+
+ #### Add Watermark
+
+ ```bash
+ # Add a text watermark with defaults (red diagonal, 15% opacity)
+ python pdf_manipulator.py watermark -i input.pdf -o output.pdf --text "CONFIDENTIAL"
+
+ # Custom opacity and angle
+ python pdf_manipulator.py watermark -i input.pdf -o output.pdf --text "DRAFT" --opacity 0.3 --angle 45
+
+ # Use a PDF file as watermark
+ python pdf_manipulator.py watermark -i input.pdf -o output.pdf --watermark-pdf wm.pdf
+ ```
+
+ #### Stamp / Overlay
+
+ ```bash
+ # Overlay stamp.pdf on all pages
+ python pdf_manipulator.py stamp -i input.pdf -o output.pdf --stamp-pdf stamp.pdf
+
+ # Overlay on page 1 only
+ python pdf_manipulator.py stamp -i input.pdf -o output.pdf --stamp-pdf stamp.pdf --pages 1
+ ```
+
+ #### Add Page Numbers
+
+ ```bash
+ # Add page numbers at bottom center (default)
+ python pdf_manipulator.py number -i input.pdf -o output.pdf
+
+ # Custom position and starting number
+ python pdf_manipulator.py number -i input.pdf -o output.pdf --position bottom-right --start 1
+
+ # Custom format string
+ python pdf_manipulator.py number -i input.pdf -o output.pdf --position top-right --format "Page {n}"
+ ```
+
+ Available positions: `bottom-center`, `bottom-left`, `bottom-right`, `top-center`, `top-left`, `top-right`
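Editor's note: the position names map to anchor coordinates (in points) through the `NUMBER_POSITIONS` table in `pdf_manipulator.py`, with `bottom-center` as the fallback. The sketch below copies that table and shows where a label would land. How `--format` and `--start` are applied is not visible in this part of the commit, so `page_label` is a hypothetical helper that assumes `{n}` is replaced by the running page number.

```python
# Copied from the NUMBER_POSITIONS constant in pdf_manipulator.py (values in points).
NUMBER_POSITIONS = {
    "bottom-center": lambda w, h: (w / 2, 20),
    "bottom-left": lambda w, h: (30, 20),
    "bottom-right": lambda w, h: (w - 30, 20),
    "top-center": lambda w, h: (w / 2, h - 20),
    "top-left": lambda w, h: (30, h - 20),
    "top-right": lambda w, h: (w - 30, h - 20),
}


def label_anchor(position: str, width: float, height: float) -> tuple:
    """Return the (x, y) anchor for a position name, falling back to bottom-center."""
    place = NUMBER_POSITIONS.get(position, NUMBER_POSITIONS["bottom-center"])
    return place(width, height)


def page_label(template: str, index: int, start: int = 1) -> str:
    """Assumed behaviour: substitute {n} with start + zero-based page index."""
    return template.replace("{n}", str(start + index))


# US Letter is 612 x 792 points.
print(label_anchor("top-right", 612, 792))  # (582, 772)
print(page_label("Page {n}", index=0))      # Page 1
```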
+
+ #### Encrypt / Decrypt
+
+ ```bash
+ # Encrypt with a user password
+ python pdf_manipulator.py encrypt -i input.pdf -o output.pdf --user-pass mypassword
+
+ # Encrypt with both user and owner password
+ python pdf_manipulator.py encrypt -i input.pdf -o output.pdf --user-pass mypassword --owner-pass ownerpassword
+
+ # Decrypt / remove password
+ python pdf_manipulator.py decrypt -i encrypted.pdf -o decrypted.pdf --password mypassword
+ ```
+
+ #### Metadata
+
+ ```bash
+ # View metadata
+ python pdf_manipulator.py metadata -i input.pdf
+
+ # Set metadata fields
+ python pdf_manipulator.py metadata -i input.pdf -o output.pdf \
+     --set-title "Annual Report 2024" \
+     --set-author "algorembrant" \
+     --set-subject "Finance" \
+     --set-keywords "annual,report,finance"
+ ```
+
+ #### Bookmarks / Outline
+
+ ```bash
+ # List all bookmarks
+ python pdf_manipulator.py bookmarks -i input.pdf
+
+ # Add bookmarks
+ python pdf_manipulator.py bookmarks -i input.pdf -o output.pdf \
+     --add "Introduction:1,Chapter 1:3,Chapter 2:8"
+ ```
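Editor's note: the `--add` value is a comma-separated list of `Title:page` pairs. The parsing code is not visible in this part of the commit, so the following is only a sketch of one way such a string could be read; `parse_bookmarks` is a hypothetical name. Splitting on the last colon lets titles contain colons, while titles containing commas would not survive this format.

```python
def parse_bookmarks(spec: str) -> list[tuple[str, int]]:
    """Turn "Title:page,Title:page" into a list of (title, 1-based page) tuples."""
    entries = []
    for item in spec.split(","):
        # rpartition splits on the LAST colon, so "Part 1: Intro:4" keeps its title colon.
        title, _, page = item.rpartition(":")
        if not title.strip() or not page.strip().isdigit():
            raise ValueError(f"Expected 'Title:page', got {item!r}")
        entries.append((title.strip(), int(page)))
    return entries


print(parse_bookmarks("Introduction:1,Chapter 1:3,Chapter 2:8"))
# [('Introduction', 1), ('Chapter 1', 3), ('Chapter 2', 8)]
```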
+
+ #### Extract Text
+
+ ```bash
+ # Print text from all pages
+ python pdf_manipulator.py text -i input.pdf
+
+ # Extract text from pages 1-3 and save to file
+ python pdf_manipulator.py text -i input.pdf --pages 1-3 -o extracted.txt
+ ```
+
+ #### PDF Info
+
+ ```bash
+ python pdf_manipulator.py info -i input.pdf
+ ```
+
+ #### N-Up Layout (Requires Poppler)
+
+ ```bash
+ # 2 pages side-by-side on one sheet
+ python pdf_manipulator.py nup -i input.pdf -o output.pdf --layout 2x1
+
+ # 4 pages in a 2x2 grid on one sheet
+ python pdf_manipulator.py nup -i input.pdf -o output.pdf --layout 2x2
+ ```
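Editor's note: since `2x1` places two pages side by side, the layout string appears to read as columns x rows. Under that assumption (the `nup` implementation is not visible in this part of the commit), the number of output sheets can be estimated as below; both helper names are hypothetical.

```python
import math


def parse_layout(layout: str) -> tuple[int, int]:
    """Parse "COLSxROWS" (assumed order), e.g. "2x1" -> (2, 1)."""
    cols_s, rows_s = layout.lower().split("x")
    cols, rows = int(cols_s), int(rows_s)
    if cols < 1 or rows < 1:
        raise ValueError(f"Layout must have at least one column and row: {layout!r}")
    return cols, rows


def sheets_needed(total_pages: int, layout: str) -> int:
    """Number of output sheets when each sheet holds cols * rows source pages."""
    cols, rows = parse_layout(layout)
    return math.ceil(total_pages / (cols * rows))


print(sheets_needed(10, "2x1"))  # 5
print(sheets_needed(10, "2x2"))  # 3
```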
+
+ #### Compress
+
+ ```bash
+ python pdf_manipulator.py compress -i input.pdf -o output.pdf
+ ```
+
+ #### Batch Operations
+
+ ```bash
+ # Remove page 1 (e.g. cover page) from all PDFs in a directory
+ python pdf_manipulator.py batch-remove --dir ./pdfs --pages 1 --suffix _no_cover
+
+ # Merge all PDFs in a directory into one
+ python pdf_manipulator.py batch-merge --dir ./pdfs -o merged_all.pdf
+
+ # Split all PDFs in a directory into individual pages
+ python pdf_manipulator.py batch-split --dir ./pdfs --out-dir ./split_output
+ ```
+
+ ---
+
+ ## Notes
+
+ - All page numbers are 1-indexed (first page is page 1).
+ - The `nup` command requires Poppler to be installed on your system.
+ - For encrypted PDFs, use the `--password` flag with any command that reads them (decrypt first, or add password support per command as needed).
+ - Output directories are created automatically if they do not exist.
+
+ ---
+
+ ## License
+
+ MIT License. See [LICENSE](LICENSE) for details.
+
+ ---
+
+ ## Author
+
+ **algorembrant**
pdf_manipulator.py ADDED
@@ -0,0 +1,1065 @@
+ """
+ ================================================================================
+ PDF MANIPULATOR - Full-Featured PDF Page Manipulation Toolkit
+ ================================================================================
+ Author : algorembrant
+ Version : 1.0.0
+ License : MIT
+
+ USAGE COMMANDS (run from terminal):
+ --------------------------------------------------------------------------------
+
+ MERGE
+     python pdf_manipulator.py merge -i file1.pdf file2.pdf file3.pdf -o merged.pdf
+     python pdf_manipulator.py merge -i file1.pdf file2.pdf -o out.pdf --interleave
+
+ SPLIT
+     python pdf_manipulator.py split -i input.pdf -o ./output_dir
+     python pdf_manipulator.py split -i input.pdf -o ./output_dir --range 1-5
+     python pdf_manipulator.py split -i input.pdf -o ./output_dir --range 2,4,6
+
+ REMOVE PAGES
+     python pdf_manipulator.py remove -i input.pdf -o output.pdf --pages 3
+     python pdf_manipulator.py remove -i input.pdf -o output.pdf --pages 1,3,5
+     python pdf_manipulator.py remove -i input.pdf -o output.pdf --pages 2-5
+     python pdf_manipulator.py remove -i input.pdf -o output.pdf --pages 1,3-5,7
+
+ EXTRACT PAGES
+     python pdf_manipulator.py extract -i input.pdf -o output.pdf --pages 1-3
+     python pdf_manipulator.py extract -i input.pdf -o output.pdf --pages 2,4,6
+
+ REORDER PAGES
+     python pdf_manipulator.py reorder -i input.pdf -o output.pdf --order 3,1,2,4
+
+ ROTATE PAGES
+     python pdf_manipulator.py rotate -i input.pdf -o output.pdf --angle 90
+     python pdf_manipulator.py rotate -i input.pdf -o output.pdf --angle 180 --pages 1,3
+     python pdf_manipulator.py rotate -i input.pdf -o output.pdf --angle 270 --pages 2-4
+
+ REVERSE
+     python pdf_manipulator.py reverse -i input.pdf -o output.pdf
+
+ DUPLICATE PAGES
+     python pdf_manipulator.py duplicate -i input.pdf -o output.pdf --pages 2 --times 3
+
+ INSERT BLANK PAGES
+     python pdf_manipulator.py insert-blank -i input.pdf -o output.pdf --after 2
+     python pdf_manipulator.py insert-blank -i input.pdf -o output.pdf --before 1
+
+ INSERT PDF PAGES
+     python pdf_manipulator.py insert -i base.pdf --insert-file extra.pdf -o output.pdf --after 3
+     python pdf_manipulator.py insert -i base.pdf --insert-file extra.pdf -o output.pdf --before 2
+
+ REPLACE PAGES
+     python pdf_manipulator.py replace -i base.pdf --replace-file new.pdf -o output.pdf --pages 2 --replace-pages 1
+
+ CROP PAGES
+     python pdf_manipulator.py crop -i input.pdf -o output.pdf --box "50,50,500,700"
+     python pdf_manipulator.py crop -i input.pdf -o output.pdf --box "50,50,500,700" --pages 1-3
+
+ SCALE / RESIZE
+     python pdf_manipulator.py scale -i input.pdf -o output.pdf --factor 0.5
+     python pdf_manipulator.py scale -i input.pdf -o output.pdf --to-size A4
+     python pdf_manipulator.py scale -i input.pdf -o output.pdf --to-size letter
+
+ WATERMARK
+     python pdf_manipulator.py watermark -i input.pdf -o output.pdf --text "CONFIDENTIAL"
+     python pdf_manipulator.py watermark -i input.pdf -o output.pdf --text "DRAFT" --opacity 0.3 --angle 45
+     python pdf_manipulator.py watermark -i input.pdf -o output.pdf --watermark-pdf wm.pdf
+
+ STAMP / OVERLAY
+     python pdf_manipulator.py stamp -i input.pdf -o output.pdf --stamp-pdf stamp.pdf
+     python pdf_manipulator.py stamp -i input.pdf -o output.pdf --stamp-pdf stamp.pdf --pages 1
+
+ ADD PAGE NUMBERS
+     python pdf_manipulator.py number -i input.pdf -o output.pdf
+     python pdf_manipulator.py number -i input.pdf -o output.pdf --position bottom-center --start 1
+     python pdf_manipulator.py number -i input.pdf -o output.pdf --position top-right --format "Page {n}"
+
+ ENCRYPT / DECRYPT
+     python pdf_manipulator.py encrypt -i input.pdf -o output.pdf --user-pass mypass --owner-pass ownerpass
+     python pdf_manipulator.py encrypt -i input.pdf -o output.pdf --user-pass mypass
+     python pdf_manipulator.py decrypt -i encrypted.pdf -o decrypted.pdf --password mypass
+
+ METADATA
+     python pdf_manipulator.py metadata -i input.pdf
+     python pdf_manipulator.py metadata -i input.pdf -o output.pdf --set-title "My Title" --set-author "algorembrant"
+     python pdf_manipulator.py metadata -i input.pdf -o output.pdf --set-subject "Report" --set-keywords "pdf,report"
+
+ BOOKMARKS / OUTLINE
+     python pdf_manipulator.py bookmarks -i input.pdf
+     python pdf_manipulator.py bookmarks -i input.pdf -o output.pdf --add "Chapter 1:1,Chapter 2:5"
+
+ EXTRACT TEXT
+     python pdf_manipulator.py text -i input.pdf
+     python pdf_manipulator.py text -i input.pdf --pages 1-3 -o extracted.txt
+
+ INFO / INSPECT
+     python pdf_manipulator.py info -i input.pdf
+
+ N-UP (multiple pages per sheet)
+     python pdf_manipulator.py nup -i input.pdf -o output.pdf --layout 2x1
+     python pdf_manipulator.py nup -i input.pdf -o output.pdf --layout 2x2
+
+ COMPRESS
+     python pdf_manipulator.py compress -i input.pdf -o output.pdf
+
+ BATCH OPERATIONS
+     python pdf_manipulator.py batch-remove --dir ./pdfs --pages 1 --suffix _no_cover
+     python pdf_manipulator.py batch-merge --dir ./pdfs -o merged_all.pdf
+     python pdf_manipulator.py batch-split --dir ./pdfs --out-dir ./split_output
+
+ --------------------------------------------------------------------------------
+ PAGE RANGE SYNTAX:
+     Single page : 3
+     Multiple pages: 1,3,5
+     Range : 2-5 (inclusive)
+     Mixed : 1,3-5,7,9-11
+     Pages are always 1-indexed (first page = 1)
+ --------------------------------------------------------------------------------
+ """
+
+ from __future__ import annotations
+
+ import argparse
+ import io
+ import os
+ import re
+ import sys
+ import glob
+ from copy import deepcopy
+ from pathlib import Path
+ from typing import List, Optional, Tuple
+
+ from pypdf import PdfReader, PdfWriter
+ from pypdf.generic import NameObject, NumberObject
+ from reportlab.lib.pagesizes import A4, letter, A3, A5, LETTER
+ from reportlab.lib.units import mm, inch
+ from reportlab.pdfgen import canvas as rl_canvas
+ from reportlab.lib import colors
+
+
+ # ---------------------------------------------------------------------------
+ # Constants
+ # ---------------------------------------------------------------------------
+
+ PAGE_SIZES = {
+     "a3": A3,
+     "a4": A4,
+     "a5": A5,
+     "letter": letter,
+     "LETTER": LETTER,
+ }
+
+ NUMBER_POSITIONS = {
+     "bottom-center": lambda w, h: (w / 2, 20),
+     "bottom-left": lambda w, h: (30, 20),
+     "bottom-right": lambda w, h: (w - 30, 20),
+     "top-center": lambda w, h: (w / 2, h - 20),
+     "top-left": lambda w, h: (30, h - 20),
+     "top-right": lambda w, h: (w - 30, h - 20),
+ }
+
+
+ # ---------------------------------------------------------------------------
+ # Utility helpers
+ # ---------------------------------------------------------------------------
+
+ def parse_page_range(spec: str, total: int) -> List[int]:
+     """
+     Parse a page-range string into a sorted list of 0-based indices.
+     Input is 1-based, e.g. "1,3-5,7" -> [0, 2, 3, 4, 6]
+     """
+     indices: set[int] = set()
+     for part in spec.split(","):
+         part = part.strip()
+         if "-" in part:
+             a, b = part.split("-", 1)
+             a_i, b_i = int(a.strip()), int(b.strip())
+             if a_i < 1 or b_i > total or a_i > b_i:
+                 raise ValueError(
+                     f"Range {a_i}-{b_i} is out of bounds (document has {total} pages)."
+                 )
+             indices.update(range(a_i - 1, b_i))
+         else:
+             n = int(part)
+             if n < 1 or n > total:
+                 raise ValueError(
+                     f"Page {n} is out of bounds (document has {total} pages)."
+                 )
+             indices.add(n - 1)
+     return sorted(indices)
+
+
+ def open_pdf(path: str, password: Optional[str] = None) -> PdfReader:
+     reader = PdfReader(path)
+     if reader.is_encrypted:
+         if password is None:
+             password = ""
+         reader.decrypt(password)
+     return reader
+
+
+ def save_pdf(writer: PdfWriter, output_path: str) -> None:
+     out = Path(output_path)
+     out.parent.mkdir(parents=True, exist_ok=True)
+     with open(out, "wb") as f:
+         writer.write(f)
+     print(f"[OK] Saved -> {out.resolve()}")
+
+
+ def page_count(path: str) -> int:
+     return len(open_pdf(path).pages)
+
+
+ def make_watermark_pdf(
+     text: str,
+     page_width: float,
+     page_height: float,
+     opacity: float = 0.15,
+     angle: float = 45,
+     font_size: int = 60,
+ ) -> io.BytesIO:
+     buf = io.BytesIO()
+     c = rl_canvas.Canvas(buf, pagesize=(page_width, page_height))
+     c.setFont("Helvetica-Bold", font_size)
+     c.setFillColor(colors.red, alpha=opacity)
+     c.saveState()
+     c.translate(page_width / 2, page_height / 2)
+     c.rotate(angle)
+     c.drawCentredString(0, 0, text)
+     c.restoreState()
+     c.save()
+     buf.seek(0)
+     return buf
+
+
+ def make_page_number_pdf(
+     number_str: str,
+     page_width: float,
+     page_height: float,
+     position: str = "bottom-center",
+     font_size: int = 10,
+ ) -> io.BytesIO:
+     buf = io.BytesIO()
+     c = rl_canvas.Canvas(buf, pagesize=(page_width, page_height))
+     c.setFont("Helvetica", font_size)
+     c.setFillColor(colors.black)
+     pos_func = NUMBER_POSITIONS.get(position, NUMBER_POSITIONS["bottom-center"])
+     x, y = pos_func(page_width, page_height)
+     c.drawCentredString(x, y, number_str)
+     c.save()
+     buf.seek(0)
+     return buf
+
+
+ # ---------------------------------------------------------------------------
+ # Core operations
+ # ---------------------------------------------------------------------------
+
+ def cmd_merge(args: argparse.Namespace) -> None:
+     """Merge multiple PDFs into one."""
+     writer = PdfWriter()
+     files = args.inputs
+
+     if args.interleave:
+         readers = [open_pdf(f) for f in files]
+         max_pages = max(len(r.pages) for r in readers)
+         for i in range(max_pages):
+             for r in readers:
+                 if i < len(r.pages):
+                     writer.add_page(r.pages[i])
+     else:
+         for f in files:
+             reader = open_pdf(f)
+             for page in reader.pages:
+                 writer.add_page(page)
+
+     save_pdf(writer, args.output)
+
+
+ def cmd_split(args: argparse.Namespace) -> None:
+     """Split a PDF into individual pages or ranges."""
+     reader = open_pdf(args.input)
+     total = len(reader.pages)
+     out_dir = Path(args.output)
+     out_dir.mkdir(parents=True, exist_ok=True)
+     stem = Path(args.input).stem
+
+     if args.range:
+         indices = parse_page_range(args.range, total)
+         writer = PdfWriter()
+         for idx in indices:
+             writer.add_page(reader.pages[idx])
+         out_path = out_dir / f"{stem}_pages_{args.range.replace(',', '_')}.pdf"
+         save_pdf(writer, str(out_path))
+     else:
+         for i, page in enumerate(reader.pages):
+             writer = PdfWriter()
+             writer.add_page(page)
+             out_path = out_dir / f"{stem}_page_{i + 1:04d}.pdf"
+             save_pdf(writer, str(out_path))
+
+
+ def cmd_remove(args: argparse.Namespace) -> None:
+     """Remove specified pages from a PDF."""
+     reader = open_pdf(args.input)
+     total = len(reader.pages)
+     to_remove = set(parse_page_range(args.pages, total))
+
+     writer = PdfWriter()
+     for i, page in enumerate(reader.pages):
+         if i not in to_remove:
+             writer.add_page(page)
+
+     if len(writer.pages) == 0:
+         print("[WARN] All pages removed - output file will have 0 pages.")
+     save_pdf(writer, args.output)
+
+
+ def cmd_extract(args: argparse.Namespace) -> None:
+     """Extract specific pages into a new PDF."""
+     reader = open_pdf(args.input)
+     total = len(reader.pages)
+     indices = parse_page_range(args.pages, total)
+
+     writer = PdfWriter()
+     for idx in indices:
+         writer.add_page(reader.pages[idx])
+     save_pdf(writer, args.output)
+
+
+ def cmd_reorder(args: argparse.Namespace) -> None:
+     """Reorder pages according to a specified order."""
+     reader = open_pdf(args.input)
+     total = len(reader.pages)
+     order = [int(x.strip()) - 1 for x in args.order.split(",")]
+
+     for idx in order:
+         if idx < 0 or idx >= total:
+             raise ValueError(f"Page {idx + 1} is out of bounds (document has {total} pages).")
+
+     writer = PdfWriter()
+     for idx in order:
+         writer.add_page(reader.pages[idx])
+     save_pdf(writer, args.output)
+
+
+ def cmd_rotate(args: argparse.Namespace) -> None:
+     """Rotate pages by a given angle (90, 180, 270)."""
+     if args.angle not in (90, 180, 270):
+         raise ValueError("Rotation angle must be 90, 180, or 270.")
+
+     reader = open_pdf(args.input)
+     total = len(reader.pages)
+     indices = set(parse_page_range(args.pages, total)) if args.pages else set(range(total))
+
+     writer = PdfWriter()
+     for i, page in enumerate(reader.pages):
+         if i in indices:
+             page.rotate(args.angle)
+         writer.add_page(page)
+     save_pdf(writer, args.output)
+
+
+ def cmd_reverse(args: argparse.Namespace) -> None:
+     """Reverse the page order of a PDF."""
+     reader = open_pdf(args.input)
+     writer = PdfWriter()
+     for page in reversed(reader.pages):
+         writer.add_page(page)
+     save_pdf(writer, args.output)
+
+
+ def cmd_duplicate(args: argparse.Namespace) -> None:
+     """Duplicate specific pages N times and insert them consecutively."""
+     reader = open_pdf(args.input)
+     total = len(reader.pages)
+     indices = set(parse_page_range(args.pages, total))
+     times = args.times
+
+     writer = PdfWriter()
+     for i, page in enumerate(reader.pages):
+         if i in indices:
+             for _ in range(times):
+                 writer.add_page(deepcopy(page))
+         else:
+             writer.add_page(page)
+     save_pdf(writer, args.output)
+
+
+ def cmd_insert_blank(args: argparse.Namespace) -> None:
+     """Insert one or more blank pages into a PDF."""
+     reader = open_pdf(args.input)
+     total = len(reader.pages)
+     pages_list = list(reader.pages)
+
+     # Determine reference page dimensions
+     ref_page = pages_list[0]
+     width = float(ref_page.mediabox.width)
+     height = float(ref_page.mediabox.height)
+
+     # Build blank page
+ blank_buf = io.BytesIO()
404
+ c = rl_canvas.Canvas(blank_buf, pagesize=(width, height))
405
+ c.save()
406
+ blank_buf.seek(0)
407
+ blank_reader = PdfReader(blank_buf)
408
+ blank_page = blank_reader.pages[0]
409
+
410
+ if args.after is not None:
411
+ insert_idx = args.after  # page N is 1-indexed, so list index N is the slot right after it
412
+ if insert_idx < 0 or insert_idx > total:
413
+ raise ValueError(f"--after {args.after} is out of range.")
414
+ pages_list.insert(insert_idx, blank_page)
415
+ elif args.before is not None:
416
+ insert_idx = args.before - 1 # convert to 0-based
417
+ if insert_idx < 0 or insert_idx >= total:
418
+ raise ValueError(f"--before {args.before} is out of range.")
419
+ pages_list.insert(insert_idx, blank_page)
420
+ else:
421
+ raise ValueError("Specify --after N or --before N.")
422
+
423
+ writer = PdfWriter()
424
+ for p in pages_list:
425
+ writer.add_page(p)
426
+ save_pdf(writer, args.output)
427
+
428
+
429
+ def cmd_insert_pdf(args: argparse.Namespace) -> None:
430
+ """Insert pages from another PDF into the base PDF."""
431
+ base_reader = open_pdf(args.input)
432
+ ins_reader = open_pdf(args.insert_file)
433
+ base_pages = list(base_reader.pages)
434
+ ins_pages = list(ins_reader.pages)
435
+
436
+ if args.after is not None:
437
+ pos = args.after
438
+ elif args.before is not None:
439
+ pos = args.before - 1
440
+ else:
441
+ raise ValueError("Specify --after N or --before N.")
442
+
443
+ result = base_pages[:pos] + ins_pages + base_pages[pos:]
444
+ writer = PdfWriter()
445
+ for p in result:
446
+ writer.add_page(p)
447
+ save_pdf(writer, args.output)
448
+
449
+
450
+ def cmd_replace(args: argparse.Namespace) -> None:
451
+ """Replace specific pages in the base PDF with pages from another PDF."""
452
+ base_reader = open_pdf(args.input)
453
+ rep_reader = open_pdf(args.replace_file)
454
+ total_base = len(base_reader.pages)
455
+ total_rep = len(rep_reader.pages)
456
+
457
+ base_indices = parse_page_range(args.pages, total_base)
458
+ rep_indices = parse_page_range(args.replace_pages, total_rep)
459
+
460
+ if len(base_indices) != len(rep_indices):
461
+ raise ValueError(
462
+ f"Number of pages to replace ({len(base_indices)}) must match "
463
+ f"number of replacement pages ({len(rep_indices)})."
464
+ )
465
+
466
+ replace_map = dict(zip(base_indices, rep_indices))
467
+
468
+ writer = PdfWriter()
469
+ for i, page in enumerate(base_reader.pages):
470
+ if i in replace_map:
471
+ writer.add_page(rep_reader.pages[replace_map[i]])
472
+ else:
473
+ writer.add_page(page)
474
+ save_pdf(writer, args.output)
475
+
476
+
477
+ def cmd_crop(args: argparse.Namespace) -> None:
478
+ """Crop pages to a specific bounding box (left,bottom,right,top)."""
479
+ box_vals = [float(v) for v in args.box.split(",")]
480
+ if len(box_vals) != 4:
481
+ raise ValueError("--box must be 'left,bottom,right,top'.")
482
+ left, bottom, right, top = box_vals
483
+
484
+ reader = open_pdf(args.input)
485
+ total = len(reader.pages)
486
+ indices = set(parse_page_range(args.pages, total)) if args.pages else set(range(total))
487
+
488
+ writer = PdfWriter()
489
+ for i, page in enumerate(reader.pages):
490
+ if i in indices:
491
+ page.mediabox.lower_left = (left, bottom)
492
+ page.mediabox.upper_right = (right, top)
493
+ writer.add_page(page)
494
+ save_pdf(writer, args.output)
495
+
496
+
497
+ def cmd_scale(args: argparse.Namespace) -> None:
498
+ """Scale pages by a factor or resize to a standard page size."""
499
+ reader = open_pdf(args.input)
500
+ writer = PdfWriter()
501
+
502
+ for page in reader.pages:
503
+ orig_w = float(page.mediabox.width)
504
+ orig_h = float(page.mediabox.height)
505
+
506
+ if args.factor:
507
+ f = args.factor
508
+ page.scale(f, f)
509
+ elif args.to_size:
510
+ target = PAGE_SIZES.get(args.to_size.lower())
511
+ if target is None:
512
+ raise ValueError(f"Unknown page size: {args.to_size}. Choose from {list(PAGE_SIZES.keys())}")
513
+ tw, th = target
514
+ fx = tw / orig_w
515
+ fy = th / orig_h
516
+ page.scale(fx, fy)
517
+
518
+ writer.add_page(page)
519
+ save_pdf(writer, args.output)
520
+
521
+
522
+ def cmd_watermark(args: argparse.Namespace) -> None:
523
+ """Add a text or PDF watermark to each page."""
524
+ reader = open_pdf(args.input)
525
+ writer = PdfWriter()
526
+
527
+ for page in reader.pages:
528
+ w = float(page.mediabox.width)
529
+ h = float(page.mediabox.height)
530
+
531
+ if args.watermark_pdf:
532
+ wm_reader = open_pdf(args.watermark_pdf)
533
+ wm_page = wm_reader.pages[0]
534
+ else:
535
+ text = args.text or "WATERMARK"
536
+ opacity = args.opacity if args.opacity is not None else 0.15
537
+ angle = args.angle if args.angle is not None else 45
538
+ wm_buf = make_watermark_pdf(text, w, h, opacity=opacity, angle=angle)
539
+ wm_reader = PdfReader(wm_buf)
540
+ wm_page = wm_reader.pages[0]
541
+
542
+ page.merge_page(wm_page)
543
+ writer.add_page(page)
544
+ save_pdf(writer, args.output)
545
+
546
+
547
+ def cmd_stamp(args: argparse.Namespace) -> None:
548
+ """Overlay a stamp PDF on top of pages."""
549
+ reader = open_pdf(args.input)
550
+ stamp_reader = open_pdf(args.stamp_pdf)
551
+ stamp_page = stamp_reader.pages[0]
552
+ total = len(reader.pages)
553
+ indices = set(parse_page_range(args.pages, total)) if args.pages else set(range(total))
554
+
555
+ writer = PdfWriter()
556
+ for i, page in enumerate(reader.pages):
557
+ if i in indices:
558
+ page.merge_page(stamp_page)
559
+ writer.add_page(page)
560
+ save_pdf(writer, args.output)
561
+
562
+
563
+ def cmd_number(args: argparse.Namespace) -> None:
564
+ """Add page numbers to each page."""
565
+ reader = open_pdf(args.input)
566
+ writer = PdfWriter()
567
+ position = args.position or "bottom-center"
568
+ start = args.start if args.start is not None else 1
569
+ fmt = args.format or "{n}"
570
+
571
+ for i, page in enumerate(reader.pages):
572
+ w = float(page.mediabox.width)
573
+ h = float(page.mediabox.height)
574
+ number_str = fmt.replace("{n}", str(i + start))
575
+ num_buf = make_page_number_pdf(number_str, w, h, position=position)
576
+ num_reader = PdfReader(num_buf)
577
+ page.merge_page(num_reader.pages[0])
578
+ writer.add_page(page)
579
+ save_pdf(writer, args.output)
580
+
581
+
582
+ def cmd_encrypt(args: argparse.Namespace) -> None:
583
+ """Encrypt a PDF with user and owner passwords."""
584
+ reader = open_pdf(args.input)
585
+ writer = PdfWriter()
586
+ for page in reader.pages:
587
+ writer.add_page(page)
588
+ user_pw = args.user_pass or ""
589
+ owner_pw = args.owner_pass or args.user_pass or ""
590
+ writer.encrypt(user_pw, owner_pw)
591
+ save_pdf(writer, args.output)
592
+
593
+
594
+ def cmd_decrypt(args: argparse.Namespace) -> None:
595
+ """Decrypt / remove password from a PDF."""
596
+ reader = open_pdf(args.input, password=args.password)
597
+ if not reader.is_encrypted:
598
+ print("[INFO] File is not encrypted.")
599
+ writer = PdfWriter()
600
+ for page in reader.pages:
601
+ writer.add_page(page)
602
+ save_pdf(writer, args.output)
603
+
604
+
605
+ def cmd_metadata(args: argparse.Namespace) -> None:
606
+ """View or set PDF metadata."""
607
+ reader = open_pdf(args.input)
608
+ meta = reader.metadata
609
+ print("\n--- PDF Metadata ---")
610
+ print(f" Title : {meta.title}")
611
+ print(f" Author : {meta.author}")
612
+ print(f" Subject : {meta.subject}")
613
+ print(f" Keywords : {meta.get('/Keywords', '')}")
614
+ print(f" Creator : {meta.creator}")
615
+ print(f" Producer : {meta.producer}")
616
+ print(f" Created : {meta.get('/CreationDate', '')}")
617
+ print(f" Modified : {meta.get('/ModDate', '')}")
618
+ print()
619
+
620
+ if args.output and any([args.set_title, args.set_author, args.set_subject, args.set_keywords]):
621
+ writer = PdfWriter()
622
+ for page in reader.pages:
623
+ writer.add_page(page)
624
+ new_meta = {}
625
+ if args.set_title:
626
+ new_meta["/Title"] = args.set_title
627
+ if args.set_author:
628
+ new_meta["/Author"] = args.set_author
629
+ if args.set_subject:
630
+ new_meta["/Subject"] = args.set_subject
631
+ if args.set_keywords:
632
+ new_meta["/Keywords"] = args.set_keywords
633
+ writer.add_metadata(new_meta)
634
+ save_pdf(writer, args.output)
635
+
636
+
637
+ def cmd_bookmarks(args: argparse.Namespace) -> None:
638
+ """List or add bookmarks/outline entries."""
639
+ reader = open_pdf(args.input)
640
+ outlines = reader.outline
641
+
642
+ def _print_outline(items, indent=0):
643
+ for item in items:
644
+ if isinstance(item, list):
645
+ _print_outline(item, indent + 2)
646
+ else:
647
+ try:
648
+ title = item.title
649
+ page_obj = reader.get_destination_page_number(item)
650
+ print(f"{' ' * indent} {title} (page {page_obj + 1})")
651
+ except Exception:
652
+ pass
653
+
654
+ print("\n--- Bookmarks / Outline ---")
655
+ if outlines:
656
+ _print_outline(outlines)
657
+ else:
658
+ print(" (none)")
659
+ print()
660
+
661
+ if args.output and args.add:
662
+ writer = PdfWriter()
663
+ for page in reader.pages:
664
+ writer.add_page(page)
665
+ for entry in args.add.split(","):
666
+ title, pg = entry.strip().rsplit(":", 1)
667
+ writer.add_outline_item(title.strip(), int(pg.strip()) - 1)
668
+ save_pdf(writer, args.output)
669
+
670
+
671
+ def cmd_text(args: argparse.Namespace) -> None:
672
+ """Extract text from PDF pages."""
673
+ reader = open_pdf(args.input)
674
+ total = len(reader.pages)
675
+ indices = parse_page_range(args.pages, total) if args.pages else list(range(total))
676
+
677
+ lines = []
678
+ for idx in indices:
679
+ text = reader.pages[idx].extract_text() or ""
680
+ lines.append(f"=== Page {idx + 1} ===\n{text}\n")
681
+
682
+ full_text = "\n".join(lines)
683
+
684
+ if args.output:
685
+ with open(args.output, "w", encoding="utf-8") as f:
686
+ f.write(full_text)
687
+ print(f"[OK] Text saved -> {args.output}")
688
+ else:
689
+ print(full_text)
690
+
691
+
692
+ def cmd_info(args: argparse.Namespace) -> None:
693
+ """Display detailed information about a PDF."""
694
+ reader = open_pdf(args.input)
695
+ total = len(reader.pages)
696
+ meta = reader.metadata
697
+ print("\n--- PDF Info ---")
698
+ print(f" File : {args.input}")
699
+ print(f" Pages : {total}")
700
+ print(f" Encrypted : {reader.is_encrypted}")
701
+ print(f" Title : {meta.title}")
702
+ print(f" Author : {meta.author}")
703
+ print()
704
+ print(" Page Dimensions:")
705
+ for i, page in enumerate(reader.pages):
706
+ w = float(page.mediabox.width)
707
+ h = float(page.mediabox.height)
708
+ print(f" Page {i + 1:4d}: {w:.1f} x {h:.1f} pt ({w/72:.2f} x {h/72:.2f} in)")
709
+ print()
710
+
711
+
712
+ def cmd_nup(args: argparse.Namespace) -> None:
713
+ """Arrange N pages per output sheet (e.g. 2x1, 2x2)."""
714
+ layout = args.layout.lower()
715
+ try:
716
+ cols, rows = [int(x) for x in layout.split("x")]
717
+ except ValueError:
718
+ raise ValueError("--layout must be CxR, e.g. 2x1 or 2x2")
719
+
720
+ reader = open_pdf(args.input)
721
+ per_sheet = cols * rows
722
+ total = len(reader.pages)
723
+
724
+ # Use first page dimensions for output sheet
725
+ first_page = reader.pages[0]
726
+ pw = float(first_page.mediabox.width)
727
+ ph = float(first_page.mediabox.height)
728
+ cell_w = pw / cols
729
+ cell_h = ph / rows
730
+ sheet_w = pw
731
+ sheet_h = ph
732
+
733
+ writer = PdfWriter()
734
+
735
+ i = 0
736
+ while i < total:
737
+ buf = io.BytesIO()
738
+ c = rl_canvas.Canvas(buf, pagesize=(sheet_w, sheet_h))
739
+
740
+ for slot in range(per_sheet):
741
+ if i + slot >= total:
742
+ break
743
+ col = slot % cols
744
+ row = slot // cols
745
+ x_off = col * cell_w
746
+ y_off = sheet_h - (row + 1) * cell_h
747
+
748
+ # Render sub-page
749
+ sub_buf = io.BytesIO()
750
+ sub_writer = PdfWriter()
751
+ sub_writer.add_page(reader.pages[i + slot])
752
+ sub_writer.write(sub_buf)
753
+ sub_buf.seek(0)
754
+
755
+ # Rasterize the sub-page; pdf2image requires the Poppler binaries to be installed.
756
+ from pdf2image import convert_from_bytes
757
+ imgs = convert_from_bytes(sub_buf.read(), dpi=72)
758
+ if imgs:
759
+ img = imgs[0]
760
+ c.drawInlineImage(img, x_off, y_off, width=cell_w, height=cell_h)
761
+
762
+ c.save()
763
+ buf.seek(0)
764
+ nup_reader = PdfReader(buf)
765
+ writer.add_page(nup_reader.pages[0])
766
+ i += per_sheet
767
+
768
+ save_pdf(writer, args.output)
769
+
770
+
771
+ def cmd_compress(args: argparse.Namespace) -> None:
772
+ """Reduce file size by merging identical objects and dropping orphaned ones."""
773
+ reader = open_pdf(args.input)
774
+ writer = PdfWriter()
775
+ for page in reader.pages:
776
+ writer.add_page(page)
777
+ writer.compress_identical_objects(remove_identicals=True, remove_orphans=True)
778
+ save_pdf(writer, args.output)
779
+
780
+
781
+ def cmd_batch_remove(args: argparse.Namespace) -> None:
782
+ """Remove pages from all PDFs in a directory."""
783
+ pdfs = sorted(glob.glob(os.path.join(args.dir, "*.pdf")))
784
+ suffix = args.suffix or "_modified"
785
+ for pdf_path in pdfs:
786
+ stem = Path(pdf_path).stem
787
+ out_path = os.path.join(args.dir, f"{stem}{suffix}.pdf")
788
+ reader = open_pdf(pdf_path)
789
+ total = len(reader.pages)
790
+ try:
791
+ to_remove = set(parse_page_range(args.pages, total))
792
+ except ValueError as e:
793
+ print(f"[SKIP] {pdf_path}: {e}")
794
+ continue
795
+ writer = PdfWriter()
796
+ for i, page in enumerate(reader.pages):
797
+ if i not in to_remove:
798
+ writer.add_page(page)
799
+ save_pdf(writer, out_path)
800
+
801
+
802
+ def cmd_batch_merge(args: argparse.Namespace) -> None:
803
+ """Merge all PDFs in a directory into one."""
804
+ pdfs = sorted(glob.glob(os.path.join(args.dir, "*.pdf")))
805
+ writer = PdfWriter()
806
+ for pdf_path in pdfs:
807
+ reader = open_pdf(pdf_path)
808
+ for page in reader.pages:
809
+ writer.add_page(page)
810
+ save_pdf(writer, args.output)
811
+
812
+
813
+ def cmd_batch_split(args: argparse.Namespace) -> None:
814
+ """Split all PDFs in a directory into individual pages."""
815
+ pdfs = sorted(glob.glob(os.path.join(args.dir, "*.pdf")))
816
+ out_dir = Path(args.out_dir)
817
+ out_dir.mkdir(parents=True, exist_ok=True)
818
+ for pdf_path in pdfs:
819
+ stem = Path(pdf_path).stem
820
+ reader = open_pdf(pdf_path)
821
+ for i, page in enumerate(reader.pages):
822
+ writer = PdfWriter()
823
+ writer.add_page(page)
824
+ out_path = out_dir / f"{stem}_page_{i + 1:04d}.pdf"
825
+ save_pdf(writer, str(out_path))
826
+
827
+
828
+ # ---------------------------------------------------------------------------
829
+ # Argument parser
830
+ # ---------------------------------------------------------------------------
831
+
832
+ def build_parser() -> argparse.ArgumentParser:
833
+ parser = argparse.ArgumentParser(
834
+ prog="pdf_manipulator",
835
+ description="Full-featured PDF page manipulation toolkit by algorembrant",
836
+ formatter_class=argparse.RawDescriptionHelpFormatter,
837
+ )
838
+ sub = parser.add_subparsers(dest="command", required=True)
839
+
840
+ # merge
841
+ p = sub.add_parser("merge", help="Merge multiple PDFs")
842
+ p.add_argument("-i", "--inputs", nargs="+", required=True, metavar="FILE")
843
+ p.add_argument("-o", "--output", required=True)
844
+ p.add_argument("--interleave", action="store_true", help="Interleave pages from each file")
845
+
846
+ # split
847
+ p = sub.add_parser("split", help="Split PDF into pages or a range")
848
+ p.add_argument("-i", "--input", required=True)
849
+ p.add_argument("-o", "--output", required=True, help="Output directory")
850
+ p.add_argument("--range", help="Page range to extract (e.g. 1-5 or 2,4,6)")
851
+
852
+ # remove
853
+ p = sub.add_parser("remove", help="Remove pages from a PDF")
854
+ p.add_argument("-i", "--input", required=True)
855
+ p.add_argument("-o", "--output", required=True)
856
+ p.add_argument("--pages", required=True, help="Pages to remove, e.g. 1 or 1,3-5")
857
+
858
+ # extract
859
+ p = sub.add_parser("extract", help="Extract pages to a new PDF")
860
+ p.add_argument("-i", "--input", required=True)
861
+ p.add_argument("-o", "--output", required=True)
862
+ p.add_argument("--pages", required=True, help="Pages to extract, e.g. 1-3")
863
+
864
+ # reorder
865
+ p = sub.add_parser("reorder", help="Reorder pages")
866
+ p.add_argument("-i", "--input", required=True)
867
+ p.add_argument("-o", "--output", required=True)
868
+ p.add_argument("--order", required=True, help="New order, e.g. 3,1,2,4")
869
+
870
+ # rotate
871
+ p = sub.add_parser("rotate", help="Rotate pages")
872
+ p.add_argument("-i", "--input", required=True)
873
+ p.add_argument("-o", "--output", required=True)
874
+ p.add_argument("--angle", required=True, type=int, choices=[90, 180, 270])
875
+ p.add_argument("--pages", help="Pages to rotate (all if omitted)")
876
+
877
+ # reverse
878
+ p = sub.add_parser("reverse", help="Reverse page order")
879
+ p.add_argument("-i", "--input", required=True)
880
+ p.add_argument("-o", "--output", required=True)
881
+
882
+ # duplicate
883
+ p = sub.add_parser("duplicate", help="Duplicate specified pages")
884
+ p.add_argument("-i", "--input", required=True)
885
+ p.add_argument("-o", "--output", required=True)
886
+ p.add_argument("--pages", required=True, help="Pages to duplicate")
887
+ p.add_argument("--times", type=int, default=2, help="Total occurrences of each selected page in the output (default 2)")
888
+
889
+ # insert-blank
890
+ p = sub.add_parser("insert-blank", help="Insert blank page(s)")
891
+ p.add_argument("-i", "--input", required=True)
892
+ p.add_argument("-o", "--output", required=True)
893
+ p.add_argument("--after", type=int, help="Insert after page N (1-indexed)")
894
+ p.add_argument("--before", type=int, help="Insert before page N (1-indexed)")
895
+
896
+ # insert (pdf)
897
+ p = sub.add_parser("insert", help="Insert pages from another PDF")
898
+ p.add_argument("-i", "--input", required=True)
899
+ p.add_argument("-o", "--output", required=True)
900
+ p.add_argument("--insert-file", required=True)
901
+ p.add_argument("--after", type=int, help="Insert after page N (1-indexed)")
902
+ p.add_argument("--before", type=int, help="Insert before page N (1-indexed)")
903
+
904
+ # replace
905
+ p = sub.add_parser("replace", help="Replace pages with pages from another PDF")
906
+ p.add_argument("-i", "--input", required=True)
907
+ p.add_argument("-o", "--output", required=True)
908
+ p.add_argument("--replace-file", required=True)
909
+ p.add_argument("--pages", required=True, help="Pages in base to replace")
910
+ p.add_argument("--replace-pages", required=True, help="Pages in replacement file to use")
911
+
912
+ # crop
913
+ p = sub.add_parser("crop", help="Crop pages to a bounding box")
914
+ p.add_argument("-i", "--input", required=True)
915
+ p.add_argument("-o", "--output", required=True)
916
+ p.add_argument("--box", required=True, help="left,bottom,right,top in points")
917
+ p.add_argument("--pages", help="Pages to crop (all if omitted)")
918
+
919
+ # scale
920
+ p = sub.add_parser("scale", help="Scale or resize pages")
921
+ p.add_argument("-i", "--input", required=True)
922
+ p.add_argument("-o", "--output", required=True)
923
+ p.add_argument("--factor", type=float, help="Scale factor, e.g. 0.5")
924
+ p.add_argument("--to-size", help="Target page size: a4, a3, a5, letter")
925
+
926
+ # watermark
927
+ p = sub.add_parser("watermark", help="Add watermark to pages")
928
+ p.add_argument("-i", "--input", required=True)
929
+ p.add_argument("-o", "--output", required=True)
930
+ p.add_argument("--text", help="Watermark text")
931
+ p.add_argument("--opacity", type=float, default=0.15)
932
+ p.add_argument("--angle", type=float, default=45.0)
933
+ p.add_argument("--watermark-pdf", help="Use a PDF as watermark instead of text")
934
+
935
+ # stamp
936
+ p = sub.add_parser("stamp", help="Overlay a stamp PDF on pages")
937
+ p.add_argument("-i", "--input", required=True)
938
+ p.add_argument("-o", "--output", required=True)
939
+ p.add_argument("--stamp-pdf", required=True)
940
+ p.add_argument("--pages", help="Pages to stamp (all if omitted)")
941
+
942
+ # number
943
+ p = sub.add_parser("number", help="Add page numbers")
944
+ p.add_argument("-i", "--input", required=True)
945
+ p.add_argument("-o", "--output", required=True)
946
+ p.add_argument("--position", default="bottom-center",
947
+ choices=list(NUMBER_POSITIONS.keys()))
948
+ p.add_argument("--start", type=int, default=1)
949
+ p.add_argument("--format", default="{n}", help="Number format, use {n} for page number")
950
+
951
+ # encrypt
952
+ p = sub.add_parser("encrypt", help="Encrypt a PDF")
953
+ p.add_argument("-i", "--input", required=True)
954
+ p.add_argument("-o", "--output", required=True)
955
+ p.add_argument("--user-pass", required=True)
956
+ p.add_argument("--owner-pass", default=None)
957
+
958
+ # decrypt
959
+ p = sub.add_parser("decrypt", help="Remove password from PDF")
960
+ p.add_argument("-i", "--input", required=True)
961
+ p.add_argument("-o", "--output", required=True)
962
+ p.add_argument("--password", required=True)
963
+
964
+ # metadata
965
+ p = sub.add_parser("metadata", help="View or edit PDF metadata")
966
+ p.add_argument("-i", "--input", required=True)
967
+ p.add_argument("-o", "--output", default=None)
968
+ p.add_argument("--set-title")
969
+ p.add_argument("--set-author")
970
+ p.add_argument("--set-subject")
971
+ p.add_argument("--set-keywords")
972
+
973
+ # bookmarks
974
+ p = sub.add_parser("bookmarks", help="List or add bookmarks")
975
+ p.add_argument("-i", "--input", required=True)
976
+ p.add_argument("-o", "--output", default=None)
977
+ p.add_argument("--add", help="Bookmarks to add: 'Title:page,Title2:page2'")
978
+
979
+ # text
980
+ p = sub.add_parser("text", help="Extract text from PDF")
981
+ p.add_argument("-i", "--input", required=True)
982
+ p.add_argument("-o", "--output", default=None, help="Save to file instead of printing")
983
+ p.add_argument("--pages", help="Pages to extract (all if omitted)")
984
+
985
+ # info
986
+ p = sub.add_parser("info", help="Display PDF information")
987
+ p.add_argument("-i", "--input", required=True)
988
+
989
+ # nup
990
+ p = sub.add_parser("nup", help="Arrange N pages per sheet")
991
+ p.add_argument("-i", "--input", required=True)
992
+ p.add_argument("-o", "--output", required=True)
993
+ p.add_argument("--layout", default="2x1", help="Layout e.g. 2x1, 2x2, 4x1")
994
+
995
+ # compress
996
+ p = sub.add_parser("compress", help="Reduce PDF size by deduplicating objects")
997
+ p.add_argument("-i", "--input", required=True)
998
+ p.add_argument("-o", "--output", required=True)
999
+
1000
+ # batch-remove
1001
+ p = sub.add_parser("batch-remove", help="Remove pages from all PDFs in a directory")
1002
+ p.add_argument("--dir", required=True)
1003
+ p.add_argument("--pages", required=True)
1004
+ p.add_argument("--suffix", default="_modified")
1005
+
1006
+ # batch-merge
1007
+ p = sub.add_parser("batch-merge", help="Merge all PDFs in a directory")
1008
+ p.add_argument("--dir", required=True)
1009
+ p.add_argument("-o", "--output", required=True)
1010
+
1011
+ # batch-split
1012
+ p = sub.add_parser("batch-split", help="Split all PDFs in a directory into pages")
1013
+ p.add_argument("--dir", required=True)
1014
+ p.add_argument("--out-dir", required=True)
1015
+
1016
+ return parser
1017
+
1018
+
1019
+ COMMANDS = {
1020
+ "merge": cmd_merge,
1021
+ "split": cmd_split,
1022
+ "remove": cmd_remove,
1023
+ "extract": cmd_extract,
1024
+ "reorder": cmd_reorder,
1025
+ "rotate": cmd_rotate,
1026
+ "reverse": cmd_reverse,
1027
+ "duplicate": cmd_duplicate,
1028
+ "insert-blank": cmd_insert_blank,
1029
+ "insert": cmd_insert_pdf,
1030
+ "replace": cmd_replace,
1031
+ "crop": cmd_crop,
1032
+ "scale": cmd_scale,
1033
+ "watermark": cmd_watermark,
1034
+ "stamp": cmd_stamp,
1035
+ "number": cmd_number,
1036
+ "encrypt": cmd_encrypt,
1037
+ "decrypt": cmd_decrypt,
1038
+ "metadata": cmd_metadata,
1039
+ "bookmarks": cmd_bookmarks,
1040
+ "text": cmd_text,
1041
+ "info": cmd_info,
1042
+ "nup": cmd_nup,
1043
+ "compress": cmd_compress,
1044
+ "batch-remove": cmd_batch_remove,
1045
+ "batch-merge": cmd_batch_merge,
1046
+ "batch-split": cmd_batch_split,
1047
+ }
1048
+
1049
+
1050
+ def main() -> None:
1051
+ parser = build_parser()
1052
+ args = parser.parse_args()
1053
+ handler = COMMANDS.get(args.command)
1054
+ if handler is None:
1055
+ parser.print_help()
1056
+ sys.exit(1)
1057
+ try:
1058
+ handler(args)
1059
+ except Exception as exc:
1060
+ print(f"[ERROR] {exc}", file=sys.stderr)
1061
+ sys.exit(1)
1062
+
1063
+
1064
+ if __name__ == "__main__":
1065
+ main()
requirements.txt ADDED
@@ -0,0 +1,4 @@
1
+ pypdf>=4.0.0
2
+ reportlab>=4.0.0
3
+ pdf2image>=1.17.0
4
+ Pillow>=10.0.0