AhmedBou committed on
Commit
8564c6a
·
verified ·
1 Parent(s): 39afa3f

Update README.md

Files changed (1)
  1. README.md +76 -1
README.md CHANGED
@@ -1,6 +1,6 @@
1
  ---
2
  title: Smart PDF Chapter Splitter
3
- emoji: 🌍
4
  colorFrom: gray
5
  colorTo: yellow
6
  sdk: gradio
@@ -11,4 +11,79 @@ license: mit
11
  short_description: 'Split large PDFs (books) into clean, per-chapter files '
12
  ---
13

14
  Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
 
1
  ---
2
  title: Smart PDF Chapter Splitter
3
+ emoji: 📚
4
  colorFrom: gray
5
  colorTo: yellow
6
  sdk: gradio
 
11
  short_description: 'Split large PDFs (books) into clean, per-chapter files '
12
  ---
13
 
14
+ # 📚 Smart PDF Chapter Splitter
15
+
16
+ Split large PDFs (books, manuals, technical documents) into clean, per-chapter files: **fast, local, and deterministic**.
17
+
18
+ This tool uses **PDF bookmarks (Table of Contents)** to extract chapters with **near-perfect accuracy** for professionally published documents.
19
+
20
+ ---
21
+
22
+ ## ✨ Features
23
+
24
+ - 📖 Splits PDFs into individual chapter files
25
+ - ⚙️ Uses **embedded bookmarks** (no AI, no guesswork)
26
+ - 🚀 Extremely fast (local processing)
27
+ - 🧼 Safe filenames (cross-platform)
28
+ - 📂 Batch-ready and automation-friendly
29
+
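The "safe filenames" feature could be implemented along these lines. This is only a sketch under assumptions: the helper name `safe_filename`, the `max_length` and `fallback` parameters, and the exact character set are hypothetical, not taken from this Space's code. It replaces characters that Windows rejects (a superset of what Linux and macOS reject), trims trailing dots and spaces, and avoids reserved Windows device names.

```python
import re
import unicodedata

# Device names that Windows refuses as file names (checked case-insensitively).
_WINDOWS_RESERVED = {
    "CON", "PRN", "AUX", "NUL",
    *(f"COM{i}" for i in range(1, 10)),
    *(f"LPT{i}" for i in range(1, 10)),
}


def safe_filename(title, max_length=100, fallback="chapter"):
    """Turn a bookmark title into a name usable on Windows, macOS, and Linux."""
    # Normalize Unicode so compatibility forms collapse to one spelling.
    name = unicodedata.normalize("NFKC", title)
    # Replace characters that are illegal on Windows, plus control characters.
    name = re.sub(r'[<>:"/\\|?*\x00-\x1f]', "_", name)
    # Collapse whitespace runs; Windows also rejects trailing dots and spaces.
    name = re.sub(r"\s+", " ", name).strip(" .")
    # Truncate, then trim again in case the cut exposed a trailing dot or space.
    name = name[:max_length].rstrip(" .")
    if not name:
        return fallback
    if name.upper() in _WINDOWS_RESERVED:
        return f"{fallback}_{name}"
    return name
```

For example, `safe_filename("Chapter 1: What/Why?")` yields a name with the colon, slash, and question mark replaced by underscores.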
30
+ ---
31
+
32
+ ## 🧠 How It Works
33
+
34
+ Many digitally produced PDFs contain an internal **Table of Contents (bookmarks)**, also known as the document outline.
35
+
36
+ This Space:
37
+ 1. Reads the PDF outline
38
+ 2. Identifies top-level chapters
39
+ 3. Calculates page ranges
40
+ 4. Exports each chapter as its own PDF
41
+
42
+ > ✅ Deterministic
43
+ > ❌ No OCR
44
+ > ❌ No AI hallucinations
45
+
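The four steps above can be sketched with PyMuPDF roughly as follows. This is a minimal illustration, not this Space's actual code: the function names `chapter_ranges` and `split_pdf` and the `chapter_NN.pdf` output naming are assumptions made for the example. It relies on `Document.get_toc()` returning `[level, title, page]` entries with 1-based page numbers, and on `Document.insert_pdf()` taking 0-based, inclusive `from_page` and `to_page` arguments.

```python
from pathlib import Path


def chapter_ranges(toc, page_count):
    """Map an outline to (title, first_page, last_page) tuples.

    `toc` is a list of [level, title, page, ...] entries with 1-based pages.
    Returned pages are 1-based and inclusive.
    """
    # Step 2: keep only top-level entries that point at a real page.
    chapters = [
        (title, page)
        for level, title, page, *_ in toc
        if level == 1 and 1 <= page <= page_count
    ]
    ranges = []
    for i, (title, start) in enumerate(chapters):
        # Step 3: a chapter ends on the page before the next one starts;
        # the last chapter runs to the end of the document.
        end = chapters[i + 1][1] - 1 if i + 1 < len(chapters) else page_count
        # Guard against two chapters that start on the same page.
        ranges.append((title, start, max(start, end)))
    return ranges


def split_pdf(pdf_path, out_dir):
    """Write one PDF per top-level bookmark and return the written paths."""
    import fitz  # PyMuPDF, imported lazily so chapter_ranges stays dependency-free

    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    with fitz.open(pdf_path) as doc:
        # Step 1: read the PDF outline.
        ranges = chapter_ranges(doc.get_toc(), doc.page_count)
        for number, (title, first, last) in enumerate(ranges, start=1):
            # Step 4: copy the page range into a new, empty document.
            chapter = fitz.open()
            chapter.insert_pdf(doc, from_page=first - 1, to_page=last - 1)
            target = out_dir / f"chapter_{number:02d}.pdf"
            chapter.save(str(target))
            chapter.close()
            written.append(target)
    return written
```

Keeping the page-range calculation separate from the file I/O makes the deterministic part easy to check without a PDF on disk: the same outline always produces the same ranges.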
46
+ ---
47
+
48
+ ## 📊 Accuracy Expectations
49
+
50
+ | PDF Type | Accuracy |
51
+ |-------|---------|
52
+ | Digital-first published books | ⭐⭐⭐⭐⭐ (~100%) |
53
+ | Technical manuals | ⭐⭐⭐⭐⭐ |
54
+ | Semi-digital PDFs | ⭐⭐⭐⭐ |
55
+ | Scanned PDFs (no bookmarks) | ❌ Not supported |
56
+
57
+ ---
58
+
59
+ ## 🏗️ Ideal Use Cases
60
+
61
+ - 📚 Published books (Springer, O'Reilly, Wiley, Packt…)
62
+ - ⚙️ Engineering manuals
63
+ - 🧾 Technical specifications
64
+ - 🏭 PLM & documentation pipelines
65
+ - 📂 Large PDF libraries
66
+
67
+ ---
68
+
69
+ ## 🚫 Limitations
70
+
71
+ This tool **requires bookmarks**.
72
+
73
+ If your PDF:
74
+ - Is scanned
75
+ - Has no outline
76
+ - Has broken TOC metadata
77
+
78
+ ➡️ You will need **OCR or AI-based structure detection** (not included here).
79
+
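Before splitting, it can help to check whether a PDF falls under these limitations. The sketch below is an assumption about how such a check might look, not part of this Space: `outline_problems` is a hypothetical helper that inspects an outline in the `[level, title, page]` shape returned by PyMuPDF's `Document.get_toc()` and reports why it cannot drive a split.

```python
def outline_problems(toc, page_count):
    """Return human-readable reasons why an outline cannot drive a split.

    An empty list means the outline looks usable.
    """
    if not toc:
        # Typical for scanned PDFs and for files exported without bookmarks.
        return ["no outline (bookmarks) found"]

    problems = []
    top = [(title, page) for level, title, page, *_ in toc if level == 1]
    if not top:
        problems.append("outline has no top-level entries")

    # Entries whose target page lies outside the document indicate broken TOC metadata.
    bad = [title for title, page in top if not 1 <= page <= page_count]
    if bad:
        problems.append(f"{len(bad)} top-level entries point outside the document")

    # Chapters should appear in reading order.
    pages = [page for _, page in top]
    if pages != sorted(pages):
        problems.append("top-level entries are not in page order")
    return problems
```

With PyMuPDF installed, this could be called as `outline_problems(doc.get_toc(), doc.page_count)` on an opened document; a non-empty result suggests the file needs OCR or another structure-detection approach.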
80
+ ---
81
+
82
+ ## 🛠️ Tech Stack
83
+
84
+ - **Python**
85
+ - **PyMuPDF (fitz)**
86
+ - Local execution (no cloud dependency)
87
+
88
+
89
  Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference