Space: AhmedBou/Smart_PDF_Chapter_Splitter

Commit 8564c6a (verified) by AhmedBou on Feb 2: "Update README.md"

Parent: 39afa3f · Files changed (1): README.md +76 -1

---
title: Smart PDF Chapter Splitter
emoji: 📚
colorFrom: gray
colorTo: yellow
sdk: gradio
license: mit
short_description: 'Split large PDFs (books) into clean, per-chapter files '
---
Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

# 📚 Smart PDF Chapter Splitter

Split large PDFs (books, manuals, technical documents) into clean, per-chapter files: **fast, local, and deterministic**.

This tool uses the PDF's **embedded bookmarks (its Table of Contents)** to extract chapters, which gives **near-perfect accuracy** on professionally published documents.

---

## ✨ Features

- 📖 Splits a PDF into one file per chapter
- ⚙️ Uses **embedded bookmarks** (no AI, no guesswork)
- 🚀 Extremely fast: all processing happens locally
- 🧼 Cross-platform-safe output filenames
- 📂 Batch-ready and automation-friendly
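
The exact filename rules aren't listed here. A minimal sketch, assuming chapter titles become filenames: replace characters that Windows forbids (plus `/`, the POSIX separator), collapse whitespace, drop trailing dots and spaces, and prefix a zero-padded chapter index so files sort in reading order. `safe_filename` and its rules are hypothetical, not the Space's actual implementation.

```python
import re
import unicodedata

# Characters invalid in Windows filenames; "/" is also the POSIX path separator.
_INVALID_CHARS = re.compile(r'[<>:"/\\|?*\x00-\x1f]')


def safe_filename(title: str, index: int, max_len: int = 80) -> str:
    """Turn a chapter title into a cross-platform-safe PDF filename (hypothetical helper)."""
    # Normalize Unicode so ligatures and full-width forms become plain characters.
    name = unicodedata.normalize("NFKC", title)
    # Replace forbidden characters, then collapse runs of whitespace.
    name = _INVALID_CHARS.sub("_", name)
    name = re.sub(r"\s+", " ", name).strip()
    # Truncate, then strip trailing dots/spaces, which Windows rejects.
    name = name[:max_len].rstrip(". ")
    if not name:
        name = "untitled"
    # The zero-padded index keeps files in reading order and avoids
    # bare reserved device names such as "CON".
    return f"{index:02d}_{name}.pdf"
```

For example, `safe_filename("Chapter 1: Intro", 1)` returns `"01_Chapter 1_ Intro.pdf"`.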

---

## 🧠 How It Works

Most digitally produced books and manuals contain an internal **Table of Contents**, stored as bookmarks (the PDF outline).

This Space:

1. Reads the PDF outline
2. Identifies the top-level chapters
3. Calculates each chapter's page range
4. Exports each chapter as its own PDF

> - ✅ Deterministic
> - ❌ No OCR
> - ❌ No AI hallucinations
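
Steps 2 and 3 can be sketched in plain Python. This assumes the outline is a list of `[level, title, page]` entries with 1-based page numbers, the shape PyMuPDF's `Document.get_toc()` returns. It also assumes each chapter runs until the page before the next top-level bookmark and the last chapter runs to the end of the document; pages before the first bookmark are simply not assigned here. `chapter_ranges` is a hypothetical name.

```python
from typing import List, Tuple


def chapter_ranges(toc: List[list], page_count: int) -> List[Tuple[str, int, int]]:
    """Return (title, first_page, last_page) per top-level bookmark, 1-based and inclusive."""
    # Step 2: keep only top-level entries that point at a real page.
    # "*_" tolerates the extra link-info element get_toc(simple=False) adds.
    starts = [(title, page) for level, title, page, *_ in toc
              if level == 1 and 1 <= page <= page_count]
    ranges = []
    for i, (title, first) in enumerate(starts):
        # Step 3: a chapter ends just before the next one starts, or at the last page.
        next_start = starts[i + 1][1] if i + 1 < len(starts) else page_count + 1
        # max() keeps at least one page if two chapters share a start page.
        last = max(first, next_start - 1)
        ranges.append((title, first, last))
    return ranges
```

For an outline `[[1, "Preface", 1], [1, "Chapter 1", 3], [2, "1.1 Section", 4], [1, "Chapter 2", 10]]` and a 15-page document, this yields `[("Preface", 1, 2), ("Chapter 1", 3, 9), ("Chapter 2", 10, 15)]`: the level-2 section stays inside Chapter 1.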

---

## 📊 Accuracy Expectations

| PDF type | Expected accuracy |
|----------|-------------------|
| Digital-first published books | ⭐⭐⭐⭐⭐ (~100%) |
| Technical manuals | ⭐⭐⭐⭐⭐ |
| Semi-digital PDFs | ⭐⭐⭐⭐ |
| Scanned PDFs (no bookmarks) | ❌ Not supported |

---

## 🏗️ Ideal Use Cases

- 📚 Published books (Springer, O’Reilly, Wiley, Packt…)
- ⚙️ Engineering manuals
- 🧾 Technical specifications
- 🏭 PLM (product lifecycle management) and documentation pipelines
- 📂 Large PDF libraries

---

## 🚫 Limitations

This tool **requires bookmarks**. If your PDF:

- Is scanned
- Has no outline
- Has broken TOC metadata

➡️ You will need **OCR or AI-based structure detection** (not included here).
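
Before splitting, a tool can tell users which of these cases applies. A minimal sketch, assuming a PyMuPDF-style outline (`[level, title, page]` entries with 1-based pages, as `Document.get_toc()` returns). `outline_problems` is a hypothetical helper, and its checks are illustrative heuristics for "broken TOC metadata", not the Space's actual rules.

```python
from typing import List


def outline_problems(toc: List[list], page_count: int) -> List[str]:
    """Return reasons an outline can't drive chapter splitting (an empty list means usable)."""
    if not toc:
        return ["no outline (bookmarks) found; the PDF may be scanned"]
    problems = []
    top = [entry for entry in toc if entry[0] == 1]
    if not top:
        problems.append("outline has no top-level entries")
    # Bookmarks whose target page lies outside the document (PyMuPDF uses -1 for "no page").
    bad = [entry for entry in toc if not 1 <= entry[2] <= page_count]
    if bad:
        problems.append(f"{len(bad)} bookmark(s) point outside pages 1-{page_count}")
    # Top-level chapters should start in reading order.
    starts = [entry[2] for entry in top if 1 <= entry[2] <= page_count]
    if any(later < earlier for earlier, later in zip(starts, starts[1:])):
        problems.append("top-level bookmarks are not in page order")
    return problems
```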

---

## 🛠️ Tech Stack

- **Python**
- **PyMuPDF (fitz)**
- Local execution (no cloud dependency)
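
With PyMuPDF, exporting a page range as its own PDF can be sketched with `fitz.open()` (which creates an empty PDF when called without arguments) and `Document.insert_pdf()`, whose `from_page`/`to_page` arguments are 0-based and inclusive. This is an illustrative sketch, not the Space's source: `export_chapters` is a hypothetical name, it expects `(title, first_page, last_page)` tuples with 1-based inclusive pages, and `make_name` is any callable mapping `(title, index)` to a filename.

```python
import os
from typing import Callable, Iterable, List, Tuple


def export_chapters(pdf_path: str,
                    ranges: Iterable[Tuple[str, int, int]],
                    out_dir: str,
                    make_name: Callable[[str, int], str]) -> List[str]:
    """Write each (title, first_page, last_page) range (1-based, inclusive) to its own PDF."""
    import fitz  # PyMuPDF; imported here so this module loads even where it isn't installed

    os.makedirs(out_dir, exist_ok=True)
    written = []
    with fitz.open(pdf_path) as src:
        for index, (title, first, last) in enumerate(ranges, start=1):
            out_path = os.path.join(out_dir, make_name(title, index))
            with fitz.open() as chapter:  # new, empty PDF
                # insert_pdf takes 0-based, inclusive page numbers.
                chapter.insert_pdf(src, from_page=first - 1, to_page=last - 1)
                chapter.save(out_path)
            written.append(out_path)
    return written
```

In a full pipeline, `src.get_toc()` and `src.page_count` on the opened document would supply the outline and page count used to build `ranges`.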