Prateek-044 committed on
Commit
fbd174e
·
verified ·
1 Parent(s): e9cd410

Update README.md

  1. README.md +38 -481
README.md CHANGED
@@ -1,481 +1,38 @@
1
- <<<<<<< HEAD
2
- # 📝 NoteSnap
3
-
4
- <div align="center">
5
-
6
- ![NoteSnap Logo](https://img.shields.io/badge/📝-NoteSnap-blue?style=for-the-badge)
7
-
8
- [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
9
- [![Python 3.8+](https://img.shields.io/badge/python-3.8+-blue.svg)](https://www.python.org/downloads/)
10
- [![Streamlit](https://img.shields.io/badge/Streamlit-FF4B4B?style=flat&logo=streamlit&logoColor=white)](https://streamlit.io/)
11
- [![Docker](https://img.shields.io/badge/Docker-2496ED?style=flat&logo=docker&logoColor=white)](https://www.docker.com/)
12
- [![Transformers](https://img.shields.io/badge/🤗%20Transformers-FFD21E?style=flat)](https://huggingface.co/transformers/)
13
-
14
- [![GitHub stars](https://img.shields.io/github/stars/PRATEEK-260/NoteSnap?style=social)](https://github.com/PRATEEK-260/NoteSnap/stargazers)
15
- [![GitHub forks](https://img.shields.io/github/forks/PRATEEK-260/NoteSnap?style=social)](https://github.com/PRATEEK-260/NoteSnap/network/members)
16
- [![GitHub issues](https://img.shields.io/github/issues/PRATEEK-260/NoteSnap)](https://github.com/PRATEEK-260/NoteSnap/issues)
17
-
18
- </div>
19
-
20
- A powerful web application that transforms lengthy documents and notes into concise, bullet-point summaries using state-of-the-art AI models.
21
-
22
- ---
23
-
24
- ## 📋 Table of Contents
25
-
26
- - [✨ Features](#-features)
27
- - [πŸš€ Quick Start](#-quick-start)
28
- - [Option 1: Docker (Recommended)](#option-1-docker-recommended)
29
- - [Option 2: Local Installation](#option-2-local-installation)
30
- - [πŸ“– Usage Guide](#-usage-guide)
31
- - [πŸ–ΌοΈ Screenshots](#️-screenshots)
32
- - [πŸ› οΈ Technical Details](#️-technical-details)
33
- - [🐳 Docker Deployment](#-docker-deployment)
34
- - [πŸ”§ Configuration](#-configuration)
35
- - [🚨 Troubleshooting](#-troubleshooting)
36
- - [🀝 Contributing](#-contributing)
37
- - [πŸ“„ License](#-license)
38
- - [πŸ™ Acknowledgments](#-acknowledgments)
39
- - [πŸ“ž Support](#-support)
40
-
41
- ---
42
-
43
- ## ✨ Features
44
-
45
- - **PDF Processing**: Upload PDF files and extract text content automatically
46
- - **Direct Text Input**: Paste text content directly for immediate summarization
47
- - **AI-Powered Summarization**: Uses Hugging Face Transformers (BART, T5) for high-quality summaries
48
- - **Bullet-Point Format**: Clean, readable bullet-point summaries
49
- - **Multiple AI Models**: Choose from different pre-trained models
50
- - **Customizable Length**: Adjust summary length (Short, Medium, Long)
51
- - **Progress Tracking**: Real-time progress indicators during processing
52
- - **Download Summaries**: Save generated summaries as text files
53
- - **Statistics**: View compression ratios and word counts
54
- - **Error Handling**: Comprehensive error handling and user feedback
55
-
56
- ## 🚀 Quick Start
57
-
58
- ### 🌐 Try Online (Fastest)
59
- **[🚀 Live Demo on Hugging Face Spaces](https://huggingface.co/spaces/PRATEEK-260/NoteSnap)**
60
- - No installation required
61
- - Instant access in your browser
62
- - Full functionality available
63
-
64
- ### Option 1: Docker (Recommended)
65
-
66
- #### Prerequisites
67
- - Docker and Docker Compose installed
68
- - Internet connection (for downloading AI models)
69
-
70
- #### Using Docker Compose (Easiest)
71
- ```bash
72
- # Clone the repository
73
- git clone https://github.com/PRATEEK-260/NoteSnap.git
74
- cd NoteSnap
75
-
76
- # Start the application
77
- docker-compose up -d
78
-
79
- # Access the application at http://localhost:8501
80
- ```
81
-
82
- #### Using Docker Scripts
83
- ```bash
84
- # Build the Docker image
85
- ./docker-build.sh
86
-
87
- # Run the container
88
- ./docker-run.sh
89
-
90
- # For development with live code reloading
91
- ./docker-dev.sh
92
- ```
93
-
94
- #### Manual Docker Commands
95
- ```bash
96
- # Build the image
97
- docker build -t notesnap .
98
-
99
- # Run the container
100
- docker run -p 8501:8501 notesnap
101
- ```
102
-
103
- ### Option 2: Local Installation
104
-
105
- #### Prerequisites
106
- - Python 3.8 or higher
107
- - pip (Python package installer)
108
- - Internet connection (for downloading AI models)
109
-
110
- #### Installation Steps
111
- 1. **Clone the repository**
112
- ```bash
113
- git clone https://github.com/PRATEEK-260/NoteSnap.git
114
- cd NoteSnap
115
- ```
116
-
117
- 2. **Install dependencies**
118
- ```bash
119
- pip install -r requirements.txt
120
- ```
121
-
122
- 3. **Run the application**
123
- ```bash
124
- streamlit run app.py
125
- ```
126
-
127
- 4. **Open your browser**
128
- - The application will automatically open at `http://localhost:8501`
129
- - If it doesn't open automatically, navigate to the URL manually
130
-
131
- ## 📖 Usage Guide
132
-
133
- ### PDF Summarization
134
-
135
- 1. **Upload PDF**: Click on the "📄 PDF Upload" tab
136
- 2. **Select File**: Choose a PDF file (max 10MB)
137
- 3. **Process**: Click "📖 Extract & Summarize PDF"
138
- 4. **Review**: View the extracted text preview
139
- 5. **Get Summary**: The AI will generate a bullet-point summary
140
- 6. **Download**: Save the summary using the download button
141
-
142
- ### Text Summarization
143
-
144
- 1. **Input Text**: Click on the "📝 Text Input" tab
145
- 2. **Paste Content**: Enter or paste your text (minimum 100 characters)
146
- 3. **Summarize**: Click "🚀 Summarize Text"
147
- 4. **Review**: View the generated summary
148
- 5. **Download**: Save the summary as needed
149
-
150
- ### Settings
151
-
152
- - **AI Model**: Choose from BART (recommended), T5, or DistilBART
153
- - **Summary Length**: Select Short, Medium, or Long summaries
154
- - **Statistics**: View word counts and compression ratios
155
-
156
- ## 🛠️ Technical Details
157
-
158
- ### Architecture
159
-
160
- ```
161
- NoteSnap/
162
- ├── app.py # Main Streamlit application
162
- ├── modules/
163
- │   ├── __init__.py
164
- │   ├── pdf_processor.py # PDF text extraction
165
- │   ├── text_summarizer.py # AI summarization
166
- │   └── utils.py # Utility functions
167
- ├── requirements.txt # Python dependencies
169
- └── README.md # This file
170
- ```
171
-
172
- ### AI Models
173
-
174
- - **BART (facebook/bart-large-cnn)**: Best quality, recommended for most use cases
175
- - **T5 Small**: Faster processing, good for shorter texts
176
- - **DistilBART**: Balanced performance and speed
177
-
178
- ### Dependencies
179
-
180
- - **Streamlit**: Web application framework
181
- - **Transformers**: Hugging Face AI models
182
- - **PyTorch**: Deep learning framework
183
- - **PyPDF2**: PDF text extraction
184
- - **Additional utilities**: See `requirements.txt`
185
-
186
- ## 🔧 Configuration
187
-
188
- ### Model Selection
189
-
190
- You can change the default model by modifying the `TextSummarizer` initialization in `app.py`:
191
-
192
- ```python
193
- text_summarizer = TextSummarizer(model_name="your-preferred-model")
194
- ```
195
-
196
- ### Summary Length
197
-
198
- Adjust default summary lengths in `modules/text_summarizer.py`:
199
-
200
- ```python
201
- self.min_summary_length = 50 # Minimum words
202
- self.max_summary_length = 300 # Maximum words
203
- ```
204
-
205
- ### File Size Limits
206
-
207
- Modify PDF file size limits in `modules/pdf_processor.py`:
208
-
209
- ```python
210
- self.max_file_size = 10 * 1024 * 1024 # 10MB
211
- ```
212
-
213
- ## 🚨 Troubleshooting
214
-
215
- ### Common Issues
216
-
217
- 1. **Model Loading Errors**
218
- - Ensure stable internet connection
219
- - Check available disk space (models can be 1-2GB)
220
- - Try switching to a smaller model (T5 Small or DistilBART)
221
-
222
- 2. **PDF Processing Issues**
223
- - Ensure PDF is not encrypted
224
- - Check if PDF contains readable text (not just images)
225
- - Try with a smaller PDF file
226
-
227
- 3. **Memory Errors**
228
- - Reduce text length
229
- - Close other applications
230
- - Try using CPU instead of GPU
231
-
232
- 4. **Slow Performance**
233
- - Use GPU if available
234
- - Choose smaller models for faster processing
235
- - Process shorter text chunks
236
-
237
- ### Error Messages
238
-
239
- - **"Text is too short"**: Minimum 100 characters required
240
- - **"No readable text found"**: PDF may contain only images
241
- - **"Model loading error"**: Check internet connection
242
- - **"Out of memory"**: Reduce text length or restart application
243
-
244
- ## 🎯 Best Practices
245
-
246
- ### For Best Results
247
-
248
- 1. **Text Quality**: Use well-formatted, coherent text
249
- 2. **Length**: Optimal text length is 500-5000 words
250
- 3. **Content**: Works best with structured content (articles, reports, notes)
251
- 4. **Model Choice**: Use BART for academic/formal content, T5 for general text
252
-
253
- ### Performance Tips
254
-
255
- 1. **GPU Usage**: Enable CUDA for faster processing
256
- 2. **Batch Processing**: Process multiple documents separately
257
- 3. **Model Caching**: Models are cached after first load
258
- 4. **Text Preprocessing**: Clean text improves summary quality
259
-
260
- ## 🖼️ Screenshots
261
-
262
- <div align="center">
263
-
264
- ### Main Interface
265
- ![Main Interface](Screenshots/Main%20interface.png)
266
- *Clean and intuitive interface with PDF upload and text input options*
267
-
268
- ### PDF Processing
269
- ![PDF Processing](Screenshots/pdf%20processing.png)
270
- *Real-time PDF processing with progress indicators*
271
-
272
- ### Summary Results
273
- ![Summary Results](Screenshots/Summery%20Result.png)
274
- *Bullet-point summaries with statistics and download options*
275
-
276
- ### Settings Panel
277
- ![Settings Panel](Screenshots/settings%20panel.png)
278
- *Customizable AI model selection and summary length options*
279
-
280
- </div>
281
-
282
- ## 🎥 Demo
283
-
284
- 🚀 **[Live Demo](https://huggingface.co/spaces/PRATEEK-260/NoteSnap)** - Try it now on Hugging Face Spaces!
285
-
286
- ## 📄 License
287
-
288
- This project is open source and available under the MIT License.
289
-
290
- ## 🤝 Contributing
291
-
292
- Contributions are welcome! Please feel free to submit issues, feature requests, or pull requests.
293
-
294
- ## 🐳 Docker Deployment
295
-
296
- ### Production Deployment
297
-
298
- For production deployment, use the standard Docker Compose configuration:
299
-
300
- ```bash
301
- # Start in production mode
302
- docker-compose up -d
303
-
304
- # View logs
305
- docker-compose logs -f
306
-
307
- # Stop the application
308
- docker-compose down
309
-
310
- # Update the application
311
- docker-compose pull
312
- docker-compose up -d
313
- ```
314
-
315
- ### Development Mode
316
-
317
- For development with live code reloading:
318
-
319
- ```bash
320
- # Start development environment
321
- docker-compose -f docker-compose.dev.yml up
322
-
323
- # Or use the convenience script
324
- ./docker-dev.sh
325
- ```
326
-
327
- ### Docker Configuration
328
-
329
- #### Environment Variables
330
- - `STREAMLIT_SERVER_PORT`: Port for the application (default: 8501)
331
- - `TRANSFORMERS_CACHE`: Cache directory for AI models
332
- - `MAX_FILE_SIZE_MB`: Maximum PDF file size (default: 10MB)
333
-
334
- #### Volumes
335
- - `model_cache`: Persistent storage for downloaded AI models
336
- - `logs`: Application logs
337
- - `uploads`: Temporary file storage (optional)
338
-
339
- #### Resource Limits
340
- - Memory: 4GB limit, 2GB reserved
341
- - CPU: 2 cores limit, 1 core reserved
342
-
343
- ### Docker Troubleshooting
344
-
345
- 1. **Container won't start**: Check logs with `docker-compose logs`
346
- 2. **Out of memory**: Increase Docker memory limits
347
- 3. **Model download fails**: Ensure internet connectivity
348
- 4. **Permission issues**: Check file ownership and Docker user settings
349
-
350
- ## 🤝 Contributing
351
-
352
- We welcome contributions from the community! Here's how you can help:
353
-
354
- ### 🌟 Ways to Contribute
355
-
356
- - ⭐ **Star this repository** if you find it useful
357
- - 🐛 **Report bugs** by opening an [issue](https://github.com/PRATEEK-260/NoteSnap/issues)
357
- - 💡 **Suggest features** or improvements
358
- - 📖 **Improve documentation**
359
- - 🔧 **Submit pull requests** with bug fixes or new features
361
-
362
- ### 🚀 Getting Started
363
-
364
- 1. **Fork the repository**
365
- ```bash
366
- # Click the "Fork" button on GitHub, then:
367
- git clone https://github.com/YOUR-USERNAME/NoteSnap.git
368
- cd NoteSnap
369
- ```
370
-
371
- 2. **Create a feature branch**
372
- ```bash
373
- git checkout -b feature/amazing-feature
374
- ```
375
-
376
- 3. **Make your changes**
377
- - Follow the existing code style
378
- - Add tests for new features
379
- - Update documentation as needed
380
-
381
- 4. **Test your changes**
382
- ```bash
383
- # Run basic tests
384
- python test_basic.py
385
-
386
- # Test Docker build
387
- ./docker-test.sh
388
- ```
389
-
390
- 5. **Submit a pull request**
391
- ```bash
392
- git add .
393
- git commit -m "Add amazing feature"
394
- git push origin feature/amazing-feature
395
- ```
396
-
397
- ### 📋 Development Guidelines
398
-
399
- - **Code Style**: Follow PEP 8 for Python code
400
- - **Documentation**: Update README.md for new features
401
- - **Testing**: Add tests for new functionality
402
- - **Docker**: Ensure Docker compatibility
403
- - **Dependencies**: Keep requirements.txt updated
404
-
405
- ### 🐛 Reporting Issues
406
-
407
- When reporting issues, please include:
408
-
409
- - **Environment details** (OS, Python version, Docker version)
410
- - **Steps to reproduce** the issue
411
- - **Expected vs actual behavior**
412
- - **Error messages** or logs
413
- - **Screenshots** if applicable
414
-
415
- [**Report an Issue →**](https://github.com/PRATEEK-260/NoteSnap/issues/new)
416
-
417
- ### 💬 Discussions
418
-
419
- Join our community discussions:
420
-
421
- - [**GitHub Discussions**](https://github.com/PRATEEK-260/NoteSnap/discussions) - General questions and ideas
422
- - [**Issues**](https://github.com/PRATEEK-260/NoteSnap/issues) - Bug reports and feature requests
423
-
424
- ## 📄 License
425
-
426
- This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
427
-
428
- ## 🙏 Acknowledgments
429
-
430
- ### 🛠️ Built With
431
-
432
- - [**Streamlit**](https://streamlit.io/) - Web application framework
433
- - [**Hugging Face Transformers**](https://huggingface.co/transformers/) - AI/ML models
434
- - [**PyTorch**](https://pytorch.org/) - Deep learning framework
435
- - [**PyPDF2**](https://pypdf2.readthedocs.io/) - PDF processing
436
- - [**Docker**](https://www.docker.com/) - Containerization
437
-
438
- ### 🎯 Inspiration
439
-
440
- - Inspired by the need for efficient document summarization
441
- - Built to help students, researchers, and professionals save time
442
- - Leverages state-of-the-art AI models for high-quality summaries
443
-
444
- ### 🤖 AI Models
445
-
446
- Special thanks to the teams behind these amazing models:
447
- - [**BART**](https://huggingface.co/facebook/bart-large-cnn) by Facebook AI
448
- - [**T5**](https://huggingface.co/t5-small) by Google Research
449
- - [**DistilBART**](https://huggingface.co/sshleifer/distilbart-cnn-12-6) by Sam Shleifer
450
-
451
- ## 📞 Support
452
-
453
- If you encounter any issues or have questions:
454
-
455
- ### 🔍 Self-Help Resources
456
-
457
- 1. 📖 Check the [troubleshooting section](#-troubleshooting) above
457
- 2. 🐛 Review error messages for specific guidance
458
- 3. 📦 Ensure all dependencies are properly installed
459
- 4. 🔄 Try with different models or settings
461
- 5. 🐳 For Docker issues, check container logs: `docker-compose logs`
462
-
463
- ### 💬 Get Help
464
-
465
- - 🐛 **Bug Reports**: [Open an Issue](https://github.com/PRATEEK-260/NoteSnap/issues/new)
465
- - 💡 **Feature Requests**: [Start a Discussion](https://github.com/PRATEEK-260/NoteSnap/discussions)
467
-
468
- ---
469
-
470
- <div align="center">
471
-
472
- **Made with ❤️ by [PRATEEK-260](https://github.com/PRATEEK-260)**
473
-
474
- **Happy Summarizing! 📝✨**
475
-
476
- [![GitHub](https://img.shields.io/badge/GitHub-PRATEEK--260-181717?style=flat&logo=github)](https://github.com/PRATEEK-260)
477
-
478
- </div>
479
- =======
480
- # NoteSnap
481
- >>>>>>> 9b4f2dab9437daaefabf059cd647a5761c93c197
 
1
+ ---
2
+ title: NoteSnap
3
+ emoji: 📝
4
+ colorFrom: blue
5
+ colorTo: indigo
6
+ sdk: streamlit
7
+ sdk_version: "1.28.0"
8
+ app_file: app.py
9
+ pinned: false
10
+ ---
11
+
12
+ # 📝 NoteSnap
13
+
14
+ A powerful web application that transforms lengthy documents and notes into concise, bullet-point summaries using state-of-the-art AI models.
15
+
16
+ ## Features
17
+
18
+ - **PDF Processing**: Upload PDF files and extract text content automatically
19
+ - **Direct Text Input**: Paste text content directly for immediate summarization
20
+ - **AI-Powered Summarization**: Uses Hugging Face Transformers (BART, T5) for high-quality summaries
21
+ - **Bullet-Point Format**: Clean, readable bullet-point summaries
22
+ - **Multiple AI Models**: Choose from different pre-trained models
23
+ - **Customizable Length**: Adjust summary length (Short, Medium, Long)
24
+ - **Download Summaries**: Save generated summaries as text files
25
+
26
+ ## Usage
27
+
28
+ 1. Upload a PDF or paste text
29
+ 2. Choose your AI model and summary length
30
+ 3. Click Summarize
31
+ 4. Download your summary
32
+
33
+ ## Tech Stack
34
+
35
+ - Streamlit
36
+ - Hugging Face Transformers
37
+ - PyTorch
38
+ - PyPDF2
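
The "Bullet-Point Format" feature listed in the new README could be approximated by splitting a model's paragraph-style summary into one bullet per sentence. The snippet below is only a minimal sketch of that idea: `to_bullets` is a hypothetical helper, not a function taken from the NoteSnap source, and the regex-based sentence split is an assumption that will mis-split abbreviations such as "e.g.".

```python
import re


def to_bullets(summary: str, marker: str = "-") -> str:
    """Format a paragraph-style summary as one bullet per sentence.

    Hypothetical helper for illustration; NoteSnap's own formatting
    logic may differ.
    """
    # Split on whitespace that follows sentence-ending punctuation.
    sentences = re.split(r"(?<=[.!?])\s+", summary.strip())
    # Skip empty fragments and prefix each sentence with the marker.
    return "\n".join(f"{marker} {s.strip()}" for s in sentences if s.strip())


if __name__ == "__main__":
    print(to_bullets("NoteSnap extracts text from PDFs. It then summarizes the text."))
```

A plain regex keeps the sketch dependency-free; a production version would more likely use a proper sentence tokenizer.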