---
license: wtfpl
datasets:
- arkodeep/spam-data
language:
- en
tags:
- spam
- spam classification
- text
- spam detection
- text classification
---

# Spam Detection System

## Lite Model

### Introduction
The Lite model is a streamlined variant with optimized parameters and enhanced feature extraction, designed for quick and efficient spam detection.

### Features
- **Text Preprocessing**: Lemmatization, removal of stop words and punctuation.
- **Feature Extraction**: Text length, word count, unique word count, uppercase count, special character count.
- **Model Creation**: Ensemble model using SVC, MultinomialNB, and ExtraTreesClassifier.
- **Visualization**: Generates graphs for dataset insights, word clouds, and performance metrics.
- **Metrics Saving**: Accuracy, precision, and F1 score.

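
The five hand-crafted features listed above could be computed roughly as follows. This is a minimal sketch, not the code in `training/train_model_lite.py`: the function name `extract_features`, whitespace tokenization, case-insensitive unique-word counting, counting uppercase *characters* (rather than uppercase words), and treating `string.punctuation` as the set of special characters are all assumptions made for illustration.

```python
import string


def extract_features(text: str) -> dict:
    """Compute five surface features for a single message (illustrative only)."""
    # Assumption: words are whitespace-separated tokens.
    words = text.split()
    return {
        "text_length": len(text),
        "word_count": len(words),
        # Assumption: uniqueness is case-insensitive.
        "unique_word_count": len({w.lower() for w in words}),
        # Assumption: counts uppercase characters, not fully uppercase words.
        "uppercase_count": sum(1 for ch in text if ch.isupper()),
        # Assumption: "special characters" means ASCII punctuation.
        "special_char_count": sum(1 for ch in text if ch in string.punctuation),
    }


features = extract_features("WIN a FREE prize!!! Call now, win big.")
print(features)
```

Features such as uppercase and special character counts are only meaningful on the raw message, so a sketch like this would presumably run before lowercasing or punctuation removal.
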
### How to Run
1. **Train the Model**:
   ```bash
   python training/train_model_lite.py
   ```
2. **Use the Model**:
   ```python
   import joblib

   # Load the trained classifier and the fitted vectorizer saved by the training script
   model = joblib.load('models/model.pkl')
   vectorizer = joblib.load('models/vectorizer.pkl')
   ```

## Legacy Model

### Introduction
The Legacy model retains the original, unoptimized model logic, with an updated code structure and added visualizations.

### Features
- **Text Preprocessing**: Porter Stemming, removal of stop words and punctuation.
- **Model Creation**: Ensemble model using SVC, MultinomialNB, and ExtraTreesClassifier with original parameters.
- **Visualization**: Generates graphs for dataset insights, word clouds, and performance metrics.
- **Metrics Saving**: Accuracy and precision.

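
One way to combine the three classifiers named above is scikit-learn's `VotingClassifier`. The sketch below is an assumption about how such an ensemble might be wired, not the project's training code: hard (majority) voting, `TfidfVectorizer` as the text vectorizer, the hyperparameters shown, and the toy messages are all placeholders, and the "original parameters" mentioned above are not reproduced here.

```python
from sklearn.ensemble import ExtraTreesClassifier, VotingClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.svm import SVC


def build_ensemble() -> VotingClassifier:
    """Combine SVC, MultinomialNB, and ExtraTreesClassifier by majority vote."""
    return VotingClassifier(
        estimators=[
            ("svc", SVC()),  # placeholder: default parameters
            ("nb", MultinomialNB()),
            ("et", ExtraTreesClassifier(n_estimators=50, random_state=0)),
        ],
        voting="hard",  # assumption: each model casts one vote per message
    )


# Toy data for illustration only (1 = spam, 0 = ham); not taken from spam.csv.
texts = [
    "win a free prize now",
    "free cash, claim today",
    "claim your free reward",
    "meeting at noon tomorrow",
    "see you at lunch",
    "are we still on for dinner",
]
labels = [1, 1, 1, 0, 0, 0]

vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(texts)  # non-negative features, as MultinomialNB requires
ensemble = build_ensemble().fit(X, labels)

predictions = ensemble.predict(vectorizer.transform(["free prize waiting"]))
print(predictions)
```

Hard voting is used here because `SVC` does not produce class probabilities unless constructed with `probability=True`; soft voting would need that option enabled.
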
### How to Run
1. **Train the Model**:
   ```bash
   python training/train_model_legacy.py
   ```
2. **Use the Model**:
   ```python
   import joblib

   # Load the trained classifier and the fitted vectorizer saved by the training script
   model = joblib.load('models/model.pkl')
   vectorizer = joblib.load('models/vectorizer.pkl')
   ```

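
Once the two artifacts are loaded, classifying new messages might look like the sketch below. It assumes the loaded objects follow the usual scikit-learn interface (`vectorizer.transform` and `model.predict`); the helper name `classify_messages` is hypothetical. The meaning of the returned labels depends on how the training script encoded them, and incoming text would likely need the same preprocessing (stemming, stop word and punctuation removal) that was applied during training.

```python
from typing import Any, List, Sequence


def classify_messages(model: Any, vectorizer: Any, messages: Sequence[str]) -> List[Any]:
    """Vectorize messages and return the model's predicted labels as a list."""
    # Assumption: scikit-learn style objects with transform() and predict().
    features = vectorizer.transform(list(messages))
    return list(model.predict(features))


# Usage with the artifacts from step 2 (requires the trained files on disk):
#   import joblib
#   model = joblib.load('models/model.pkl')
#   vectorizer = joblib.load('models/vectorizer.pkl')
#   print(classify_messages(model, vectorizer, ["Free entry! Text WIN now"]))
```
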
## Additional Information
- **Dependencies**: Python 3.6 or higher, pip, and the packages listed in `requirements.txt`.
- **Dataset**: The dataset used for training is `spam.csv`.
- **Contact and Support**: For questions or support, please contact the project maintainers.

For more details, see the [README.md](https://github.com/arkodeepsen/spam-filter-mbo/blob/4894a939099e5523f22bf3c2e5b3d763c92a73c6/README.md) and [models.md](https://github.com/arkodeepsen/spam-filter-mbo/blob/4894a939099e5523f22bf3c2e5b3d763c92a73c6/models.md).