	---
	license: wtfpl
	datasets:
	- arkodeep/spam-data
	language:
	- en
	tags:
	- spam
	- spam classification
	- text
	- spam detection
	- text classification
	---

	# Spam Detection System

	## Lite Model

	### Introduction
	The Lite model is a streamlined pipeline with optimized parameters and additional hand-crafted features, built for fast and efficient spam detection.

	### Features
	- Text Preprocessing: Lemmatization, removal of stop words and punctuation.
	- Feature Extraction: Text length, word count, unique word count, uppercase count, special character count.
	- Model Creation: Ensemble model using SVC, MultinomialNB, and ExtraTreesClassifier.
	- Visualization: Generates graphs for dataset insights, word clouds, and performance metrics.
	- Metrics Saving: Accuracy, precision, and F1 score.
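
The hand-crafted features listed under Feature Extraction can be sketched in plain Python. This is a minimal illustration, not the project's actual code: the function name `extract_features`, the dictionary keys, and the choice to count uppercase *characters* (rather than all-caps words) and to treat `string.punctuation` as the set of special characters are all assumptions.

```python
import string


def extract_features(text: str) -> dict:
    """Compute simple surface statistics for one message (hypothetical helper)."""
    words = text.split()  # whitespace tokenization; punctuation stays attached
    return {
        "text_length": len(text),
        "word_count": len(words),
        # Case-insensitive count of distinct tokens.
        "unique_word_count": len({w.lower() for w in words}),
        # Assumption: count uppercase characters, not all-caps words.
        "uppercase_count": sum(1 for ch in text if ch.isupper()),
        # Assumption: "special characters" means ASCII punctuation.
        "special_char_count": sum(1 for ch in text if ch in string.punctuation),
    }


features = extract_features("WIN a FREE prize!!! Call now, win big.")
print(features)
```

Features like these are typically stacked next to the vectorized text before training, but the README does not say how the Lite model combines them.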

	### How to Run
	1. Train the Model:
	```bash
	python training/train_model_lite.py
	```
	2. Use the Model:
	```python
	import joblib

	# Load the trained ensemble and its fitted vectorizer from disk.
	model = joblib.load('models/model.pkl')
	vectorizer = joblib.load('models/vectorizer.pkl')
	```
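
Once the two artifacts are loaded, classifying new messages might look like the sketch below. It assumes the vectorizer exposes `transform` and the model exposes `predict`, as scikit-learn objects do. The helper name `predict_messages` is hypothetical, and the README does not state the label encoding or whether inputs must first go through the same preprocessing (and, for the Lite model, the extra features) used in training.

```python
from typing import Any, List, Sequence


def predict_messages(texts: Sequence[str], model: Any, vectorizer: Any) -> List[Any]:
    """Vectorize raw messages and return the model's predicted labels.

    Hypothetical helper: assumes `vectorizer.transform` and `model.predict`
    follow the scikit-learn conventions.
    """
    matrix = vectorizer.transform(list(texts))
    return list(model.predict(matrix))


# With the objects loaded via joblib above, a call would look like:
#   labels = predict_messages(["Free entry! Text WIN to claim"], model, vectorizer)
```

If training applied text cleaning before vectorization, the same cleaning should be applied to `texts` before calling the helper.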

	## Legacy Model

	### Introduction
	The Legacy model keeps the original model logic and parameters, without optimization, but adopts the updated project structure and adds visualizations.

	### Features
	- Text Preprocessing: Porter Stemming, removal of stop words and punctuation.
	- Model Creation: Ensemble model using SVC, MultinomialNB, and ExtraTreesClassifier with original parameters.
	- Visualization: Generates graphs for dataset insights, word clouds, and performance metrics.
	- Metrics Saving: Accuracy and precision.
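
One way to combine the three classifiers named above is scikit-learn's `VotingClassifier`. The README does not say how the ensemble is built or which vectorizer is used, so hard voting, `TfidfVectorizer`, the hyperparameters, and the toy messages below are all assumptions made for illustration.

```python
from sklearn.ensemble import ExtraTreesClassifier, VotingClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score, precision_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.svm import SVC

# Tiny made-up corpus for illustration only (1 = spam, 0 = ham).
texts = [
    "win a free prize now",
    "free cash call now",
    "claim your free reward today",
    "urgent winner call this number",
    "are we still meeting for lunch",
    "see you at home tonight",
    "can you send me the report",
    "thanks for the help yesterday",
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

# Assumption: TF-IDF features; MultinomialNB needs non-negative inputs.
vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(texts)

# Assumption: majority ("hard") voting over the three classifiers.
ensemble = VotingClassifier(
    estimators=[
        ("svc", SVC(kernel="linear")),
        ("nb", MultinomialNB()),
        ("et", ExtraTreesClassifier(n_estimators=50, random_state=0)),
    ],
    voting="hard",
)
ensemble.fit(X, labels)

# Scored on the training data only to show the metric calls;
# a held-out split is needed for a meaningful evaluation.
predictions = ensemble.predict(X)
accuracy = accuracy_score(labels, predictions)
precision = precision_score(labels, predictions)
print(f"accuracy={accuracy:.2f} precision={precision:.2f}")
```

Hard voting is used here because `SVC` does not produce probabilities unless `probability=True` is set; soft voting would require that option.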

	### How to Run
	1. Train the Model:
	```bash
	python training/train_model_legacy.py
	```
	2. Use the Model:
	```python
	import joblib

	# Load the trained ensemble and its fitted vectorizer from disk.
	model = joblib.load('models/model.pkl')
	vectorizer = joblib.load('models/vectorizer.pkl')
	```

	## Additional Information
	- Dependencies: Python 3.6 or higher, pip, and the packages listed in `requirements.txt`.
	- Dataset: The dataset used for training is `spam.csv`.
	- Contact and Support: For questions or support, please contact the project maintainers.

	For more details, see the [README.md](https://github.com/arkodeepsen/spam-filter-mbo/blob/4894a939099e5523f22bf3c2e5b3d763c92a73c6/README.md) and [models.md](https://github.com/arkodeepsen/spam-filter-mbo/blob/4894a939099e5523f22bf3c2e5b3d763c92a73c6/models.md).