---
license: cc-by-4.0
datasets:
- jennifee/HW1-tabular-dataset
language:
- en
metrics:
- accuracy
base_model:
- autogluon/tabpfn-mix-1.0-classifier
pipeline_tag: tabular-classification
tags:
- automl
- classification
- books
- tabular
- autogluon
---

# Model Card for AutoML Books Classification

This model card documents the **AutoML Books Classification** model, trained with **AutoGluon AutoML** on a classmate's dataset of fiction and nonfiction books. The task is to predict whether a book is **recommended to everyone** based on tabular features.

---

## Model Details

- **Developed by:** Bareethul Kader
- **Framework:** AutoGluon
- **Repository:** [bareethul/AutoML-books-classification](https://huggingface.co/bareethul/AutoML-books-classification)
- **License:** CC BY 4.0

---

## Intended Use

### Direct Use
- Educational demonstration of AutoML on a small tabular dataset.
- Comparison of multiple classical ML models through automated search.
- Understanding the tradeoff between validation and test performance.

### Out-of-Scope Use
- Not intended for production systems or real book recommendation engines.
- The dataset is too small for the model to generalize beyond the classroom context.

---

## Dataset

- **Source:** [jennifee/HW1-tabular-dataset](https://huggingface.co/datasets/jennifee/HW1-tabular-dataset)
- **Task:** Binary classification (`RecommendToEveryone` = 0/1).
- **Size:** 30 original samples plus ~300 augmented rows.
- **Features:**
  - `Pages` (integer)
  - `Thickness` (float)
  - `ReadStatus` (categorical: read/started/not read)
  - `Genre` (categorical: fiction/nonfiction)
  - `RecommendToEveryone` (binary target)

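For reference, a minimal pandas sketch of this schema. The rows are made-up examples for illustration, not actual entries from the dataset:

```python
import pandas as pd

# Hypothetical rows matching the dataset schema; values are illustrative only.
df = pd.DataFrame(
    {
        "Pages": [320, 180, 540],                       # integer
        "Thickness": [2.5, 1.1, 4.0],                   # float
        "ReadStatus": ["read", "started", "not read"],  # categorical
        "Genre": ["fiction", "nonfiction", "fiction"],  # categorical
        "RecommendToEveryone": [1, 0, 1],               # binary target
    }
)

# AutoGluon infers feature types automatically, but declaring the
# categorical columns explicitly makes the schema unambiguous:
for col in ["ReadStatus", "Genre"]:
    df[col] = df[col].astype("category")

print(df.dtypes)
```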
---

## Training Setup

- **AutoML framework:** AutoGluon `TabularPredictor`
- **Evaluation metric:** Accuracy
- **Budget:** 300 seconds of training time, small-scale search
- **Hardware:** Google Colab (CPU was sufficient; a T4 GPU is not required)
- **Search space:**
  - Tree-based models: LightGBM, XGBoost, ExtraTrees, RandomForest
  - Neural nets: PyTorch and FastAI (small MLPs)
  - Bagging and ensembling across stack levels (L1, L2, L3)

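The setup above corresponds roughly to the following AutoGluon call. This is a hedged sketch: the exact `presets`, `num_bag_folds`, and `num_stack_levels` values used in the original run are not recorded in this card, so the ones below are illustrative.

```python
import pandas as pd

LABEL = "RecommendToEveryone"  # target column from the dataset above
TIME_LIMIT = 300               # seconds, matching the stated budget
EVAL_METRIC = "accuracy"       # metric reported in this card


def train(train_df: pd.DataFrame):
    """Fit an AutoGluon TabularPredictor on the books dataset.

    Requires `pip install autogluon.tabular`. Bagging/stacking arguments
    are illustrative guesses that produce L1/L2-style stacked ensembles.
    """
    from autogluon.tabular import TabularPredictor

    predictor = TabularPredictor(label=LABEL, eval_metric=EVAL_METRIC)
    predictor.fit(
        train_df,
        time_limit=TIME_LIMIT,
        num_bag_folds=5,      # enables bagged (BAG) model variants
        num_stack_levels=2,   # allows L2/L3 stacker models
    )
    return predictor
```

After fitting, `predictor.leaderboard(test_df)` produces the kind of model ranking shown in the Results section below.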
---

## Results

### Mini Leaderboard (Top 3 Models)

| Rank | Model                   | Test Accuracy | Validation Accuracy |
|------|-------------------------|---------------|---------------------|
| 1    | RandomForestEntr_BAG_L1 | **0.55**      | 0.65                |
| 2    | LightGBM_r96_BAG_L2     | 0.53          | 0.72                |
| 3    | LightGBMLarge_BAG_L2    | 0.53          | 0.74                |

- **Best model (AutoGluon selected):** `RandomForestEntr_BAG_L1`
- **Test accuracy:** ~0.55
- **Validation accuracy (best across runs):** up to 0.75 (LightGBM variants)

Note: the **"best model"** may vary depending on random splits and seeds. While AutoGluon reported `RandomForestEntr_BAG_L1` as best in this run, LightGBM models sometimes achieved higher validation accuracy but generalized less well.

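The gap between validation and test accuracy is the interesting part of the table: the LightGBM stacks look stronger in validation but drop more at test time. A quick sketch quantifying that gap from the numbers above:

```python
import pandas as pd

# Mini leaderboard, copied from the table above.
lb = pd.DataFrame(
    {
        "model": [
            "RandomForestEntr_BAG_L1",
            "LightGBM_r96_BAG_L2",
            "LightGBMLarge_BAG_L2",
        ],
        "test_acc": [0.55, 0.53, 0.53],
        "val_acc": [0.65, 0.72, 0.74],
    }
)

# Positive gap = validation optimism: the model looked better than it generalized.
lb["val_test_gap"] = lb["val_acc"] - lb["test_acc"]

# The selected RandomForest has the smallest gap despite the lowest validation score.
print(lb.sort_values("val_test_gap"))
```

On such a small, augmented dataset, this gap is exactly the overfitting signal the Limitations section warns about.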
---

## Limitations, Biases, and Ethical Notes

- **Small dataset size** → models may overfit, and reported metrics are unstable.
- **Augmented data** → synthetic rows may not reflect true variability.
- **Task scope** → purely educational; not for real-world recommendation.

---

## AI Usage Disclosure

- ChatGPT (GPT-5) assisted in:
  - Helping with coding and the AutoGluon AutoML approach on the go
  - Polishing the Colab notebook for clarity
  - Refining this model card

---

## Citation

**BibTeX:**
```bibtex
@misc{bareethul_books_classification,
  author       = {Kader, Bareethul},
  title        = {AutoML Books Classification},
  year         = {2025},
  howpublished = {\url{https://huggingface.co/bareethul/AutoML-books-classification}},
  note         = {AutoGluon AutoML model}
}
```