---
license: cc-by-4.0
datasets:
- jennifee/HW1-tabular-dataset
language:
- en
metrics:
- accuracy
base_model:
- autogluon/tabpfn-mix-1.0-classifier
pipeline_tag: tabular-classification
tags:
- automl
- classification
- books
- tabular
- autogluon
---

# Model Card for AutoML Books Classification

This model card documents the **AutoML Books Classification** model, trained with **AutoGluon AutoML** on a classmate's dataset of fiction and nonfiction books. The task is to predict whether a book is **recommended to everyone** based on tabular features.

---

## Model Details

- **Developed by:** Bareethul Kader
- **Framework:** AutoGluon
- **Repository:** [bareethul/AutoML-books-classification](https://huggingface.co/bareethul/AutoML-books-classification)
- **License:** CC BY 4.0

---

## Intended Use

### Direct Use
- Educational demonstration of AutoML on a small tabular dataset.
- Comparison of multiple classical ML models through automated search.
- Understanding the tradeoff between validation and test performance.

### Out-of-Scope Use
- Not intended for production systems or real book recommendation engines.
- The dataset is too small for the model to generalize beyond the classroom context.

---

## Dataset

- **Source:** [jennifee/HW1-tabular-dataset](https://huggingface.co/datasets/jennifee/HW1-tabular-dataset)
- **Task:** Binary classification (`RecommendToEveryone` = 0/1).
- **Size:** 30 original samples plus ~300 augmented rows.
- **Features:**
  - `Pages` (integer)
  - `Thickness` (float)
  - `ReadStatus` (categorical: read/started/not read)
  - `Genre` (categorical: fiction/nonfiction)
  - `RecommendToEveryone` (binary target)

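For reference, a minimal pandas sketch of this schema. The rows are made-up examples for illustration, not actual entries from the dataset:

```python
import pandas as pd

# Hypothetical rows matching the dataset schema; values are illustrative only.
df = pd.DataFrame(
    {
        "Pages": [320, 180, 540],                       # integer
        "Thickness": [2.5, 1.1, 4.0],                   # float
        "ReadStatus": ["read", "started", "not read"],  # categorical
        "Genre": ["fiction", "nonfiction", "fiction"],  # categorical
        "RecommendToEveryone": [1, 0, 1],               # binary target
    }
)

# AutoGluon infers feature types automatically, but declaring the
# categorical columns explicitly makes the schema unambiguous:
for col in ["ReadStatus", "Genre"]:
    df[col] = df[col].astype("category")

print(df.dtypes)
```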
---

## Training Setup

- **AutoML framework:** AutoGluon `TabularPredictor`
- **Evaluation metric:** Accuracy
- **Budget:** 300 seconds of training time, small-scale search
- **Hardware:** Google Colab (CPU was sufficient; a T4 GPU is not required)
- **Search space:**
  - Tree-based models: LightGBM, XGBoost, ExtraTrees, RandomForest
  - Neural nets: PyTorch and FastAI (small MLPs)
  - Bagging and ensembling across stack levels (L1, L2, L3)

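The setup above corresponds roughly to the following AutoGluon call. This is a hedged sketch: the exact `presets`, `num_bag_folds`, and `num_stack_levels` values used in the original run are not recorded in this card, so the ones below are illustrative.

```python
import pandas as pd

LABEL = "RecommendToEveryone"  # target column from the dataset above
TIME_LIMIT = 300               # seconds, matching the stated budget
EVAL_METRIC = "accuracy"       # metric reported in this card


def train(train_df: pd.DataFrame):
    """Fit an AutoGluon TabularPredictor on the books dataset.

    Requires `pip install autogluon.tabular`. Bagging/stacking arguments
    are illustrative guesses that produce L1/L2-style stacked ensembles.
    """
    from autogluon.tabular import TabularPredictor

    predictor = TabularPredictor(label=LABEL, eval_metric=EVAL_METRIC)
    predictor.fit(
        train_df,
        time_limit=TIME_LIMIT,
        num_bag_folds=5,      # enables bagged (BAG) model variants
        num_stack_levels=2,   # allows L2/L3 stacker models
    )
    return predictor
```

After fitting, `predictor.leaderboard(test_df)` produces the kind of model ranking shown in the Results section below.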
---

## Results

### Mini Leaderboard (Top 3 Models)

| Rank | Model                   | Test Accuracy | Validation Accuracy |
|------|-------------------------|---------------|---------------------|
| 1    | RandomForestEntr_BAG_L1 | **0.55**      | 0.65                |
| 2    | LightGBM_r96_BAG_L2     | 0.53          | 0.72                |
| 3    | LightGBMLarge_BAG_L2    | 0.53          | 0.74                |

- **Best model (AutoGluon selected):** `RandomForestEntr_BAG_L1`
- **Test accuracy:** ~0.55
- **Validation accuracy (best across runs):** up to 0.75 (LightGBM variants)

Note: the **"best model"** may vary depending on random splits and seeds. While AutoGluon reported `RandomForestEntr_BAG_L1` as best in this run, LightGBM models sometimes achieved higher validation accuracy but generalized less well.

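The gap between validation and test accuracy is the interesting part of the table: the LightGBM stacks look stronger in validation but drop more at test time. A quick sketch quantifying that gap from the numbers above:

```python
import pandas as pd

# Mini leaderboard, copied from the table above.
lb = pd.DataFrame(
    {
        "model": [
            "RandomForestEntr_BAG_L1",
            "LightGBM_r96_BAG_L2",
            "LightGBMLarge_BAG_L2",
        ],
        "test_acc": [0.55, 0.53, 0.53],
        "val_acc": [0.65, 0.72, 0.74],
    }
)

# Positive gap = validation optimism: the model looked better than it generalized.
lb["val_test_gap"] = lb["val_acc"] - lb["test_acc"]

# The selected RandomForest has the smallest gap despite the lowest validation score.
print(lb.sort_values("val_test_gap"))
```

On such a small, augmented dataset, this gap is exactly the overfitting signal the Limitations section warns about.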
---

## Limitations, Biases, and Ethical Notes

- **Small dataset size** → models may overfit, and reported metrics are unstable.
- **Augmented data** → synthetic rows may not reflect true variability.
- **Task scope** → purely educational; not for real-world recommendation.

---

## AI Usage Disclosure

- ChatGPT (GPT-5) assisted in:
  - Helping with coding and the AutoGluon AutoML approach on the go
  - Polishing the Colab notebook for clarity
  - Refining this model card

---

## Citation

**BibTeX:**
```bibtex
@misc{bareethul_books_classification,
  author       = {Kader, Bareethul},
  title        = {AutoML Books Classification},
  year         = {2025},
  howpublished = {\url{https://huggingface.co/bareethul/AutoML-books-classification}},
  note         = {AutoGluon AutoML model}
}
```