| --- |
| library_name: transformers |
| tags: [] |
| --- |
| # Model Card for `zay25/MNLP_M3_quantized_model` |
| |
| This model is a 4-bit quantized version of a multiple-choice question answering (MCQA) model fine-tuned on STEM datasets. It uses Activation-aware Weight Quantization (AWQ) to reduce model size and VRAM usage while aiming to preserve the base model's accuracy, which makes it suited to memory- and latency-constrained environments. |
| |
| --- |
| |
| ## Model Details |
| |
| - **Developed by**: Zeineb Mellouli (EPFL, CS-552 Project) |
| - **Base model**: `hssawhney/Best-Performing-Model` (Qwen3-0.6B-Base) |
| - **Quantization**: AWQ (4-bit weights, 16-bit activations) |
| - **Architecture**: Transformer-based Causal Language Model |
| - **Language**: English |
| - **License**: Apache 2.0 |
| |
| --- |
| |
| ## Uses |
| |
| ### Direct Use |
| |
| This model is intended for multiple-choice question answering (MCQA) tasks, particularly in science, math, and engineering education datasets. It is optimized for inference on GPUs with limited VRAM (e.g., A10, T4, or laptop GPUs). |
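As an illustration of the intended input shape, the sketch below formats a question and its options into a single prompt and maps per-option scores back to an answer letter. The prompt template and both helper names (`format_mcqa_prompt`, `pick_answer`) are hypothetical: the template used during fine-tuning is not documented in this card, so it may need adjusting.

```python
from string import ascii_uppercase


def format_mcqa_prompt(question: str, options: list[str]) -> str:
    """Build a lettered multiple-choice prompt (hypothetical template)."""
    lines = [f"Question: {question.strip()}"]
    for letter, option in zip(ascii_uppercase, options):
        lines.append(f"{letter}. {option.strip()}")
    lines.append("Answer:")
    return "\n".join(lines)


def pick_answer(option_scores: list[float]) -> str:
    """Return the letter of the highest-scoring option."""
    best = max(range(len(option_scores)), key=option_scores.__getitem__)
    return ascii_uppercase[best]


prompt = format_mcqa_prompt(
    "Which quantity is conserved in an elastic collision?",
    ["Only momentum", "Only kinetic energy", "Momentum and kinetic energy", "Neither"],
)
print(prompt)
# The scores below are made up for illustration only.
print(pick_answer([-2.1, -1.7, -0.3, -4.0]))
```

In practice, `option_scores` could be the model's next-token logits for each option letter, taken at the position after `Answer:`.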
| |
| ### Out-of-Scope Use |
| |
| - Not intended for open-ended generation or dialogue |
| - Not suitable for high-stakes decision-making or critical applications without human oversight |
| |
| --- |
| |
| ## Training Details |
| |
| - **Quantization method**: Post-training quantization using [AWQ (Activation-aware Weight Quantization)](https://github.com/mit-han-lab/awq) via the `awq` library |
| - **Base model**: `hssawhney/Best-Performing-Model`, fine-tuned on MCQA-style reasoning tasks |
| - **Quantization configuration**: |
|   - 4-bit weights (`w_bit = 4`) |
|   - Group size: 64 |
|   - Per-channel zero point: enabled |
| - **Calibration dataset**: 512 samples from `hssawhney/Reasoning-Dataset` |
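The configuration above could be expressed roughly as follows, assuming the `awq` library refers to the AutoAWQ package (installed as `autoawq`, imported as `awq`). This is a minimal sketch, not the author's actual script: the `"version": "GEMM"` kernel format, the output directory, the helper names, and the `max_calib_samples` argument are assumptions. The heavy imports are kept inside the function so the configuration and the storage arithmetic can be run without a GPU.

```python
# Sketch only: assumes the AutoAWQ package (`pip install autoawq`), imported as `awq`.

# Settings taken from the list above; "version" is an assumption (AutoAWQ's GEMM format).
QUANT_CONFIG = {
    "w_bit": 4,
    "q_group_size": 64,
    "zero_point": True,
    "version": "GEMM",
}


def effective_bits_per_weight(w_bit: int, group_size: int,
                              scale_bits: int = 16, zero_bits: int = 4) -> float:
    """Average storage per quantized weight, with one scale and one zero point per group.

    Assumes an fp16 scale and a 4-bit zero point; both defaults are assumptions.
    """
    return w_bit + (scale_bits + zero_bits) / group_size


def quantize_base_model(calib_texts, out_dir="awq-out",
                        base_model="hssawhney/Best-Performing-Model"):
    """Run post-training AWQ on the base model (typically needs a CUDA GPU)."""
    from awq import AutoAWQForCausalLM
    from transformers import AutoTokenizer

    model = AutoAWQForCausalLM.from_pretrained(base_model)
    tokenizer = AutoTokenizer.from_pretrained(base_model)
    # `max_calib_samples` is assumed to be available (recent AutoAWQ releases).
    model.quantize(tokenizer, quant_config=QUANT_CONFIG, calib_data=calib_texts,
                   max_calib_samples=len(calib_texts))
    model.save_quantized(out_dir)
    tokenizer.save_pretrained(out_dir)


# 4 weight bits plus (16 + 4) / 64 bits of per-group overhead.
print(effective_bits_per_weight(QUANT_CONFIG["w_bit"], QUANT_CONFIG["q_group_size"]))  # 4.3125
```

Here `calib_texts` would be the 512 calibration strings drawn from `hssawhney/Reasoning-Dataset`; how they are extracted depends on that dataset's columns. The arithmetic shows why a group size of 64 costs slightly more than 4 bits per weight: each group of 64 weights also stores its own scale and zero point.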
|
|
| --- |
|
|
| ## How to Use |
|
|
| ```python |
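| # Loading an AWQ checkpoint through transformers needs the `autoawq` package and, typically, a CUDA GPU. |
| from transformers import AutoModelForCausalLM, AutoTokenizer |
| |
| # The repository stores 4-bit AWQ weights; the quantization config is read from the checkpoint. |
| model = AutoModelForCausalLM.from_pretrained("zay25/MNLP_M3_quantized_model", trust_remote_code=True) |
| tokenizer = AutoTokenizer.from_pretrained("zay25/MNLP_M3_quantized_model") |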
| |