---
license: cc-by-4.0
tags:
- multi-label-classification
- text-classification
- onnx
- web-classification
- firefox-ai
- preview
language:
- multilingual
datasets:
- tshasan/multi-label-web-classification
base_model: Alibaba-NLP/gte-modernbert-base
pipeline_tag: text-classification
library_name: transformers
---

# URL-TITLE-classifier-preview

## Model Overview

This is a **preview version** of a multi-label web classification model fine-tuned from [`Alibaba-NLP/gte-modernbert-base`](https://huggingface.co/Alibaba-NLP/gte-modernbert-base). It assigns one or more categories to a web page based on its URL and title.

The model supports **11 labels**:
`Uncategorized`, `News`, `Entertainment`, `Shop`, `Chat`, `Education`, `Government`, `Health`, `Technology`, `Work`, and `Travel`.

- **Developed by**: Taimur Hasan
- **Model Type**: Multi-label Text Classification
- **Status**: Preview (under active development)

### Architecture

- **Fine-tuning Strategy**: Unfroze the last 4 encoder layers and the pooler
- **Problem Type**: Multi-label classification
- **Output Labels**:
  - `News`, `Entertainment`, `Shop`, `Chat`, `Education`, `Government`, `Health`, `Technology`, `Work`, `Travel`, `Uncategorized`
- **Input Format**: Concatenated string:
  `"{url}:{title}"`

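
The input format and the multi-label output can be illustrated with a minimal sketch. It builds the `"{url}:{title}"` string and turns a vector of 11 logits into a set of labels by applying a sigmoid to each logit independently. Several things here are assumptions rather than part of the model card: the label order (copied from the Output Labels list, while the authoritative order is the `id2label` mapping in the model config), the 0.5 threshold, the helper names `format_input` and `decode_labels`, and the example logits, which are made up and are not real model output.

```python
import math

# ASSUMPTION: index order follows the "Output Labels" list in this card.
# Check id2label in the model's config before relying on it.
LABELS = [
    "News", "Entertainment", "Shop", "Chat", "Education", "Government",
    "Health", "Technology", "Work", "Travel", "Uncategorized",
]


def format_input(url: str, title: str) -> str:
    """Join URL and title into the "{url}:{title}" string described above."""
    return f"{url}:{title}"


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def decode_labels(logits, threshold=0.5):
    """Multi-label decoding: each label is scored independently.

    ASSUMPTION: a 0.5 probability threshold; a tuned per-label threshold
    may work better in practice.
    """
    if len(logits) != len(LABELS):
        raise ValueError(f"expected {len(LABELS)} logits, got {len(logits)}")
    return [label for label, z in zip(LABELS, logits) if sigmoid(z) >= threshold]


text = format_input("https://example.com/flights", "Cheap flights to Lisbon")

# Hypothetical logits for illustration only (not produced by the model).
example_logits = [-3.0, -2.0, 1.2, -4.0, -3.5, -5.0, -4.2, -2.8, -3.1, 2.4, -1.0]
predicted = decode_labels(example_logits)
print(text)
print(predicted)
```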
---

## Evaluation Metrics (Validation Data)

| Metric                   | Value |
|--------------------------|-------|
| **Loss**                 | 0.207 |
| **Hamming Loss**         | 0.083 |
| **Exact Match**          | 0.445 |
| **Precision (Micro)**    | 0.917 |
| **Recall (Micro)**       | 0.917 |
| **F1 Score (Micro)**     | 0.917 |
| **Precision (Macro)**    | 0.795 |
| **Recall (Macro)**       | 0.598 |
| **F1 Score (Macro)**     | 0.677 |
| **Precision (Weighted)** | 0.798 |
| **Recall (Weighted)**    | 0.647 |
| **F1 Score (Weighted)**  | 0.711 |
| **ROC AUC (Micro)**      | 0.941 |
| **ROC AUC (Macro)**      | 0.928 |
| **PR AUC (Micro)**       | 0.815 |
| **PR AUC (Macro)**       | 0.765 |
| **Jaccard (Micro)**      | 0.848 |
| **Jaccard (Macro)**      | 0.520 |

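
For readers less familiar with multi-label metrics, the sketch below shows how Hamming loss, exact match, and micro F1 are usually defined, using a tiny made-up example of 3 samples and 4 labels. The toy data and function names are illustrative assumptions, and this is not the evaluation code used to produce the table above.

```python
def hamming_loss(y_true, y_pred):
    """Fraction of individual (sample, label) slots that are wrong."""
    total = sum(len(row) for row in y_true)
    wrong = sum(
        t != p
        for true_row, pred_row in zip(y_true, y_pred)
        for t, p in zip(true_row, pred_row)
    )
    return wrong / total


def exact_match(y_true, y_pred):
    """Fraction of samples whose full label vector is predicted correctly."""
    hits = sum(true_row == pred_row for true_row, pred_row in zip(y_true, y_pred))
    return hits / len(y_true)


def micro_f1(y_true, y_pred):
    """F1 computed from true/false positives pooled over all labels."""
    tp = fp = fn = 0
    for true_row, pred_row in zip(y_true, y_pred):
        for t, p in zip(true_row, pred_row):
            tp += t == 1 and p == 1
            fp += t == 0 and p == 1
            fn += t == 1 and p == 0
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Toy data (made up): rows are samples, columns are labels.
y_true = [[1, 0, 0, 1], [0, 1, 0, 0], [0, 0, 1, 0]]
y_pred = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 1, 1, 0]]

print(hamming_loss(y_true, y_pred))  # 2 wrong slots out of 12
print(exact_match(y_true, y_pred))   # only the second sample matches fully
print(micro_f1(y_true, y_pred))      # tp=3, fp=1, fn=1
```

In the toy data only 2 of 12 label slots are wrong, yet 2 of 3 samples fail exact match. The same effect explains how a low Hamming loss can coexist with a much lower exact-match score: with 11 labels, a single wrong label makes the whole sample count as a miss.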
### Per-Label F1 Scores

| Label         | F1 Score |
|---------------|----------|
| News          | 0.605    |
| Entertainment | 0.764    |
| Shop          | 0.704    |
| Chat          | 0.875    |
| Education     | 0.763    |
| Government    | 0.667    |
| Health        | 0.574    |
| Technology    | 0.738    |
| Work          | 0.527    |
| Travel        | 0.571    |
| Uncategorized | 0.657    |

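
The per-label scores tie back to the macro F1 in the metrics table. Assuming macro F1 is the unweighted mean of the per-label F1 scores (the usual definition), the short check below reproduces the reported 0.677 from the values above and lists the three weakest labels. The variable names are illustrative.

```python
# Per-label F1 scores copied from the table above.
per_label_f1 = {
    "News": 0.605,
    "Entertainment": 0.764,
    "Shop": 0.704,
    "Chat": 0.875,
    "Education": 0.763,
    "Government": 0.667,
    "Health": 0.574,
    "Technology": 0.738,
    "Work": 0.527,
    "Travel": 0.571,
    "Uncategorized": 0.657,
}

# Macro F1: every label counts equally, regardless of how often it occurs.
macro_f1 = sum(per_label_f1.values()) / len(per_label_f1)
print(round(macro_f1, 3))  # 0.677

# Labels sorted from weakest to strongest F1.
weakest = sorted(per_label_f1, key=per_label_f1.get)[:3]
print(weakest)  # ['Work', 'Travel', 'Health']
```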
---

> **Note:** This model is in preview and may not generalize well outside of its training dataset. Feedback and contributions are welcome.