	---
	license: cc-by-4.0
	tags:
	- multi-label-classification
	- text-classification
	- onnx
	- web-classification
	- firefox-ai
	- preview
	language:
	- multilingual
	datasets:
	- tshasan/multi-label-web-classification
	base_model: Alibaba-NLP/gte-modernbert-base
	pipeline_tag: text-classification
	library_name: transformers
	---

	# URL-TITLE-classifier-preview

	## Model Overview

	This is a preview version of a multi-label web classification model fine-tuned from [`Alibaba-NLP/gte-modernbert-base`](https://huggingface.co/Alibaba-NLP/gte-modernbert-base). It assigns one or more categories to a web page based on its URL and title.

	The model supports 11 labels:
	`Uncategorized`, `News`, `Entertainment`, `Shop`, `Chat`, `Education`, `Government`, `Health`, `Technology`, `Work`, and `Travel`.

	- Developed by: Taimur Hasan
	- Model Type: Multi-label Text Classification
	- Status: Preview (under active development)

	### Architecture

	- Fine-tuning Strategy: Unfroze the last 4 encoder layers and the pooler
	- Problem Type: Multi-label classification
	- Output Labels: `News`, `Entertainment`, `Shop`, `Chat`, `Education`, `Government`, `Health`, `Technology`, `Work`, `Travel`, `Uncategorized`
	- Input Format: URL and title concatenated into a single string, `"{url}:{title}"`
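
The input format and the multi-label output can be illustrated with a small sketch. This is not the author's inference code: the helper names `build_input` and `decode_labels`, the 0.5 threshold, and the label index order (copied from the Output Labels list) are assumptions, and the logits are made up. The real index order should be read from the model's `id2label` config.

```python
import math

# Assumed label order, copied from the "Output Labels" list.
# The real index order lives in the model config (id2label); verify it there.
LABELS = [
    "News", "Entertainment", "Shop", "Chat", "Education", "Government",
    "Health", "Technology", "Work", "Travel", "Uncategorized",
]


def build_input(url: str, title: str) -> str:
    """Join URL and title using the documented "{url}:{title}" format."""
    return f"{url}:{title}"


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def decode_labels(logits, threshold=0.5):
    """Multi-label decoding: score each label independently with a sigmoid.

    Returns every label whose probability reaches the threshold. The 0.5
    threshold is an assumption, not a value taken from the model card.
    """
    if len(logits) != len(LABELS):
        raise ValueError("expected one logit per label")
    return [label for label, z in zip(LABELS, logits) if sigmoid(z) >= threshold]


text = build_input("https://example.com/deals", "Weekly laptop deals")

# Made-up logits for illustration only; a real run would take them from the model.
example_logits = [-3.0, -2.5, 2.1, -4.0, -3.2, -5.0, -4.1, 0.8, -2.0, -3.5, -1.0]
predicted = decode_labels(example_logits)
```

Because each label is thresholded on its own, a page can receive several labels at once or none at all, which is what separates this setup from single-label softmax classification.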

	---

	## Evaluation Metrics (Validation Data)

	| Metric               | Value |
	|----------------------|-------|
	| Loss                 | 0.207 |
	| Hamming Loss         | 0.083 |
	| Exact Match          | 0.445 |
	| Precision (Micro)    | 0.917 |
	| Recall (Micro)       | 0.917 |
	| F1 Score (Micro)     | 0.917 |
	| Precision (Macro)    | 0.795 |
	| Recall (Macro)       | 0.598 |
	| F1 Score (Macro)     | 0.677 |
	| Precision (Weighted) | 0.798 |
	| Recall (Weighted)    | 0.647 |
	| F1 Score (Weighted)  | 0.711 |
	| ROC AUC (Micro)      | 0.941 |
	| ROC AUC (Macro)      | 0.928 |
	| PR AUC (Micro)       | 0.815 |
	| PR AUC (Macro)       | 0.765 |
	| Jaccard (Micro)      | 0.848 |
	| Jaccard (Macro)      | 0.520 |
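
For readers less familiar with multi-label metrics, the sketch below computes exact match, Hamming loss, and micro and macro F1 on a tiny made-up example. It uses the standard textbook definitions; the function name `multilabel_metrics` and the toy data are hypothetical, and this is not the evaluation script that produced the table.

```python
def multilabel_metrics(y_true, y_pred):
    """Compute a few multi-label metrics from rows of 0/1 label indicators."""
    n_samples = len(y_true)
    n_labels = len(y_true[0])

    # Exact match: fraction of samples whose full label set is predicted correctly.
    exact = sum(t == p for t, p in zip(y_true, y_pred)) / n_samples

    # Hamming loss: fraction of individual label decisions that are wrong.
    wrong = sum(ti != pi for t, p in zip(y_true, y_pred) for ti, pi in zip(t, p))
    hamming = wrong / (n_samples * n_labels)

    # Per-label true positives, false positives, and false negatives.
    tp = [0] * n_labels
    fp = [0] * n_labels
    fn = [0] * n_labels
    for t, p in zip(y_true, y_pred):
        for j, (ti, pi) in enumerate(zip(t, p)):
            if ti and pi:
                tp[j] += 1
            elif pi:
                fp[j] += 1
            elif ti:
                fn[j] += 1

    def f1(tp_, fp_, fn_):
        denom = 2 * tp_ + fp_ + fn_
        return 2 * tp_ / denom if denom else 0.0

    # Micro: pool counts over all labels. Macro: average the per-label scores.
    micro_f1 = f1(sum(tp), sum(fp), sum(fn))
    macro_f1 = sum(f1(a, b, c) for a, b, c in zip(tp, fp, fn)) / n_labels

    return {
        "exact_match": exact,
        "hamming_loss": hamming,
        "f1_micro": micro_f1,
        "f1_macro": macro_f1,
    }


# Toy data: 4 samples, 3 labels. The two rarer labels are never predicted.
y_true = [[1, 0, 0], [1, 1, 0], [0, 0, 1], [1, 0, 0]]
y_pred = [[1, 0, 0], [1, 0, 0], [0, 0, 0], [1, 0, 0]]
metrics = multilabel_metrics(y_true, y_pred)
```

In the toy data the frequent label is always right and the two rare labels are always missed, so micro F1 stays high while macro F1 drops. A gap between micro and macro scores in a real evaluation is often a sign of the same imbalance, although the table alone does not establish the cause here.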

	### Per-Label F1 Scores

	| Label         | F1 Score |
	|---------------|----------|
	| News          | 0.605    |
	| Entertainment | 0.764    |
	| Shop          | 0.704    |
	| Chat          | 0.875    |
	| Education     | 0.763    |
	| Government    | 0.667    |
	| Health        | 0.574    |
	| Technology    | 0.738    |
	| Work          | 0.527    |
	| Travel        | 0.571    |
	| Uncategorized | 0.657    |
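
As a quick consistency check, macro F1 is the unweighted mean of the per-label F1 scores. The snippet below recomputes it from the values in this table; the variable names are illustrative only.

```python
# Per-label F1 scores copied from the table above.
per_label_f1 = {
    "News": 0.605,
    "Entertainment": 0.764,
    "Shop": 0.704,
    "Chat": 0.875,
    "Education": 0.763,
    "Government": 0.667,
    "Health": 0.574,
    "Technology": 0.738,
    "Work": 0.527,
    "Travel": 0.571,
    "Uncategorized": 0.657,
}

# Macro F1: every label counts equally, regardless of how often it occurs.
macro_f1 = sum(per_label_f1.values()) / len(per_label_f1)

# Labels with the lowest and highest F1 in this evaluation.
weakest = min(per_label_f1, key=per_label_f1.get)
strongest = max(per_label_f1, key=per_label_f1.get)
```

Rounded to three decimals, the mean is 0.677, which agrees with the F1 Score (Macro) row in the metrics table. `Work` is the weakest label and `Chat` the strongest.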

	---

	> Note: This model is in preview and may not generalize well outside of its training dataset. Feedback and contributions are welcome.
