	---
	base_model: answerdotai/ModernBERT-base
	library_name: transformers
	pipeline_tag: text-classification
	tags:
	- text-classification
	- legal
	- locus
	- modernbert
	license: apache-2.0
	datasets:
	- LocalLaws/LOCUS-v1.0
	---

	# LocalLaws/LOCUS-Topic

	A ModernBERT classifier for the Topic axis of the LOCUS
	(Local Ordinances Corpus, United States) dataset.

	Fine-tuned from [answerdotai/ModernBERT-base](https://huggingface.co/answerdotai/ModernBERT-base) on
	[LocalLaws/LOCUS-v1.0](https://huggingface.co/datasets/LocalLaws/LOCUS-v1.0).

	## Labels

	- `Buildings`
	- `Business`
	- `Nuisance`
	- `Other`
	- `Zoning`

	## Training

	| Setting | Value |
	|---|---|
	| Base model | `answerdotai/ModernBERT-base` |
	| Max length | 1024 tokens |
	| Classifier pooling | `mean` |
	| Train / val / test examples | 45183 / 5848 / 5928 |
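
With `mean` pooling, the classification head sees an average of the final-layer token embeddings rather than a single token's embedding. The sketch below illustrates that idea on plain Python lists, averaging only over non-padding positions. It is not the library's implementation; the helper name `masked_mean_pool` and the toy vectors are hypothetical.

```python
def masked_mean_pool(hidden_states, attention_mask):
    """Average token vectors where attention_mask == 1 (padding is ignored).

    hidden_states: list of token vectors (each a list of floats).
    attention_mask: list of 0/1 ints, one per token.
    """
    kept = [vec for vec, m in zip(hidden_states, attention_mask) if m == 1]
    if not kept:
        raise ValueError("attention_mask selects no tokens")
    dim = len(kept[0])
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]


# Toy example (hypothetical): three real tokens, one padding position, hidden size 2.
tokens = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [9.0, 9.0]]
mask = [1, 1, 1, 0]
pooled = masked_mean_pool(tokens, mask)
print(pooled)  # [3.0, 4.0]
```

The padding vector `[9.0, 9.0]` does not affect the result because its mask entry is 0.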

	## Evaluation

	| Measure | Value |
	|---|---|
	| Metric | macro-F1 |
	| Validation macro-F1 | 0.8127 |
	| Test macro-F1 | 0.8173 |
	| Test accuracy | 0.8190 |

	```
	              precision    recall  f1-score   support

	   Buildings     0.7438    0.8506    0.7936       877
	    Business     0.8273    0.8381    0.8326       846
	    Nuisance     0.7617    0.8419    0.7998       930
	       Other     0.8916    0.7657    0.8239      2083
	      Zoning     0.8169    0.8574    0.8367      1192

	    accuracy                         0.8190      5928
	   macro avg     0.8083    0.8307    0.8173      5928
	weighted avg     0.8251    0.8190    0.8194      5928
	```
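
The macro and weighted averages in the report differ because `Other` makes up roughly a third of the test set. As a quick sanity check, the sketch below recomputes both F1 averages from the per-class rows above. Because the inputs are already rounded to four decimals, the recomputed values can differ from the reported ones in the last digit.

```python
# Per-class (F1, support) pairs copied from the classification report.
report = {
    "Buildings": (0.7936, 877),
    "Business": (0.8326, 846),
    "Nuisance": (0.7998, 930),
    "Other": (0.8239, 2083),
    "Zoning": (0.8367, 1192),
}

f1_scores = [f1 for f1, _ in report.values()]
total = sum(n for _, n in report.values())

# Macro average: every class counts equally, regardless of its size.
macro_f1 = sum(f1_scores) / len(f1_scores)

# Weighted average: each class contributes in proportion to its support.
weighted_f1 = sum(f1 * n for f1, n in report.values()) / total

print(f"test examples: {total}")
print(f"macro-F1:      {macro_f1:.4f}")
print(f"weighted-F1:   {weighted_f1:.4f}")
```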

	## Usage

	```python
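	import torch
	from transformers import AutoModelForSequenceClassification, AutoTokenizer

	# Load the fine-tuned tokenizer and classifier from the Hub.
	tok = AutoTokenizer.from_pretrained("LocalLaws/LOCUS-Topic")
	model = AutoModelForSequenceClassification.from_pretrained("LocalLaws/LOCUS-Topic")
	model.eval()  # disable dropout for inference

	text = "No person shall keep any swine within the city limits."
	# Truncate to the 1024-token maximum used during training.
	enc = tok(text, return_tensors="pt", truncation=True, max_length=1024)

	# Run a forward pass without tracking gradients.
	with torch.no_grad():
	    logits = model(**enc).logits

	# Map the highest-scoring class index to its label name.
	pred = logits.argmax(-1).item()
	print(model.config.id2label[pred])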
	```
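
When a ranked list of topics with probabilities is more useful than a single label, the logits can be passed through a softmax. The sketch below does this in plain Python on a list such as `logits[0].tolist()` from the snippet above. The helper name `rank_labels`, the example logit values, and the alphabetical label order are illustrative assumptions; read the actual mapping from `model.config.id2label`.

```python
import math


def rank_labels(logits, id2label):
    """Return (label, probability) pairs sorted from most to least likely."""
    # Subtract the max logit before exponentiating for numerical stability.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    norm = sum(exps)
    probs = [e / norm for e in exps]
    ranked = sorted(enumerate(probs), key=lambda pair: pair[1], reverse=True)
    return [(id2label[i], p) for i, p in ranked]


# Hypothetical values for illustration only; in practice use
# logits[0].tolist() and model.config.id2label from the loaded model.
example_id2label = {0: "Buildings", 1: "Business", 2: "Nuisance", 3: "Other", 4: "Zoning"}
example_logits = [-1.2, -0.5, 3.1, 0.4, -0.8]

ranked = rank_labels(example_logits, example_id2label)
for label, prob in ranked:
    print(f"{label}: {prob:.3f}")
```

Given the per-class precision figures in the evaluation section, a low top probability may be a reasonable signal to route an ordinance to manual review, though any threshold would need to be tuned on held-out data.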