---
language: en
license: apache-2.0
library_name: transformers
tags:
- finance
- nlp
- sentiment-analysis
- token-classification
- ner
- transformers
pipeline_tag: text-classification
task_categories:
- text-classification
- token-classification
---

# 💹 Finance NLP Toolkit

Finance NLP Toolkit is a practical starter pack for analyzing financial text with Transformers.
It supports two core tasks:

1) Sentiment Analysis: positive / neutral / negative market tone
2) Named Entity Recognition (NER): companies, tickers, money, dates, etc.

This repository includes:
- Ready-to-run inference snippets
- Training scripts for fine-tuning on your datasets
- Label mapping examples and utilities

> Note: The initial release ships training and inference scaffolding.
> Plug in your dataset and fine-tune, or point to an existing finance model.

---

## 🚀 Quickstart (inference)

Install dependencies:
```bash
pip install -r requirements.txt
```

Sentiment:

```python
from transformers import pipeline

# Load the sentiment model from the main branch of the Hub repo.
sentiment = pipeline(
    "sentiment-analysis",
    model="Proooof/Finance-NLP-Toolkit",  # after you push your fine-tuned weights
    tokenizer="Proooof/Finance-NLP-Toolkit",
)
print(sentiment("The company reported record profits and raised guidance."))
```

NER:

```python
from transformers import AutoTokenizer, AutoModelForTokenClassification, pipeline

# Load the NER checkpoint from the "ner" branch (replace YOUR-USERNAME with your Hub username).
tok = AutoTokenizer.from_pretrained("YOUR-USERNAME/Finance-NLP-Toolkit", revision="ner")
ner_model = AutoModelForTokenClassification.from_pretrained("YOUR-USERNAME/Finance-NLP-Toolkit", revision="ner")
# aggregation_strategy="simple" groups adjacent tokens of the same entity into one span.
ner = pipeline("token-classification", model=ner_model, tokenizer=tok, aggregation_strategy="simple")
print(ner("Apple Inc. reported a $10 billion revenue increase in Q2 2025."))
```

Tip: Use branches to host multiple checkpoints in one repo:

- `main` → sentiment model
- `ner` → NER model

Push each set of weights to its respective branch.

## 🧠 Training

Sentiment (3-class):

```bash
python training/train_sentiment.py \
  --model_name distilbert-base-uncased \
  --train_csv /path/train.csv \
  --eval_csv /path/valid.csv \
  --text_col text --label_col label \
  --output_dir ./outputs/sentiment \
  --epochs 3 --batch_size 16 --lr 5e-5
```
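
The sentiment command above reads CSV files with a text column and a label column. As a minimal sketch, assuming the labels are stored as the strings `negative`, `neutral`, and `positive` (the actual script may expect integer ids instead), a label mapping and a quick sanity check of the CSV could look like this. The names `LABELS`, `label2id`, `id2label`, and `encode_rows` are hypothetical and not taken from the repository.

```python
import csv
import io

# Assumed label set for the 3-class task; list order defines the integer ids.
LABELS = ["negative", "neutral", "positive"]
label2id = {name: idx for idx, name in enumerate(LABELS)}
id2label = {idx: name for name, idx in label2id.items()}


def encode_rows(csv_file, text_col="text", label_col="label"):
    """Read CSV rows and return (text, label_id) pairs, rejecting unknown labels."""
    encoded = []
    reader = csv.DictReader(csv_file)
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        label = row[label_col].strip().lower()
        if label not in label2id:
            raise ValueError(f"line {line_no}: unknown label {label!r}")
        encoded.append((row[text_col], label2id[label]))
    return encoded


# In-memory stand-in for /path/train.csv so the sketch runs without any files.
sample = io.StringIO(
    "text,label\n"
    "Shares fell after the profit warning.,negative\n"
    "The board meets on Tuesday.,neutral\n"
    "Revenue beat expectations.,positive\n"
)
pairs = encode_rows(sample)
print(pairs)
```

Running a check like this before training catches misspelled or missing labels early. The same `id2label` and `label2id` dictionaries can typically be passed to `from_pretrained` when loading a sequence classification model, so that the pipeline reports readable label names.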

NER (BIO tags):

```bash
python training/train_ner.py \
  --model_name bert-base-cased \
  --train_json /path/train.jsonl \
  --eval_json /path/valid.jsonl \
  --text_col tokens --label_col ner_tags \
  --labels_file training/labels_ner.json \
  --output_dir ./outputs/ner \
  --epochs 5 --batch_size 8 --lr 3e-5
```
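
The NER command reads JSONL files with `tokens` and `ner_tags` fields plus a labels file. The sketch below shows one way such data could be laid out, assuming the labels file holds a flat list of BIO tags and each JSONL line stores string tags. The real files under `training/` may use a different schema. `ENTITY_TYPES`, `build_bio_labels`, and `validate_record` are hypothetical names used only for illustration.

```python
import json

# Assumed entity types, taken from the example output in this README.
ENTITY_TYPES = ["ORG", "MONEY", "DATE"]


def build_bio_labels(entity_types):
    """Return the BIO tag list: "O" plus B-/I- tags for each entity type."""
    labels = ["O"]
    for ent in entity_types:
        labels.extend([f"B-{ent}", f"I-{ent}"])
    return labels


def validate_record(record, labels):
    """Check that tokens and tags align and that BIO transitions are legal."""
    tokens, tags = record["tokens"], record["ner_tags"]
    if len(tokens) != len(tags):
        raise ValueError("tokens and ner_tags differ in length")
    prev = "O"
    for tag in tags:
        if tag not in labels:
            raise ValueError(f"unknown tag {tag!r}")
        # An I- tag must continue a B- or I- tag of the same entity type.
        if tag.startswith("I-") and prev[2:] != tag[2:]:
            raise ValueError(f"{tag!r} cannot follow {prev!r}")
        prev = tag
    return True


labels = build_bio_labels(ENTITY_TYPES)

# One JSONL line, as it might appear in train.jsonl under the assumed schema.
line = json.dumps({
    "tokens": ["Apple", "Inc.", "reported", "$10", "billion", "in", "Q2", "2025", "."],
    "ner_tags": ["B-ORG", "I-ORG", "O", "B-MONEY", "I-MONEY", "O", "B-DATE", "I-DATE", "O"],
})
record = json.loads(line)
print(labels)
print(validate_record(record, labels))
```

Validating every record this way before training surfaces misaligned tokens and stray `I-` tags, which are common causes of silent label noise in BIO-tagged data.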


After training, push weights to the repo (e.g., `git push origin main` for sentiment and `git push origin ner` for NER).

## 📊 Expected outputs

Sentiment:

```python
[{'label': 'POSITIVE', 'score': 0.98}]
```

NER:

```python
[
    {'entity_group': 'ORG', 'word': 'Apple Inc.', 'score': 0.99},
    {'entity_group': 'MONEY', 'word': '$10 billion', 'score': 0.99},
    {'entity_group': 'DATE', 'word': 'Q2 2025', 'score': 0.98}
]
```
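
Downstream code often needs these results in a more convenient shape. As a sketch, the helper below groups NER entities by type and drops low-confidence ones. `group_entities` and the default threshold of 0.5 are illustrative choices, not part of the toolkit.

```python
from collections import defaultdict

# Sample output copied from the NER example above.
ner_output = [
    {"entity_group": "ORG", "word": "Apple Inc.", "score": 0.99},
    {"entity_group": "MONEY", "word": "$10 billion", "score": 0.99},
    {"entity_group": "DATE", "word": "Q2 2025", "score": 0.98},
]


def group_entities(entities, min_score=0.5):
    """Map each entity type to the list of words scoring at least min_score."""
    grouped = defaultdict(list)
    for ent in entities:
        if ent["score"] >= min_score:
            grouped[ent["entity_group"]].append(ent["word"])
    return dict(grouped)


print(group_entities(ner_output))
# {'ORG': ['Apple Inc.'], 'MONEY': ['$10 billion'], 'DATE': ['Q2 2025']}
```

Raising `min_score` trades recall for precision, which can help when the model is applied to text outside its fine-tuning domain.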

## ⚠️ Limitations

- English focus; accuracy may drop on text that differs from the fine-tuning domain
- Sarcasm and idioms can confound sentiment
- NER performs best when fine-tuned on labels specific to your domain

## 📜 License

Apache-2.0