---
language: en
license: apache-2.0
library_name: transformers
tags:
- finance
- nlp
- sentiment-analysis
- token-classification
- ner
- transformers
pipeline_tag: text-classification
task_categories:
- text-classification
- token-classification
---

# Finance NLP Toolkit

**Finance NLP Toolkit** is a practical starter pack for analyzing financial text with Transformers.
It supports two core tasks:

1) **Sentiment Analysis**: positive / neutral / negative market tone
2) **Named Entity Recognition (NER)**: companies, tickers, money, dates, etc.

This repository includes:

- Ready-to-run **inference snippets**
- **Training scripts** for fine-tuning on your datasets
- Label mapping examples and utilities

> **Note:** The initial release ships training and inference scaffolding.
> Plug in your dataset and fine-tune, or point the snippets at an existing finance model.

---

## Quickstart (inference)

Install deps:

```bash
pip install -r requirements.txt
```

Sentiment:

```python
from transformers import pipeline

# Works once fine-tuned weights have been pushed to the repo's main branch.
sentiment = pipeline(
    "sentiment-analysis",
    model="Proooof/Finance_NLP_Toolkit",
    tokenizer="Proooof/Finance_NLP_Toolkit",
)
print(sentiment("The company reported record profits and raised guidance."))
```

NER:

```python
from transformers import AutoTokenizer, AutoModelForTokenClassification, pipeline

# Replace YOUR-USERNAME with your account; the "ner" branch must hold the NER weights.
repo_id = "YOUR-USERNAME/Finance-NLP-Toolkit"
tok = AutoTokenizer.from_pretrained(repo_id, revision="ner")
ner_model = AutoModelForTokenClassification.from_pretrained(repo_id, revision="ner")
ner = pipeline("token-classification", model=ner_model, tokenizer=tok, aggregation_strategy="simple")
print(ner("Apple Inc. reported a $10 billion revenue increase in Q2 2025."))
```

> **Tip:** Use branches to host multiple checkpoints in one repo:
>
> - `main` → sentiment
> - `ner` → NER model
>
> Push each set of weights to its respective branch.

## Training

### Sentiment (3-class)

```bash
python training/train_sentiment.py \
  --model_name distilbert-base-uncased \
  --train_csv /path/train.csv \
  --eval_csv /path/valid.csv \
  --text_col text --label_col label \
  --output_dir ./outputs/sentiment \
  --epochs 3 --batch_size 16 --lr 5e-5
```

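The sentiment command above reads a CSV with one text column and one label column. The label vocabulary accepted by `training/train_sentiment.py` is not documented here, so the sketch below assumes the string labels `negative`, `neutral`, and `positive`. `check_sentiment_csv` and `ASSUMED_LABELS` are hypothetical names for illustration, not part of the repository.

```python
import csv
import io
from collections import Counter

# Assumed label vocabulary; adjust to whatever your training script expects.
ASSUMED_LABELS = {"negative", "neutral", "positive"}


def check_sentiment_csv(handle, text_col="text", label_col="label"):
    """Count labels in a CSV and raise ValueError on malformed rows."""
    reader = csv.DictReader(handle)
    missing = {text_col, label_col} - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    counts = Counter()
    for row_no, row in enumerate(reader, start=1):
        text = (row[text_col] or "").strip()
        label = (row[label_col] or "").strip()
        if not text:
            raise ValueError(f"data row {row_no}: empty text")
        if label not in ASSUMED_LABELS:
            raise ValueError(f"data row {row_no}: unexpected label {label!r}")
        counts[label] += 1
    return counts


# Tiny in-memory example; for a real file, pass open(path, newline="") instead.
sample = io.StringIO(
    "text,label\n"
    '"Shares fell after the profit warning.",negative\n'
    '"The board declared its regular dividend.",neutral\n'
    '"Revenue beat estimates, guidance raised.",positive\n'
)
label_counts = check_sentiment_csv(sample)
print(dict(label_counts))
```

Running a check like this before training can surface empty rows or stray label spellings early, and the per-label counts give a quick view of class balance.
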
### NER (BIO tags)

```bash
python training/train_ner.py \
  --model_name bert-base-cased \
  --train_json /path/train.jsonl \
  --eval_json /path/valid.jsonl \
  --text_col tokens --label_col ner_tags \
  --labels_file training/labels_ner.json \
  --output_dir ./outputs/ner \
  --epochs 5 --batch_size 8 --lr 3e-5
```

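The NER command above takes a labels file and JSONL records with `tokens` and `ner_tags` columns. The schema of `training/labels_ner.json` is not shown in this README, so the following is only a sketch that assumes a flat list of BIO tags and string-valued `ner_tags`. The entity types and the `build_bio_labels` helper are assumptions chosen to match the examples in this document.

```python
import json

# Entity types are an assumption based on the examples in this README.
ENTITY_TYPES = ["ORG", "TICKER", "MONEY", "DATE"]


def build_bio_labels(entity_types):
    """Return ['O', 'B-X', 'I-X', ...] for the given entity types."""
    labels = ["O"]
    for entity in entity_types:
        labels.extend([f"B-{entity}", f"I-{entity}"])
    return labels


labels = build_bio_labels(ENTITY_TYPES)
label2id = {label: idx for idx, label in enumerate(labels)}
id2label = {idx: label for label, idx in label2id.items()}

# One training record in the assumed JSONL layout (tokens + ner_tags columns).
record = {
    "tokens": ["Apple", "Inc.", "reported", "$", "10", "billion", "in", "Q2", "2025", "."],
    "ner_tags": ["B-ORG", "I-ORG", "O", "B-MONEY", "I-MONEY", "I-MONEY", "O", "B-DATE", "I-DATE", "O"],
}
# Every token needs exactly one tag.
assert len(record["tokens"]) == len(record["ner_tags"])
encoded_tags = [label2id[tag] for tag in record["ner_tags"]]

print(json.dumps(labels))
print(encoded_tags)
```

Keeping `O` at index 0 and pairing each `B-` tag with its `I-` tag is a common convention rather than a requirement; what matters is that the same mapping is used for training and stored with the model for inference.
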
After training, push the weights to the repo (e.g., `git push origin main` for sentiment and `git push origin ner` for NER).

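As an alternative to `git push`, the same branch layout can be populated from Python with the `huggingface_hub` client. This is a sketch of a different upload route, not the workflow this repository prescribes. `TASK_BRANCHES` and `push_checkpoint` are hypothetical names, and the call assumes you are already authenticated with the Hub.

```python
# Branch layout taken from this README: sentiment on main, NER on ner.
TASK_BRANCHES = {"sentiment": "main", "ner": "ner"}


def push_checkpoint(local_dir, repo_id, task):
    """Upload a trained checkpoint folder to the branch assigned to `task`."""
    # Imported lazily so this module loads even without huggingface_hub installed.
    from huggingface_hub import HfApi

    branch = TASK_BRANCHES[task]
    api = HfApi()
    # exist_ok=True makes the call a no-op when the branch already exists.
    api.create_branch(repo_id=repo_id, branch=branch, exist_ok=True)
    api.upload_folder(folder_path=local_dir, repo_id=repo_id, revision=branch)
    return branch


# Example calls (not executed here, since they need network access and credentials):
# push_checkpoint("./outputs/sentiment", "YOUR-USERNAME/Finance-NLP-Toolkit", "sentiment")
# push_checkpoint("./outputs/ner", "YOUR-USERNAME/Finance-NLP-Toolkit", "ner")
```
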
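## Expected outputs

The scores below are illustrative; actual labels and scores depend on the weights you fine-tune.

Sentiment:

```python
[{'label': 'POSITIVE', 'score': 0.98}]
```

NER (abridged; with `aggregation_strategy="simple"` each entry also carries `start` and `end` character offsets):

```python
[
    {'entity_group': 'ORG', 'word': 'Apple Inc.', 'score': 0.99},
    {'entity_group': 'MONEY', 'word': '$10 billion', 'score': 0.99},
    {'entity_group': 'DATE', 'word': 'Q2 2025', 'score': 0.98}
]
```
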
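Downstream code usually wants entities grouped by type rather than a flat list. The sketch below reuses the illustrative NER output above; `group_entities` is a hypothetical helper and the `min_score` default of 0.5 is an arbitrary assumption, not a recommended threshold.

```python
from collections import defaultdict

# Sample copied from the illustrative NER output in this README.
ner_output = [
    {"entity_group": "ORG", "word": "Apple Inc.", "score": 0.99},
    {"entity_group": "MONEY", "word": "$10 billion", "score": 0.99},
    {"entity_group": "DATE", "word": "Q2 2025", "score": 0.98},
]


def group_entities(entities, min_score=0.5):
    """Group entity strings by type, dropping predictions below min_score."""
    grouped = defaultdict(list)
    for ent in entities:
        if ent["score"] >= min_score:
            grouped[ent["entity_group"]].append(ent["word"])
    return dict(grouped)


print(group_entities(ner_output))
# {'ORG': ['Apple Inc.'], 'MONEY': ['$10 billion'], 'DATE': ['Q2 2025']}
print(group_entities(ner_output, min_score=0.985))
# {'ORG': ['Apple Inc.'], 'MONEY': ['$10 billion']}
```
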
## Limitations

- English focus; domain shift may reduce accuracy
- Sarcasm/idioms can confound sentiment
- NER needs domain labels for best performance

## License

Apache-2.0