# Named Entity Recognition for Azerbaijani Language
A state-of-the-art Named Entity Recognition (NER) system specifically designed for the Azerbaijani language, featuring multiple fine-tuned transformer models and a production-ready FastAPI deployment with an intuitive web interface.
## Preview
## Live Demo

Try the live demo: [Named Entity Recognition Demo](https://named-entity-recognition.fly.dev)

**Note:** The server runs on a free tier and may take 1-2 minutes to initialize after a period of inactivity. Please be patient during startup.
## System Architecture

```mermaid
graph TD
    A[User Input] --> B[FastAPI Server]
    B --> C[XLM-RoBERTa Model]
    C --> D[Token Classification]
    D --> E[Entity Aggregation]
    E --> F[Label Mapping]
    F --> G[JSON Response]
    G --> H[Frontend Visualization]

    subgraph "Model Pipeline"
        C --> C1[Tokenization]
        C1 --> C2[BERT Encoding]
        C2 --> C3[Classification Head]
        C3 --> D
    end

    subgraph "Entity Categories"
        I[Person]
        J[Location]
        K[Organization]
        L[Date/Time]
        M[Government]
        N[25 Total Categories]
    end

    F --> I
    F --> J
    F --> K
    F --> L
    F --> M
    F --> N
```
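The "Entity Aggregation" and "Label Mapping" stages can be sketched in plain Python. This is a hypothetical illustration of the approach, not the server's actual code; the tag names and example tokens are made up for demonstration.

```python
def aggregate_entities(tokens, tags):
    """Merge BIO-tagged tokens into (entity_text, category) spans."""
    entities, current, current_tag = [], [], None
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            if current:
                entities.append((" ".join(current), current_tag))
            current, current_tag = [token], tag[2:]
        elif tag.startswith("I-") and current_tag == tag[2:]:
            current.append(token)
        else:  # "O" tag or an I- tag that doesn't continue the open span
            if current:
                entities.append((" ".join(current), current_tag))
            current, current_tag = [], None
    if current:
        entities.append((" ".join(current), current_tag))
    return entities

def group_by_category(entities):
    """Bucket aggregated spans by category for the JSON response."""
    grouped = {}
    for text, category in entities:
        grouped.setdefault(category, []).append(text)
    return grouped

tokens = ["İlham", "Əliyev", "Bakıda", "olub", "."]
tags = ["B-Person", "I-Person", "B-Location", "O", "O"]
spans = aggregate_entities(tokens, tags)
print(group_by_category(spans))
# {'Person': ['İlham Əliyev'], 'Location': ['Bakıda']}
```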
## Model Training Pipeline

```mermaid
flowchart LR
    A[Azerbaijani NER Dataset] --> B[Data Preprocessing]
    B --> C[Tokenization]
    C --> D[Label Alignment]

    subgraph "Model Training"
        E[mBERT] --> F[Fine-tuning]
        G[XLM-RoBERTa] --> F
        H[XLM-RoBERTa Large] --> F
        I[Azeri-Turkish BERT] --> F
        F --> J[Model Evaluation]
    end

    D --> E
    D --> G
    D --> H
    D --> I

    J --> K[Best Model Selection]
    K --> L[Hugging Face Hub]
    L --> M[Production Deployment]

    subgraph "Performance Metrics"
        N[Precision: 76.44%]
        O[Recall: 74.05%]
        P[F1-Score: 75.22%]
    end

    J --> N
    J --> O
    J --> P
```
## Data Flow Architecture

```mermaid
sequenceDiagram
    participant U as User
    participant F as Frontend
    participant API as FastAPI
    participant M as XLM-RoBERTa
    participant HF as Hugging Face

    U->>F: Enter Azerbaijani text
    F->>API: POST /predict/
    API->>M: Process text
    M->>M: Tokenize input
    M->>M: Generate predictions
    M->>API: Return entity predictions
    API->>API: Apply label mapping
    API->>API: Group entities by type
    API->>F: JSON response with entities
    F->>U: Display highlighted entities

    Note over M,HF: Model loaded from<br/>IsmatS/xlm-roberta-az-ner
```
## Project Structure

```
.
├── Dockerfile                       # Docker image configuration
├── README.md                        # Project documentation
├── fly.toml                         # Fly.io deployment configuration
├── main.py                          # FastAPI application entry point
├── models/                          # Model-related files
│   ├── NER_from_scratch.ipynb       # Custom NER implementation notebook
│   ├── README.md                    # Models documentation
│   ├── XLM-RoBERTa.ipynb            # XLM-RoBERTa training notebook
│   ├── azeri-turkish-bert-ner.ipynb # Azeri-Turkish BERT training
│   ├── mBERT.ipynb                  # mBERT training notebook
│   ├── push_to_HF.py                # Hugging Face upload script
│   ├── train-00000-of-00001.parquet # Training data
│   └── xlm_roberta_large.ipynb      # XLM-RoBERTa Large training
├── requirements.txt                 # Python dependencies
├── static/                          # Frontend assets
│   ├── app.js                       # Frontend logic
│   └── style.css                    # UI styling
└── templates/                       # HTML templates
    └── index.html                   # Main UI template
```
## Models & Dataset

### Available Models

| Model | Parameters | F1-Score | Hugging Face | Status |
|---|---|---|---|---|
| mBERT Azerbaijani NER | 180M | 67.70% | ✅ | Released |
| XLM-RoBERTa Azerbaijani NER | 125M | 75.22% | ✅ | Production |
| XLM-RoBERTa Large Azerbaijani NER | 355M | 75.48% | ✅ | Released |
| Azerbaijani-Turkish BERT Base NER | 110M | 73.55% | ✅ | Released |
### Supported Entity Types (25 Categories)

```mermaid
mindmap
  root((NER Categories))
    Person
    Location
    Organization
    Government
    Date
    Time
    Money
    Percentage
    Facility
    Product
    Event
    Art
    Law
    Language
    Position
    Nationality
    Disease
    Contact
    Quantity
    Project
    Cardinal
    Ordinal
    Proverb
    Miscellaneous
    Other
```
### Dataset Information
- Source: Azerbaijani NER Dataset
- Size: High-quality annotated Azerbaijani text corpus
- Language: Azerbaijani (az)
- Annotation: IOB2 format with 25 entity categories
- Training Infrastructure: A100 GPU on Google Colab Pro+
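Because IOB2 labels are assigned per word while transformer tokenizers split words into subwords, training requires aligning word-level labels onto subword tokens. A minimal sketch of that alignment, assuming a `word_ids`-style mapping like the one Hugging Face fast tokenizers provide; the tokens and ids below are hand-made for illustration.

```python
def align_labels(word_labels, word_ids, ignore_index=-100):
    """Copy each word's label to its first subword; mask the rest."""
    aligned, previous = [], None
    for wid in word_ids:
        if wid is None:            # special tokens like <s> and </s>
            aligned.append(ignore_index)
        elif wid != previous:      # first subword of a word
            aligned.append(word_labels[wid])
        else:                      # continuation subword
            aligned.append(ignore_index)
        previous = wid
    return aligned

# "Əliyev" split into two subwords; label ids: 1 = B-Person, 2 = I-Person
word_labels = [1, 2]
word_ids = [None, 0, 1, 1, None]  # <s> İlham Əli yev </s>
print(align_labels(word_labels, word_ids))  # [-100, 1, 2, -100, -100]
```

Masked positions (`-100`) are ignored by the cross-entropy loss, so only the first subword of each word contributes to training.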
## Model Performance Comparison

```mermaid
xychart-beta
    title "Model Performance Comparison (F1-Score)"
    x-axis [mBERT, XLM-RoBERTa, XLM-RoBERTa-Large, Azeri-Turkish-BERT]
    y-axis "F1-Score (%)" 65 --> 80
    bar [67.70, 75.22, 75.48, 73.55]
```
### Detailed Performance Metrics

#### mBERT Performance
| Epoch | Training Loss | Validation Loss | Precision | Recall | F1 | Accuracy |
|---|---|---|---|---|---|---|
| 1 | 0.2952 | 0.2657 | 0.7154 | 0.6229 | 0.6659 | 0.9191 |
| 2 | 0.2486 | 0.2521 | 0.7210 | 0.6380 | 0.6770 | 0.9214 |
| 3 | 0.2068 | 0.2534 | 0.7049 | 0.6507 | 0.6767 | 0.9209 |
#### XLM-RoBERTa Base Performance
| Epoch | Training Loss | Validation Loss | Precision | Recall | F1 |
|---|---|---|---|---|---|
| 1 | 0.3231 | 0.2755 | 0.7758 | 0.6949 | 0.7331 |
| 3 | 0.2486 | 0.2525 | 0.7515 | 0.7412 | 0.7463 |
| 5 | 0.2238 | 0.2522 | 0.7644 | 0.7405 | 0.7522 |
| 7 | 0.2097 | 0.2507 | 0.7607 | 0.7394 | 0.7499 |
#### XLM-RoBERTa Large Performance
| Epoch | Training Loss | Validation Loss | Precision | Recall | F1 |
|---|---|---|---|---|---|
| 1 | 0.4075 | 0.2538 | 0.7689 | 0.7214 | 0.7444 |
| 3 | 0.2144 | 0.2488 | 0.7509 | 0.7489 | 0.7499 |
| 6 | 0.1526 | 0.2881 | 0.7831 | 0.7284 | 0.7548 |
| 9 | 0.1194 | 0.3316 | 0.7393 | 0.7495 | 0.7444 |
#### Azeri-Turkish-BERT Performance
| Epoch | Training Loss | Validation Loss | Precision | Recall | F1 |
|---|---|---|---|---|---|
| 1 | 0.4331 | 0.3067 | 0.7390 | 0.6933 | 0.7154 |
| 3 | 0.2506 | 0.2751 | 0.7583 | 0.7094 | 0.7330 |
| 6 | 0.1992 | 0.2861 | 0.7551 | 0.7170 | 0.7355 |
| 9 | 0.1717 | 0.3138 | 0.7431 | 0.7255 | 0.7342 |
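The precision, recall, and F1 figures above are entity-level scores of the kind computed by seqeval-style evaluation: a prediction counts as correct only if both the span and the category match exactly. A hedged sketch of that computation; the span tuples below are made up for illustration.

```python
def span_f1(gold, pred):
    """Entity-level precision/recall/F1 with exact-span matching."""
    gold_set, pred_set = set(gold), set(pred)
    tp = len(gold_set & pred_set)  # spans correct in both position and type
    precision = tp / len(pred_set) if pred_set else 0.0
    recall = tp / len(gold_set) if gold_set else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Spans are (start_token, end_token, category); one of two predictions is right.
gold = [(0, 2, "Person"), (3, 4, "Location")]
pred = [(0, 2, "Person"), (5, 6, "Date")]
print(span_f1(gold, pred))  # (0.5, 0.5, 0.5)
```

Note that this metric is stricter than token-level accuracy, which is why the accuracy column in the mBERT table (around 0.92) is much higher than its F1.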
## Key Features

- **State-of-the-art accuracy:** 75.22% F1-score on Azerbaijani NER
- **25 entity categories:** Comprehensive coverage including Person, Location, Organization, Government, and more
- **Production ready:** Deployed on Fly.io with a FastAPI backend
- **Interactive UI:** Real-time entity highlighting with confidence scores
- **Multiple models:** Four different transformer models to choose from
- **Confidence scoring:** Each prediction includes confidence metrics
- **Multilingual foundation:** Built on XLM-RoBERTa for cross-lingual understanding
- **Responsive design:** Works seamlessly across desktop and mobile devices
## Technology Stack

```mermaid
graph LR
    subgraph "Frontend"
        A[HTML5] --> B[CSS3]
        B --> C[JavaScript]
    end

    subgraph "Backend"
        D[FastAPI] --> E[Python 3.8+]
        E --> F[Uvicorn]
    end

    subgraph "ML Stack"
        G[Transformers] --> H[PyTorch]
        H --> I[Hugging Face]
    end

    subgraph "Deployment"
        J[Docker] --> K[Fly.io]
        K --> L[Production]
    end

    C --> D
    F --> G
    I --> J
```
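The Docker → Fly.io path above is driven by the repo's Dockerfile. A minimal, hypothetical sketch of what such an image could look like, assuming `main.py` exposes the FastAPI `app` and the server listens on port 8080 (the actual Dockerfile in the repo may differ):

```dockerfile
FROM python:3.11-slim

WORKDIR /app

# Install dependencies first so this layer is cached between builds
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy application code, static assets, and templates
COPY . .

EXPOSE 8080
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8080"]
```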
## Setup Instructions

### Local Development

1. **Clone the repository**

   ```bash
   git clone https://github.com/Ismat-Samadov/Named_Entity_Recognition.git
   cd Named_Entity_Recognition
   ```

2. **Set up a Python environment**

   ```bash
   # Create a virtual environment
   python -m venv .venv

   # Activate it (Unix/macOS)
   source .venv/bin/activate
   # Activate it (Windows)
   # .venv\Scripts\activate

   # Install dependencies
   pip install -r requirements.txt
   ```

3. **Run the application**

   ```bash
   uvicorn main:app --host 0.0.0.0 --port 8080
   ```
### Fly.io Deployment

1. **Install the Fly CLI**

   ```bash
   # Unix/macOS
   curl -L https://fly.io/install.sh | sh
   ```

2. **Configure the deployment**

   ```bash
   # Log in to Fly.io
   fly auth login

   # Initialize the app
   fly launch

   # Configure memory (minimum 2GB recommended)
   fly scale memory 2048
   ```

3. **Deploy the application**

   ```bash
   fly deploy

   # Monitor the deployment
   fly logs
   ```
## Usage

### Quick Start

1. Access the application:
   - Local: http://localhost:8080
   - Production: https://named-entity-recognition.fly.dev
2. Enter Azerbaijani text in the input field
3. Click "Submit" to process the text
4. View the results, with entities highlighted by category and confidence scores shown
### Example Usage

```python
# Example API request
import requests

response = requests.post(
    "https://named-entity-recognition.fly.dev/predict/",
    data={"text": "2014-cü ildə Azərbaycan Respublikasının prezidenti İlham Əliyev Salyanda olub."},
)
print(response.json())
# Output: {
#   "entities": {
#     "Date": ["2014"],
#     "Government": ["Azərbaycan"],
#     "Organization": ["Respublikasının"],
#     "Position": ["prezidenti"],
#     "Person": ["İlham Əliyev"],
#     "Location": ["Salyanda"]
#   }
# }
```
## Model Capabilities

- **Person names:** İlham Əliyev, Heydər Əliyev, Nizami Gəncəvi
- **Locations:** Bakı, Salyanda, Azərbaycan, Gəncə
- **Organizations:** Respublika, Universitet, Şirkət
- **Dates & times:** 2014-cü il, sentyabr ayı, səhər saatları
- **Government entities:** prezident, nazir, məclis
- And 20+ more categories...
## Contributing

We welcome contributions! Here's how you can help:

1. Fork the repository
2. Create your feature branch (`git checkout -b feature/AmazingFeature`)
3. Commit your changes (`git commit -m 'Add some AmazingFeature'`)
4. Push to the branch (`git push origin feature/AmazingFeature`)
5. Open a Pull Request
### Development Areas

- Model improvements and fine-tuning
- UI/UX enhancements
- Performance optimizations
- Additional test cases
- Documentation improvements
## License

This project is open source and available under the MIT License.
## Acknowledgments
- Hugging Face team for the transformer models and infrastructure
- Google Colab for providing A100 GPU access
- Fly.io for hosting the production deployment
- The Azerbaijani NLP community for dataset contributions