---
title: Hopcroft Skill Classification
emoji: 🧠
colorFrom: blue
colorTo: green
sdk: docker
app_port: 7860
api_docs_url: /docs
---

# Hopcroft Skill Classification

[CI workflow](https://github.com/se4ai2526-uniba/Hopcroft/actions/workflows/ci.yml)
[Hugging Face Space](https://huggingface.co/spaces/se4ai2526-uniba/Hopcroft)
[MLflow on DagsHub](https://dagshub.com/se4ai2526-uniba/Hopcroft.mlflow)

**Multi-label skill classification for GitHub issues and pull requests.** Hopcroft uses machine learning to identify the technical skills required to resolve a software issue.

---

## Overview

Hopcroft is an ML-enabled system that classifies GitHub issues into 217 technical skill categories, which can support automated developer assignment and resource allocation. It is built following professional MLOps and software engineering standards.

### Key Features

- **Multi-label Classification**: Predict multiple skills per issue
- **REST API**: FastAPI with Swagger documentation
- **Web Interface**: Streamlit GUI for interactive predictions
- **Monitoring**: Prometheus/Grafana dashboards with drift detection
- **CI/CD**: GitHub Actions with Docker deployment
- **Experiment Tracking**: MLflow on DagsHub

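Multi-label classification means that every skill is scored independently and all skills that clear a threshold are returned, rather than a single best class. The following is a minimal sketch of that decision step only, assuming per-skill probabilities are already available. The function name `select_skills`, the skill names, the scores, and the 0.5 threshold are illustrative assumptions and are not taken from the Hopcroft model.

```python
def select_skills(scores: dict[str, float], threshold: float = 0.5) -> list[str]:
    """Return every skill whose score meets the threshold, highest score first."""
    selected = [(skill, score) for skill, score in scores.items() if score >= threshold]
    selected.sort(key=lambda item: item[1], reverse=True)
    return [skill for skill, _ in selected]


# Hypothetical per-skill probabilities for one issue (not real model output).
example_scores = {"Authentication": 0.91, "Networking": 0.64, "Databases": 0.12}
print(select_skills(example_scores))  # ['Authentication', 'Networking']
```

Raising the threshold trades recall for precision: fewer skills are returned, but each one is more likely to be relevant.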

---

## Architecture

```mermaid
graph TB
    subgraph "Data Layer"
        A[(SkillScope DB)] --> B[Feature Engineering]
        B --> C[TF-IDF / Embeddings]
    end

    subgraph "ML Pipeline"
        C --> D[Model Training]
        D --> E[(MLflow Tracking)]
        D --> F[Random Forest Model]
    end

    subgraph "Serving Layer"
        F --> G[FastAPI Service]
        G --> H[predict endpoint]
        G --> I[predictions endpoint]
        G --> J[health endpoint]
    end

    subgraph "Frontend"
        G --> K[Streamlit GUI]
    end

    subgraph "Monitoring"
        G --> L[Prometheus]
        L --> M[Grafana]
        N[Drift Detection] --> L
    end

    subgraph "Deployment"
        O[GitHub Actions] --> P[Docker Build]
        P --> Q[HF Spaces]
    end
```

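The data layer turns issue text into numeric features before training. As a rough illustration of the TF-IDF option only, the pure-Python sketch below weights each term by its frequency within a document and its rarity across documents. The function name `tfidf_vectors`, the whitespace tokenization, and the unsmoothed formula are assumptions made for illustration; the project's `features.py` and libraries such as scikit-learn may tokenize and smooth differently.

```python
import math
from collections import Counter


def tfidf_vectors(documents: list[str]) -> list[dict[str, float]]:
    """Unsmoothed TF-IDF: (count / document length) * log(N / document frequency)."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: the number of documents in which each term appears.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append(
            {term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
             for term, count in counts.items()}
        )
    return vectors


# Illustrative issue titles (hypothetical, not from the SkillScope data).
issues = ["fix login bug", "fix database bug"]
for vector in tfidf_vectors(issues):
    print(vector)
```

In this toy input, terms that occur in every document ("fix", "bug") receive a weight of zero, while the distinguishing terms ("login", "database") receive positive weights.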

---

## Documentation

| Document | Description |
|----------|-------------|
| [Milestone Summaries](docs/milestone_summaries.md) | All 6 project phases documented |
| [User Guide](docs/user_guide.md) | Setup, API, GUI, testing, monitoring |
| [Design Choices](docs/design_choices.md) | Technical decisions & rationale |
| [ML Canvas](docs/ML%20Canvas.md) | Requirements engineering framework |
| [Testing & Validation](docs/testing_and_validation.md) | QA strategy & results |
| [Model Card](models/README.md) | Model details & performance |
| [Dataset Card](data/README.md) | Dataset details & preprocessing |

---

## Quick Start

### Docker (Recommended)

```bash
# Clone and configure
git clone https://github.com/se4ai2526-uniba/Hopcroft.git
cd Hopcroft
cp .env.example .env
# Edit .env with your DagsHub credentials

# Start services
docker compose -f docker/docker-compose.yml up -d --build
```

**Access (Local):**
- **API Docs**: http://localhost:8080/docs
- **GUI**: http://localhost:8501
- **Health**: http://localhost:8080/health

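The containers can take a moment to become ready after `docker compose up`. One way to wait for them is to poll the health URL listed above until it answers with HTTP 200. The sketch below uses only the standard library; the function names `wait_until_healthy` and `_http_status`, the retry count, and the delay are assumptions for illustration and are not part of the project.

```python
import time
import urllib.request


def _http_status(url: str) -> int:
    """Return the HTTP status code of a GET request to `url`."""
    with urllib.request.urlopen(url, timeout=5) as response:
        return response.status


def wait_until_healthy(url: str, attempts: int = 10, delay: float = 2.0, fetch=None) -> bool:
    """Poll `url` until it answers with HTTP 200 or the attempts run out."""
    fetch = fetch or _http_status
    for attempt in range(attempts):
        try:
            if fetch(url) == 200:
                return True
        except OSError:
            # Connection refused, timeout, or HTTP error status:
            # the service is probably still starting.
            pass
        if attempt < attempts - 1:
            time.sleep(delay)
    return False
```

For example, `wait_until_healthy("http://localhost:8080/health")` returns `True` once the API responds with HTTP 200 and `False` if it never does within the allowed attempts. The optional `fetch` parameter exists only so the polling logic can be exercised without a running server.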

### Local Development

```bash
# Set up environment
python -m venv venv && source venv/bin/activate  # or venv\Scripts\activate on Windows
pip install -r requirements.txt && pip install -e .

# Start API
make api-dev

# Start GUI (new terminal)
streamlit run hopcroft_skill_classification_tool_competition/streamlit_app.py
```

---

## Project Structure

```
├── hopcroft_skill_classification_tool_competition/
│   ├── main.py              # FastAPI application
│   ├── streamlit_app.py     # Streamlit GUI
│   ├── features.py          # Feature engineering
│   ├── modeling/            # Training & prediction
│   └── config.py            # Configuration
├── data/                    # DVC-tracked datasets
├── models/                  # DVC-tracked models
├── tests/                   # Pytest test suites
├── monitoring/              # Prometheus, Grafana, Locust
├── docker/                  # Docker configurations
├── docs/                    # Documentation
└── .github/workflows/       # CI/CD pipelines
```

---

## API Endpoints

| Method | Endpoint | Description |
|--------|----------|-------------|
| `POST` | `/predict` | Classify a single issue |
| `POST` | `/predict/batch` | Batch classification |
| `GET` | `/predictions` | List recent predictions |
| `GET` | `/predictions/{id}` | Get a prediction by MLflow run ID |
| `GET` | `/health` | Health check |
| `GET` | `/metrics` | Prometheus metrics |

**Example:**
```bash
curl -X POST "http://localhost:8080/predict" \
  -H "Content-Type: application/json" \
  -d '{"issue_text": "Fix OAuth2 authentication bug"}'
```

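The same request can be issued from Python. The sketch below mirrors the curl command using only the standard library. The helper names `build_predict_request` and `predict` are hypothetical, and because the response schema is not described here, the JSON body is returned as-is without assuming any field names.

```python
import json
import urllib.request


def build_predict_request(issue_text: str, base_url: str = "http://localhost:8080") -> urllib.request.Request:
    """Build the same POST /predict request as the curl example."""
    body = json.dumps({"issue_text": issue_text}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/predict",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(issue_text: str, base_url: str = "http://localhost:8080"):
    """Send the request and return the decoded JSON response, whatever its shape."""
    request = build_predict_request(issue_text, base_url)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the services running locally, `predict("Fix OAuth2 authentication bug")` sends the request and returns the parsed response. Keeping request construction in its own function makes it possible to inspect the URL, headers, and payload without a live server.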

---

## Live Deployment

- **API**: https://dacrow13-hopcroft-skill-classification.hf.space/docs
- **GUI**: https://dacrow13-hopcroft-skill-classification.hf.space
- **MLflow**: https://dagshub.com/se4ai2526-uniba/Hopcroft/experiments
- **Prometheus**: https://dacrow13-hopcroft-skill-classification.hf.space/prometheus/
- **Grafana**: https://dacrow13-hopcroft-skill-classification.hf.space/grafana/
- **Betterstack**: Alerting configured. [Alert System Evidence](monitoring/screenshots)

---

## Development

```bash
# Run tests
make test-all              # All tests
make test-behavioral       # ML behavioral tests
make validate-deepchecks   # Data validation

# Lint & format
make lint                  # Check code style
make format                # Auto-fix issues

# Training
make train-baseline-tfidf  # Train baseline model
```

---

## License

This project was developed as part of the SE4AI 2025-26 course at the University of Bari.