Text Classification
Transformers
Joblib
Portuguese
streamlit
multi-label-classification
gradient-boosting
active-learning
bertimbau
municipal-documents
meeting-minutes
Update README.md
--- a/README.md
+++ b/README.md
@@ -25,7 +25,7 @@ base_model:
 
 **Council Topics Classifier** is an ensemble machine learning system specialized in **multi-label topic classification** for Portuguese municipal council meeting minutes subjects. The model combines Gradient Boosting with Active Learning and BERTimbau embeddings to identify multiple simultaneous topics within municipal discussion subjects, making it particularly effective for categorizing complex governmental content.
 
-🚀 **Try out the model:** [
+🚀 **Try out the model:** [Demo Council Topics Classifier PT](https://huggingface.co/spaces/anonymous12321/Council_Topics_Classifier_PT)
 
 ## Key Features
 
@@ -100,18 +100,18 @@ bert_model = AutoModel.from_pretrained("neuralmind/bert-base-portuguese-cased").
 
 # Preprocess text
 text = "A Câmara Municipal aprovou o orçamento de 2024..."
-# (apply smart_preprocess function - see
+# (apply smart_preprocess function - see demo source code)
 
 # Extract features
 tfidf_features = tfidf.transform([text])
-# (extract BERT embeddings - see
+# (extract BERT embeddings - see demo source code)
 
 # Combine features and predict
 X_combined = np.hstack([tfidf_features.toarray(), bert_embeddings])
 
 # Get ensemble predictions
 logistic_proba = logistic_model.predict_proba(X_combined)
-# (apply GB models and adaptive weighting - see
+# (apply GB models and adaptive weighting - see demo source code)
 
 # Apply optimal thresholds
 predictions = (ensemble_proba >= optimal_thresholds).astype(int)
@@ -125,7 +125,7 @@ print(f"Predicted Topics: {predicted_labels}")
 
 The model was trained on a curated dataset of Portuguese municipal council meeting minutes:
 
-- **Documents**: 2,500+ meeting minutes subjects
+- **Documents**: 2,500+ meeting minutes discussion subjects
 - **Time Period**: 2021-2024
 - **Source**: Portuguese municipalities (anonymized)
 - **Labels**: 22 topic categories
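
The usage snippet touched by this diff leaves the ensemble step as a comment and goes straight to `ensemble_proba >= optimal_thresholds`. As a rough illustration of that last step only, the sketch below blends two probability matrices with a fixed weight and then applies one threshold per label. The fixed weight is a stand-in: the README's "adaptive weighting" is not described in this diff, so `combine_and_threshold`, `gb_proba`, `gb_weight`, and the toy numbers are all hypothetical, not the model's actual logic.

```python
import numpy as np


def combine_and_threshold(logistic_proba, gb_proba, thresholds, gb_weight=0.5):
    """Blend two (n_samples, n_labels) probability matrices and binarize per label.

    Hypothetical helper: a fixed-weight average stands in for the
    adaptive weighting that the README defers to the demo source code.
    """
    logistic_proba = np.asarray(logistic_proba, dtype=float)
    gb_proba = np.asarray(gb_proba, dtype=float)
    if logistic_proba.shape != gb_proba.shape:
        raise ValueError("probability matrices must have the same shape")

    # Weighted average of the two models' per-label probabilities.
    ensemble_proba = (1.0 - gb_weight) * logistic_proba + gb_weight * gb_proba

    # thresholds has shape (n_labels,), so it broadcasts across all rows:
    # each label column is compared against its own cut-off.
    predictions = (ensemble_proba >= np.asarray(thresholds, dtype=float)).astype(int)
    return ensemble_proba, predictions


# Toy example with one document and three labels (values are made up).
logistic_proba = np.array([[0.9, 0.2, 0.4]])
gb_proba = np.array([[0.7, 0.1, 0.8]])
optimal_thresholds = np.array([0.5, 0.3, 0.7])

ensemble_proba, predictions = combine_and_threshold(
    logistic_proba, gb_proba, optimal_thresholds
)
print(predictions)  # one row of 0/1 flags, one flag per label
```

Per-label thresholds matter in multi-label settings because rare topics often need a lower cut-off than frequent ones; a single global 0.5 would apply the same bar to every label.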