| # Digit Recognition |
|
|
| ## Intended Use |
| This model classifies handwritten digits (0-9) from the pixel values of an MNIST-like dataset. It is intended for educational purposes, as a demonstration of a Random Forest classifier applied to multi-class classification. |
|
|
| ## Training Data |
| - **Dataset**: The model was trained on a dataset with 42,000 samples, where each sample is a 28x28 grayscale image flattened into a vector of 784 pixel values. |
| - **Labels**: The dataset contains 10 classes (digits 0-9). |
| - **Train-Validation Split**: |
|   - Training set: 33,600 samples (80%) |
|   - Validation set: 8,400 samples (20%) |
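
The 80/20 split described above can be sketched as follows. This is a minimal illustration, not the original training script: it assumes scikit-learn's `train_test_split` was used, substitutes random arrays for the real pixel data, and picks `random_state=42` arbitrarily for reproducibility.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Stand-in for the real data (assumption): 42,000 flattened 28x28 images,
# each a row of 784 pixel values, with one digit label (0-9) per row.
rng = np.random.default_rng(0)
X = rng.integers(0, 256, size=(42_000, 784), dtype=np.uint8)
y = rng.integers(0, 10, size=42_000)

# Hold out 20% of the samples for validation.
# random_state is an arbitrary choice, not taken from the model card.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, random_state=42
)

print(X_train.shape, X_val.shape)
```

With 42,000 samples, `test_size=0.2` yields the 33,600 / 8,400 partition listed in the card.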
|
|
| ## Evaluation Metrics |
| - **Accuracy**: Computed on the validation set with `accuracy_score(y_val, y_pred)`. The resulting value is not recorded in this card. |
| - **Classification Report**: Includes precision, recall, and F1-score for each class. |
| - **Confusion Matrix**: Visualized to show the distribution of predictions across classes. |
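
The three metrics above are all available in `sklearn.metrics`. The sketch below shows how they might be computed. The `y_val` and `y_pred` arrays are small hypothetical stand-ins, not the model's actual validation outputs, so the printed numbers say nothing about the real model.

```python
import numpy as np
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

# Hypothetical labels and predictions (assumption), standing in for
# y_val and model.predict(X_val). Two of the twelve predictions are wrong.
y_val = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 3, 8])
y_pred = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 5, 3])

# Fraction of samples whose predicted digit matches the true digit.
accuracy = accuracy_score(y_val, y_pred)

# Per-class precision, recall, and F1-score as a formatted text table.
report = classification_report(y_val, y_pred, digits=3, zero_division=0)

# Rows are true digits, columns are predicted digits.
cm = confusion_matrix(y_val, y_pred, labels=list(range(10)))

print(f"accuracy: {accuracy:.3f}")
print(report)
print(cm)
```

Off-diagonal entries of `cm` show which digits are confused with which. For a plot rather than a printed array, `sklearn.metrics.ConfusionMatrixDisplay` is one option.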
|
|
| ## Limitations |
| - The model may not generalize well to digits written in styles significantly different from the training data. |
| - It is not optimized for real-time or large-scale applications. |
|
|
| ## Ethical Considerations |
| - Check the dataset for biases, such as under-represented handwriting styles, that could affect the fairness of the model. |
| - The model should not be used in critical applications without further validation and testing. |
|
|
| ## How to Use |
| 1. Load the model using `joblib.load('digit_rf_model.joblib')`. |
| 2. Preprocess the input data to match the format of the training data (28x28 images flattened into 784-pixel vectors). |
| 3. Use the `predict` method to classify new samples. |
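
The three steps above might look like this in practice. So that the sketch runs without the real artifact, it first saves a tiny stand-in forest fitted on random data under the same file name in a temporary directory. The stand-in model, the random images, and the absence of pixel scaling are all assumptions. If the original training applied any scaling, the same transform would need to be applied here.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Stand-in artifact (assumption): a small forest fitted on random data,
# saved so the example runs without the real trained model file.
rng = np.random.default_rng(0)
X_fake = rng.integers(0, 256, size=(200, 784)).astype(np.float64)
y_fake = np.arange(200) % 10
model_path = os.path.join(tempfile.mkdtemp(), "digit_rf_model.joblib")
stand_in = RandomForestClassifier(n_estimators=10, random_state=0)
joblib.dump(stand_in.fit(X_fake, y_fake), model_path)

# Step 1: load the model from disk.
model = joblib.load(model_path)

# Step 2: flatten each 28x28 image into a row of 784 pixel values,
# matching the layout of the training data.
images = rng.integers(0, 256, size=(5, 28, 28))
X_new = images.reshape(len(images), -1)

# Step 3: predict one digit label per input row.
predictions = model.predict(X_new)
print(predictions)
```

With the real model, replace `model_path` with `'digit_rf_model.joblib'` and drop the stand-in training lines.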