Spaces: beemabee/Factory_Prediction

No application file


beemabee committed on Jul 27, 2024

Commit 9fb1c2a

1 Parent(s): 91c8ed7

add required file


Files changed (7)

Data.xlsx +0 -0
README.md +84 -1
model/best_model.joblib +3 -0
model/scaler.joblib +3 -0
requirements.txt +11 -0
src/app.py +78 -0
src/training.ipynb +0 -0

Data.xlsx ADDED

Binary file (47.6 kB).

README.md CHANGED

@@ -9,4 +9,87 @@ app_file: app.py
 pinned: false
 ---
-Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

+## Project Structure
+- `src/`: Contains the source code
+  - `training.ipynb`: Jupyter notebook for data preprocessing and model training
+  - `app.py`: Streamlit application for predictions and SHAP analysis
+- `model/`: Stores trained models and scalers
+- `Data.xlsx`: The dataset (Excel workbook at the repository root)
+## Installation
+1. Clone this repository:
+   ```
+   git clone https://github.com/yourusername/Factory_Prediction.git
+   cd Factory_Prediction
+   ```
+2. Create a virtual environment (optional but recommended):
+   ```
+   python -m venv venv
+   source venv/bin/activate  # On Windows use `venv\Scripts\activate`
+   ```
+3. Install the required packages:
+   ```
+   pip install -r requirements.txt
+   ```
+## Data Preprocessing and Model Training
+1. Open and run the `src/training.ipynb` notebook in Jupyter or any compatible environment.
+2. The notebook covers:
+   - Data loading and cleaning
+   - Exploratory Data Analysis (EDA)
+   - Feature selection and engineering
+   - Model training (Linear Regression, Random Forest, XGBoost)
+   - Hyperparameter tuning
+   - Model evaluation
+## Running the Streamlit Application
+1. Ensure you have completed the model training step.
+2. Run the Streamlit app:
+   ```
+   streamlit run src/app.py
+   ```
+3. Open the provided URL in your web browser.
+## Using the Streamlit Application
+1. Input values for each feature (SamplingNC, SamplingChek, QTY, TimeProduce, Years).
+2. Click the "Prediksi dan Analisis" ("Predict and Analyze") button.
+3. View the predicted NC percentage and SHAP analysis visualizations.
+## Project Methodology
+1. Data Cleaning:
+   - Handled missing values and outliers
+   - Standardized data formats
+2. Feature Selection:
+   - Used correlation analysis to identify relevant features
+   - Applied domain knowledge to select meaningful predictors
+3. Model Training:
+   - Experimented with Linear Regression, Random Forest, and XGBoost
+   - Performed hyperparameter tuning using GridSearchCV
+   - Selected the best performing model based on evaluation metrics
+4. Model Interpretation:
+   - Utilized SHAP (SHapley Additive exPlanations) for model interpretability
+   - Implemented visualizations to explain feature importance and impact
+## Technologies Used
+- Python
+- Pandas for data manipulation
+- Scikit-learn for model training and evaluation
+- XGBoost for gradient-boosted tree models
+- Streamlit for web application development
+- SHAP for model interpretation
+## Future Improvements
+- Incorporate more advanced feature engineering techniques
+- Experiment with ensemble methods for improved predictions
+- Enhance the Streamlit UI for better user experience
+## Contributors
+- Andika Atmanegara Putra
+## License
+This project is licensed under the [MIT License](LICENSE).
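The README's "Model Interpretation" step relies on SHAP, and the app's waterfall plot visualizes SHAP's additivity property: the base value plus the sum of per-feature SHAP values equals the model's prediction. The sketch below illustrates that property by computing exact Shapley values through brute-force enumeration for a hypothetical three-feature `toy_model`, treating a feature as "absent" by replacing it with a single baseline value. It is a conceptual illustration under those assumptions only; `shap.TreeExplainer` in the app uses an algorithm specialized for tree ensembles, not this enumeration.

```python
from itertools import combinations
from math import factorial


def toy_model(x):
    # Hypothetical model (not the trained one): two linear terms plus an
    # a*c interaction so the attribution is not trivially linear.
    a, b, c = x
    return 2.0 * a + 0.5 * b + a * c


def shapley_values(model, x, baseline):
    """Exact Shapley values; features outside a coalition take their baseline value."""
    n = len(x)

    def value(coalition):
        # Use x for coalition members and the baseline for everything else.
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return model(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            # Standard Shapley weight |S|! (n - |S| - 1)! / n!
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi


x = [3.0, 4.0, 2.0]
baseline = [1.0, 2.0, 0.0]
phi = shapley_values(toy_model, x, baseline)
base_value = toy_model(baseline)
print(f"base={base_value:.2f} phi={[round(p, 2) for p in phi]} prediction={toy_model(x):.2f}")
# base=3.00 phi=[6.0, 1.0, 4.0] prediction=14.00
```

The contributions sum to 11, which is exactly the gap between the base value (3) and the prediction (14); the waterfall plot in the app draws the same decomposition, with `explainer.expected_value` as the starting point.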

model/best_model.joblib ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:c591d53365f52836505f3fb3bf5fce5bda1fe03e90520f440d1239c91bb6129f
+size 658382

model/scaler.joblib ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:14a9cf41df424ab093205e64fac0ff81c03ef70770ab23ebf8394c06f2eef3af
+size 1087
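Both model files are committed as Git LFS pointers, so the diff shows three `key value` lines instead of the binary. In a checkout where the LFS objects were not fetched, the files on disk are these small text pointers, and `joblib.load` would fail on them. The following is a minimal stdlib sketch that recognizes and parses such a pointer, assuming the v1 layout shown above. `parse_lfs_pointer` is a hypothetical helper and is not part of git-lfs.

```python
def parse_lfs_pointer(text):
    """Parse a Git LFS v1 pointer into {'oid', 'size'}, or return None if it is not one."""
    fields = {}
    for line in text.strip().splitlines():
        key, sep, value = line.partition(" ")
        if not sep:
            return None  # every pointer line has the form "key value"
        fields[key] = value
    if fields.get("version") != "https://git-lfs.github.com/spec/v1":
        return None
    algo, _, digest = fields.get("oid", "").partition(":")
    size = fields.get("size", "")
    if algo != "sha256" or len(digest) != 64 or not size.isdigit():
        return None
    return {"oid": digest, "size": int(size)}


pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:14a9cf41df424ab093205e64fac0ff81c03ef70770ab23ebf8394c06f2eef3af
size 1087
"""
info = parse_lfs_pointer(pointer)
print(info["size"])  # 1087
```

Pointer files are tiny, so reading the first few hundred bytes of `model/scaler.joblib` and decoding them with `errors="replace"` is enough to tell a pointer from a real pickle. A real joblib file fails the `version` check and yields `None`.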

requirements.txt ADDED

	@@ -0,0 +1,11 @@

+numpy
+pandas
+seaborn
+matplotlib
+scikit-learn
+xgboost
+klib
+openpyxl
+streamlit
+joblib
+shap
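Some entries in `requirements.txt` are distribution names rather than import names. For example, `scikit-learn` is imported as `sklearn`. The sketch below uses only the standard library to report which requirements cannot be imported in the current environment before the app is launched. The `IMPORT_NAMES` mapping is an assumption that covers only this file and is not a general solution.

```python
import importlib.util

# Distribution name -> import name, only for entries in this requirements.txt
# whose names differ; every other entry is assumed to import under its own name.
IMPORT_NAMES = {"scikit-learn": "sklearn"}

REQUIREMENTS = [
    "numpy", "pandas", "seaborn", "matplotlib", "scikit-learn", "xgboost",
    "klib", "openpyxl", "streamlit", "joblib", "shap",
]


def import_name(requirement):
    """Map a requirements.txt entry to the module name used in an import statement."""
    return IMPORT_NAMES.get(requirement, requirement)


def missing_modules(requirements):
    """Return the requirements whose top-level module cannot be located."""
    # find_spec on a top-level name locates the module without importing it.
    return [r for r in requirements if importlib.util.find_spec(import_name(r)) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIREMENTS)
    print("Missing:", ", ".join(missing) if missing else "none")
```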

src/app.py ADDED

	@@ -0,0 +1,78 @@

+import streamlit as st
+import joblib
+import pandas as pd
+import shap
+import matplotlib.pyplot as plt
+import numpy as np
+from datetime import time
+from pathlib import Path
+# Resolve model files relative to this script rather than the working directory,
+# so `streamlit run src/app.py` works from the repository root
+MODEL_DIR = Path(__file__).resolve().parent.parent / "model"
+# Load the model and scaler once and cache them across Streamlit reruns
+@st.cache_resource
+def load_model_and_scaler():
+    model = joblib.load(MODEL_DIR / "best_model.joblib")
+    scaler = joblib.load(MODEL_DIR / "scaler.joblib")
+    return model, scaler
+model, scaler = load_model_and_scaler()
+# Features used by the model
+model_features = ['SamplingNC', 'SamplingChek', 'QTY', 'TimeProduce', 'Years']
+# Predict on the scaled features and compute SHAP values for the same input
+def predict_and_explain(features):
+    features_scaled = scaler.transform(features)
+    prediction = model.predict(features_scaled)
+    # TreeExplainer requires a tree-based model (e.g. Random Forest or XGBoost)
+    explainer = shap.TreeExplainer(model)
+    shap_values = explainer.shap_values(features_scaled)
+    return prediction, shap_values, explainer
+# Streamlit UI
+st.title('NC Prediction with SHAP Analysis')
+# One input widget per model feature
+input_data = {}
+selected_time = time(0, 0)
+for col in model_features:
+    if col == 'TimeProduce':
+        # Encode the chosen time of day as fractional hours, e.g. 13:30 -> 13.5
+        selected_time = st.time_input(f"Select {col}", value=time(0, 0))
+        input_data[col] = selected_time.hour + selected_time.minute / 60.0
+    elif col == 'Years':
+        input_data[col] = st.number_input(col, min_value=2000, max_value=2100, value=2000)
+    else:
+        input_data[col] = st.number_input(col, min_value=0, value=0)
+# Button label matches the README ("Prediksi dan Analisis" = "Predict and Analyze")
+if st.button('Prediksi dan Analisis'):
+    features = pd.DataFrame([input_data], columns=model_features)
+    prediction, shap_values, explainer = predict_and_explain(features)
+    st.write(f'Predicted NC %: {prediction[0]:.2f}%')
+    # Show the input that was used, with TimeProduce rendered as HH:MM
+    st.write("Input data:")
+    display_features = features.copy()
+    display_features['TimeProduce'] = selected_time.strftime("%H:%M")
+    st.write(display_features)
+    # SHAP bar plot: magnitude (|SHAP value|) of each feature's effect
+    st.write("SHAP analysis (feature impact):")
+    plt.figure(figsize=(10, 6))
+    shap.summary_plot(shap_values, features, plot_type="bar", show=False)
+    plt.title("Feature Impact on the Prediction")
+    plt.xlabel("Mean |SHAP value| (impact on the prediction)")
+    plt.tight_layout()
+    st.pyplot(plt.gcf())
+    plt.close()
+    st.write("Interpretation: bar length shows how strongly each feature affects the prediction. "
+             "This plot shows magnitude only, not direction; the waterfall plot below shows "
+             "whether each feature raises or lowers NC %.")
+    # Waterfall plot: per-feature contributions for this single prediction
+    st.write("Feature contributions for this prediction:")
+    # expected_value is a scalar or a length-1 array depending on the model type
+    base_value = float(np.ravel(explainer.expected_value)[0])
+    explanation = shap.Explanation(
+        values=np.ravel(shap_values[0]),
+        base_values=base_value,
+        data=features.iloc[0].to_numpy(),
+        feature_names=model_features,
+    )
+    shap.plots.waterfall(explanation, max_display=10, show=False)
+    plt.title("Contribution of Each Feature to the Prediction")
+    plt.tight_layout()
+    st.pyplot(plt.gcf())
+    plt.close()
+    st.write("Interpretation: this plot shows how each feature contributes to the final prediction. "
+             "Red bars increase NC %, while blue bars decrease it. "
+             "The starting value is the model's average prediction, and the final value is the prediction for this input.")
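The app encodes the `TimeProduce` picker value as fractional hours (`hour + minute / 60.0`) before scaling. When a numeric value has to be turned back into a clock time, recovering the minutes with `int((x % 1) * 60)` truncates. Floating-point rounding can therefore, in principle, land one minute short. The sketch below shows a round trip that rounds to the nearest minute instead. `encode_time` and `decode_time` are hypothetical helper names and are not part of the app.

```python
from datetime import time


def encode_time(t):
    """Encode a datetime.time as fractional hours, matching the app's TimeProduce feature."""
    return t.hour + t.minute / 60.0


def decode_time(hours):
    """Convert fractional hours back to a datetime.time, rounding to the nearest minute."""
    total_minutes = round(hours * 60) % (24 * 60)  # wrap a rounded 24:00 back to 00:00
    return time(total_minutes // 60, total_minutes % 60)


# Every minute of the day survives the round trip.
assert all(
    decode_time(encode_time(time(h, m))) == time(h, m)
    for h in range(24)
    for m in range(60)
)
print(decode_time(encode_time(time(13, 45))).strftime("%H:%M"))  # 13:45
```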

src/training.ipynb ADDED Viewed

The diff for this file is too large to render.