---
title: Semiconductor Defect Detection
emoji: 🔬
colorFrom: blue
colorTo: indigo
sdk: docker
pinned: false
---

# Semiconductor Wafer Defect Detection: End-to-End AI Pipeline

## Project Overview

This project is an end-to-end applied AI pipeline for semiconductor manufacturing. It takes raw numeric array data representing defective semiconductor wafers, converts it into a computer vision dataset, trains a custom YOLOv8 object detection model, and feeds the results into a predictive material waste model and a real-time dashboard.

**Final YOLOv8 Model Performance:** `0.962 mAP@50` (mean average precision at an IoU threshold of 0.50, measured on unseen validation data).

**Predictive Waste Model Performance:** `R² = 0.9637` (the model explains about 96% of the variance in material waste).

## Business Value

In semiconductor fabrication, identifying microscopic defects early in the manufacturing process can save millions in scrapped materials. This project automates quality control by replacing manual coordinate analysis with real-time, AI-driven visual defect detection, and it forecasts future material waste to support supply chain planning.

## The Technical Pipeline

### Phase 1: Data Engineering (`src/data_prep.py`)

* **The Challenge:** The original dataset consisted of raw `.txt` files containing numeric 2D arrays (0=background, 1=good chip, 2=defect). YOLOv8 cannot read text arrays; it requires image files and label files with normalized bounding box coordinates.
* **The Solution:** Built a custom Python pipeline using `NumPy` and `OpenCV` to parse over 25,000 text files.
* **The Math:** Programmatically identified the spatial extremes (`xmin`, `ymin`, `xmax`, `ymax`) of the `2` values, normalized them to the `0.0 - 1.0` range that YOLO labels require, and rendered high-contrast `.jpg` images alongside corresponding `.txt` label files.
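
The bounding-box step described in Phase 1 can be sketched roughly as follows. This is a minimal illustration, not the code in `src/data_prep.py`: the function name `defect_to_yolo_label` is hypothetical, and it assumes a single box enclosing every `2` cell on the wafer. The output follows the standard YOLO label layout of `class x_center y_center width height`, with all four values expressed as fractions of the image size.

```python
from typing import Optional

import numpy as np

DEFECT_VALUE = 2  # per this README: 0=background, 1=good chip, 2=defect


def defect_to_yolo_label(wafer: np.ndarray, class_id: int) -> Optional[str]:
    """Return one YOLO label line for the defect region, or None if there is no defect.

    Hypothetical helper: assumes one box that encloses all defect cells.
    """
    rows, cols = np.nonzero(wafer == DEFECT_VALUE)
    if rows.size == 0:
        return None

    height, width = wafer.shape

    # Spatial extremes of the defect cells. The +1 makes the max edge
    # exclusive, so a single defective cell yields a box one cell wide.
    xmin, xmax = cols.min(), cols.max() + 1
    ymin, ymax = rows.min(), rows.max() + 1

    # Normalize to the 0.0-1.0 range by dividing by the array dimensions.
    x_center = (xmin + xmax) / 2 / width
    y_center = (ymin + ymax) / 2 / height
    box_w = (xmax - xmin) / width
    box_h = (ymax - ymin) / height

    return f"{class_id} {x_center:.6f} {y_center:.6f} {box_w:.6f} {box_h:.6f}"


if __name__ == "__main__":
    demo = np.ones((4, 4), dtype=int)
    demo[1:3, 1:3] = DEFECT_VALUE  # 2x2 defect patch in the middle
    print(defect_to_yolo_label(demo, class_id=0))
```

Dividing by the array shape rather than a fixed image size keeps the label valid even if the rendered `.jpg` is scaled up, since YOLO coordinates are relative.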

### Phase 2: Dataset Architecture (`src/split_data.py`)

* Used `scikit-learn` to perform an 80/20 train/validation split.
* Programmatically generated the directory layout required by YOLO, moving over 50,000 individual files (images and labels) into structured `train` and `val` directories.

### Phase 3: Model Training (`src/model_train.py`)

* Initialized a pre-trained **YOLOv8 Nano** (`yolov8n.pt`) model for lightweight, high-speed inference.
* Trained on 20,415 wafer images for 10 epochs.
* Mapped 8 manufacturing defect classes (Center, Donut, Edge-Loc, Edge-Ring, Loc, Random, Scratch, Near-full).

### Phase 4: Batch Inference & Evaluation (`src/batch_inference.py` & `src/model_eval.py`)

* Loaded the custom-trained `best.pt` weights to run batch inference on unseen validation images.
* The model drew bounding boxes and assigned confidence scores without manual intervention.

### Phase 5: Production Middleware, Predictive Modeling & Dashboard

* **Robotic Scanner Simulation (`middleware/robot_controller.py`):** Operates on a hybrid dataset of **823,953 wafers** (Mixed-type + WM-811K datasets) with a realistic 95.5% pass rate. It automatically routes passed wafers and runs YOLOv8 inference on defective ones, logging every scan to a centralized SQLite database (`wafer_control.db`).
* **Material Waste Predictor (`middleware/material_predictor.py`):** A Random Forest Regressor trained on the historical scan database. It predicts the average percentage of material wasted within defective wafers, allowing fabs to estimate future material needs.
* **Real-time Dashboard (`middleware/dashboard.py`):** A **Plotly Dash** web application that visualizes historical defect rates, defect distributions, and routing actions, and provides interactive inputs for material forecasting.
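
The route-then-log flow of the scanner simulation could look something like the sketch below. Everything specific here is an assumption made for illustration: the `scans` table schema, the `route_wafer` and `open_db` helpers, and the `"forward"` / `"quarantine"` action names are hypothetical, and the detector is passed in as a plain callable so that the example runs without the trained `best.pt` weights.

```python
import sqlite3
from typing import Callable, Tuple

# Hypothetical schema; the real layout of wafer_control.db is not shown in this README.
SCHEMA = """
CREATE TABLE IF NOT EXISTS scans (
    wafer_id   TEXT PRIMARY KEY,
    passed     INTEGER NOT NULL,
    defect     TEXT,
    confidence REAL,
    action     TEXT NOT NULL
)
"""

# A detector takes a wafer id and returns (defect_class, confidence).
Detector = Callable[[str], Tuple[str, float]]


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the scan database and ensure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def route_wafer(conn: sqlite3.Connection, wafer_id: str, passed: bool,
                detect: Detector) -> str:
    """Log one scan and return the routing action taken."""
    defect, confidence = None, None
    if passed:
        action = "forward"  # passed wafers skip inference entirely
    else:
        defect, confidence = detect(wafer_id)  # only defective wafers hit the model
        action = "quarantine"

    conn.execute(
        "INSERT INTO scans VALUES (?, ?, ?, ?, ?)",
        (wafer_id, int(passed), defect, confidence, action),
    )
    conn.commit()
    return action


if __name__ == "__main__":
    db = open_db()
    stub = lambda wafer_id: ("Edge-Ring", 0.97)  # stand-in for YOLOv8 inference
    print(route_wafer(db, "W-0001", True, stub))
    print(route_wafer(db, "W-0002", False, stub))
```

Skipping inference for passed wafers matters at the stated 95.5% pass rate, because only a small fraction of scans ever reach the model.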

## Upcoming Feature: LLM Troubleshooting Assistant (Planned)

**Goal:** Integrate a Large Language Model (LLM) bot to assist fab engineers directly on the factory floor.

* **Functionality:** When the dashboard flags a sudden spike in a specific defect type (e.g., "Edge-Ring" defects), the engineer can consult the LLM bot.
* **Use Case:** The bot will analyze the defect trends, cross-reference historical manufacturing guidelines, and suggest potential root causes (such as misaligned etching tools or incorrect gas pressure), with the aim of reducing troubleshooting time and downtime.

*(Note: This feature is currently in the design phase and not yet implemented.)*

## Performance Metrics

The YOLOv8 model achieved the following results on the unseen validation set:

| Metric | Score | Note |
| :--- | :--- | :--- |
| **mAP50 (All Classes)** | **96.2%** | Mean average precision across all classes at an IoU threshold of 0.50. |
| **Recall** | **93.1%** | The model located 93.1% of the labeled defects. |
| **Edge-Ring (mAP50)** | **99.4%** | Average precision for the Edge-Ring class at an IoU threshold of 0.50. |

The Random Forest Material Waste Predictor achieved:

| Metric | Score | Note |
| :--- | :--- | :--- |
| **R² Score** | **0.9637** | The model explains about 96% of the variance in the target. |
| **MAE** | **0.09%** | Mean absolute error is under one-tenth of a percentage point. |

## Tech Stack

* **Languages:** Python
* **Computer Vision:** Ultralytics (YOLOv8), OpenCV (`cv2`)
* **Machine Learning & Data:** Pandas, NumPy, Scikit-learn, SQLite
* **Web UI & Visualization:** Plotly, Dash

## Deployment (Docker)

This application is fully containerized for easy deployment.

1. **Clone the repository:**

   ```bash
   git clone https://github.com/Udayan2001/Semiconductor_defect_detection.git
   cd Semiconductor_defect_detection
   ```

2. **Add API Key:** Create a `.env` file in the `backend/` directory and add your Google Gemini API key:

   ```
   GEMINI_API_KEY=your_api_key_here
   ```

3. **Start the Application:** Run the following command from the root directory to build and start both the backend and frontend servers:

   ```bash
   docker compose up --build
   ```

4. **Access the Dashboard:** Open your browser and navigate to `http://localhost:5173`.

---

*Designed and engineered by Udayan Shashank Shukla.*