	# Scratch Vision Game - Technical Documentation

	## Overview

	The Scratch Vision Game is an AI-powered system that converts visual Scratch programming blocks from images/PDFs into functional Scratch 3.0 projects (.sb3 files). The system uses computer vision, OCR, and large language models to analyze, interpret, and reconstruct Scratch programs from visual inputs.

	## System Architecture

	### Core Components

	1. Image Processing Pipeline (`app.py`)

	- PDF extraction and image preprocessing
	- Multi-stage image enhancement using OpenCV (upscaling, denoising, sharpening, contrast)
	- OCR text extraction with Tesseract
	- Visual similarity matching using multiple algorithms

	2. Block Recognition System (`utils/block_relation_builder.py`)

	- Scratch block catalog management
	- Pseudocode to JSON conversion
	- Block relationship building and validation
	- Project structure generation

	3. AI Processing Layer
	- LLM-based code interpretation using Groq/LLaMA
	- Multi-modal vision models for image captioning
	- Semantic understanding of Scratch programming concepts

	## Process Flow & System Tree Structure

	### Complete User Journey Tree

	```
	USER INPUT (PDF File via Web Interface)
	│
	├── 📁 /process_pdf [POST] - Flask Route Handler
	│   │
	│   ├── 🔍 PDF Validation & Security
	│   │   ├── secure_filename() - Sanitize filename
	│   │   ├── tempfile.mkdtemp() - Create temp directory
	│   │   └── pdf_file.save() - Save to temp location
	│   │
	│   ├── 📄 PDF Processing Pipeline
	│   │   │
	│   │   ├── 🎯 extract_images_from_pdf()
	│   │   │   ├── partition_pdf() - Unstructured library extraction
	│   │   │   │   ├── strategy="hi_res"
	│   │   │   │   ├── extract_image_block_types=["Image"]
	│   │   │   │   └── extract_image_block_to_payload=True
	│   │   │   │
	│   │   │   ├── 💾 Save extracted.json
	│   │   │   │   └── /outputs/EXTRACTED_JSON/{pdf_name}/extracted.json
	│   │   │   │
	│   │   │   └── 🔄 For Each Extracted Image:
	│   │   │       │
	│   │   │       ├── 🖼️ Image Processing Branch
	│   │   │       │   ├── base64.b64decode() - Decode image data
	│   │   │       │   ├── Image.open() - PIL image creation
	│   │   │       │   ├── image.save() - Save as PNG
	│   │   │       │   └── /outputs/DETECTED_IMAGE/{pdf_name}/Sprite_{i}.png
	│   │   │       │
	│   │   │       └── 🤖 AI Analysis Branch (Parallel)
	│   │   │           │
	│   │   │           ├── 📝 Description Generation
	│   │   │           │   ├── LangGraph Agent (Groq LLaMA)
	│   │   │           │   ├── Prompt: "Give a brief Captioning."
	│   │   │           │   └── response["messages"][-1].content
	│   │   │           │
	│   │   │           ├── 🏷️ Name Generation
	│   │   │           │   ├── LangGraph Agent (Groq LLaMA)
	│   │   │           │   ├── Prompt: "give a short name caption"
	│   │   │           │   └── response["messages"][-1].content
	│   │   │           │
	│   │   │           └── 📋 Metadata Assembly
	│   │   │               └── extracted_sprites.json
	│   │   │                   ├── "Sprite {count}": {
	│   │   │                   │   ├── "name": AI_generated_name
	│   │   │                   │   ├── "base64": image_data
	│   │   │                   │   ├── "file-path": pdf_directory
	│   │   │                   │   └── "description": AI_description
	│   │   │                   └── }
	│   │
	│   └── 🎮 Project Generation Pipeline
	│       │
	│       ├── 🔍 similarity_matching()
	│       │   │
	│       │   ├── 📊 Embedding Generation Branch
	│       │   │   │
	│       │   │   ├── 🎯 Query Processing
	│       │   │   │   ├── base64.b64decode() - Decode sprite images
	│       │   │   │   ├── tempfile.mkdtemp() - Create temp workspace
	│       │   │   │   └── Image.save() - Save temp sprite files
	│       │   │   │
	│       │   │   ├── 🧠 CLIP Embeddings
	│       │   │   │   ├── OpenCLIPEmbeddings() - Initialize embedder
	│       │   │   │   ├── clip_embd.embed_image() - Generate embeddings
	│       │   │   │   └── sprite_features = np.array()
	│       │   │   │
	│       │   │   └── 📈 Similarity Computation
	│       │   │       ├── Load: /outputs/embeddings.json
	│       │   │       ├── np.matmul(sprite_matrix, img_matrix.T)
	│       │   │       └── np.argmax(similarity, axis=1)
	│       │   │
	│       │   ├── 🎨 Asset Matching & Collection
	│       │   │   │
	│       │   │   ├── 🧙‍♂️ Sprite Assets Branch
	│       │   │   │   ├── Match: /blocks/sprites/{matched_folder}/
	│       │   │   │   ├── Load: sprite.json
	│       │   │   │   ├── Copy: All files except matched image & sprite.json
	│       │   │   │   └── Append to: project_data[]
	│       │   │   │
	│       │   │   └── 🌄 Backdrop Assets Branch (Parallel)
	│       │   │       ├── Match: /blocks/Backdrops/{matched_folder}/
	│       │   │       ├── Load: project.json
	│       │   │       ├── Copy: All files except matched image & project.json
	│       │   │       └── Extract: Stage targets → backdrop_data[]
	│       │   │
	│       │   └── 🏗️ Project Assembly
	│       │       │
	│       │       ├── 📋 JSON Structure Creation
	│       │       │   ├── final_project = {
	│       │       │   │   ├── "targets": []
	│       │       │   │   ├── "monitors": []
	│       │       │   │   ├── "extensions": []
	│       │       │   │   └── "meta": {...}
	│       │       │   └── }
	│       │       │
	│       │       ├── 🧙‍♂️ Sprite Integration
	│       │       │   └── For sprite in project_data:
	│       │       │       └── if not sprite.get("isStage"):
	│       │       │           └── final_project["targets"].append(sprite)
	│       │       │
	│       │       ├── 🌄 Stage/Backdrop Integration
	│       │       │   └── if backdrop_data:
	│       │       │       ├── Merge: all_costumes.extend()
	│       │       │       ├── Merge: sounds from first backdrop
	│       │       │       └── Create: Stage target with merged assets
	│       │       │
	│       │       └── 💾 Final Output
	│       │           ├── /outputs/project_{uuid}/project.json
	│       │           └── Return: project_json_path
	│
	├── 📤 Response Generation
	│   └── JSON Response:
	│       ├── "message": "✅ PDF processed successfully"
	│       ├── "output_json": extracted_sprites_path
	│       ├── "sprites": sprite_metadata
	│       ├── "project_output_json": final_project_path
	│       └── "test_url": download_link
	│
	└── 📥 /download_sb3/{project_id} [GET] - Download Endpoint
	    ├── Locate: /game_samples/{project_id}.sb3
	    ├── Validate: File existence
	    └── send_from_directory() - Serve .sb3 file
	```
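The Similarity Computation step in the tree (`np.matmul` followed by `np.argmax`) is a best-match lookup over normalized vectors. The following is a minimal pure-Python sketch of that idea, not the project's code: `l2_normalize`, `best_matches`, and the toy vectors are hypothetical stand-ins for the real CLIP features and the contents of `embeddings.json`.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


def best_matches(sprite_vectors, reference_vectors):
    """For each sprite vector, return the index of the most similar reference."""
    refs = [l2_normalize(r) for r in reference_vectors]
    result = []
    for sprite in sprite_vectors:
        s = l2_normalize(sprite)
        # One row of the similarity matrix: dot product against every reference.
        sims = [sum(a * b for a, b in zip(s, r)) for r in refs]
        # Same role as np.argmax(similarity, axis=1) for this row.
        result.append(max(range(len(sims)), key=sims.__getitem__))
    return result


# Toy data (hypothetical): three reference assets, three extracted sprites.
references = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
sprites = [[0.9, 0.1], [0.2, 0.8], [2.0, 2.1]]
matches = best_matches(sprites, references)
print(matches)  # [0, 1, 2]
```

Normalizing both sides first is what lets a plain dot product act as cosine similarity; without it, longer vectors would win regardless of direction.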

	### Parallel Processing Branches

	```
	🔄 CONCURRENT OPERATIONS DURING PDF PROCESSING:

	├── 🖼️ Image Processing Thread
	│   ├── OpenCV Enhancement Pipeline
	│   │   ├── upscale_image_cv() - 2x cubic interpolation
	│   │   ├── reduce_noise_cv() - Non-local means denoising
	│   │   ├── sharpen_cv() - Kernel-based sharpening
	│   │   └── enhance_contrast_cv() - Contrast enhancement
	│   │
	│   └── Multi-Algorithm Similarity Matching
	│       ├── DINOv2 Embeddings (Semantic)
	│       ├── PHash (Perceptual Hashing)
	│       └── Image Signatures (Goldberg Algorithm)

	├── 🤖 AI Processing Thread
	│   ├── SmolVLM Vision Model
	│   │   ├── Image Captioning
	│   │   └── Name Generation
	│   │
	│   └── Groq LLaMA Language Model
	│       ├── OCR Text Refinement
	│       ├── Pseudocode Generation
	│       └── JSON Structure Validation

	└── 💾 I/O Operations Thread
	    ├── File System Operations
	    │   ├── Directory Creation
	    │   ├── Image Saving/Loading
	    │   └── JSON Serialization
	    │
	    └── Asset Management
	        ├── Reference Asset Loading
	        ├── Project Asset Copying
	        └── Final Project Assembly
	```
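One way to run independent per-image branches side by side is a thread pool. The sketch below only illustrates the pattern; `enhance`, `caption`, and `save` are hypothetical placeholders for the image, AI, and I/O branches, and the document does not state which concurrency mechanism `app.py` actually uses.

```python
from concurrent.futures import ThreadPoolExecutor


def enhance(image_id):
    # Placeholder for the OpenCV enhancement pipeline.
    return f"{image_id}:enhanced"


def caption(image_id):
    # Placeholder for the vision/LLM captioning call.
    return f"{image_id}:captioned"


def save(image_id):
    # Placeholder for file-system work.
    return f"{image_id}:saved"


def process_image(image_id):
    """Submit the three independent branches and collect their results."""
    with ThreadPoolExecutor(max_workers=3) as pool:
        futures = {
            "image": pool.submit(enhance, image_id),
            "ai": pool.submit(caption, image_id),
            "io": pool.submit(save, image_id),
        }
        # .result() blocks until each branch finishes and re-raises its errors.
        return {name: future.result() for name, future in futures.items()}


result = process_image("Sprite_1")
print(result)
```

Threads pay off mainly for I/O-bound steps such as remote API calls and disk access; CPU-heavy Python code is still serialized by the interpreter lock unless the underlying library releases it.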

	### Data Flow Diagram

	```
	📊 DATA TRANSFORMATION PIPELINE:

	PDF Bytes → Images      → Enhanced Images → Embeddings   → Similarities → Assets → .sb3
	↓           ↓             ↓                 ↓              ↓              ↓        ↓
	[Binary]    [PIL.Image]   [np.ndarray]      [np.float32]   [indices]      [JSON]   [ZIP]
	│           │             │                 │              │              │        │
	├─ OCR ─────┼─ AI ────────┼─ Models ────────┼─ Search ─────┼─ Match ──────┼─ Build ┤
	│           │             │                 │              │              │        │
	└─ Text ────┴─ Metadata ──┴─ Features ──────┴─ Ranking ────┴─ Select ─────┴─ Pack ─┘
	```

	### 1. Input Processing

	Key processing functions:

	- `extract_images_from_pdf()` - Extracts images from PDF using unstructured library
	- `process_image_cv2_from_pil()` - Enhances images using OpenCV (upscaling, denoising, sharpening)

	### 2. Visual Similarity Matching

	```
	Query Image → Multi-Algorithm Matching → Asset Selection → Project Assembly
	```

	Algorithms Used:

	- DINOv2 Embeddings: Deep learning-based semantic similarity
	- Perceptual Hashing (PHash): Structural image comparison
	- Image Signatures: Goldberg algorithm for visual fingerprinting

	Implementation:

	```python
	def run_query_search_flow(query_b64, embeddings_dict, hash_dict, signature_obj_map):
	    # Abridged excerpt: decoding of query_b64, intermediate variables, and the
	    # loop over stored candidates are omitted; helpers live elsewhere in app.py.

	    # 1. Preprocess query image
	    enhanced_query_pil = process_image_cv2_from_pil(query_from_b64, scale=2)

	    # 2. Generate embedding, perceptual hash, and image signature
	    query_emb = get_dinov2_embedding_from_pil(prepped)
	    query_phash = phash.encode_image(image_array=query_hash_arr)
	    query_sig = gis.generate_signature(query_sig_path)

	    # 3. Compute similarities against each stored candidate
	    emb_sim = cosine_similarity(query_emb, stored_emb)
	    ph_sim = 1.0 - (hamming_distance / MAX_PHASH_BITS)
	    im_sim = 1.0 - gis.normalized_distance(stored_sig, query_sig)

	    # 4. Combine scores (emb_clamped is emb_sim limited to [0, 1])
	    combined = (emb_clamped + ph_sim + im_sim) / 3.0
	```
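Steps 3 and 4 above can be made concrete with plain Python. This is a sketch under stated assumptions: hashes are taken to be 64-bit values encoded as 16 hex characters, and `phash_similarity` and `combined_score` are hypothetical helper names, not functions from `app.py`.

```python
MAX_PHASH_BITS = 64  # assumption: 64-bit perceptual hashes (16 hex characters)


def phash_similarity(hash_a, hash_b):
    """Similarity in [0, 1] from the Hamming distance between two hex hashes."""
    hamming_distance = bin(int(hash_a, 16) ^ int(hash_b, 16)).count("1")
    return 1.0 - (hamming_distance / MAX_PHASH_BITS)


def combined_score(emb_sim, ph_sim, im_sim):
    """Average the three signals after clamping cosine similarity to [0, 1]."""
    emb_clamped = max(0.0, min(1.0, emb_sim))
    return (emb_clamped + ph_sim + im_sim) / 3.0


# The two hashes differ only in their last byte, so 8 of 64 bits differ.
ph = phash_similarity("ffff0000ffff0000", "ffff0000ffff00ff")
print(ph)  # 0.875
print(combined_score(0.9, ph, 0.7))
```

Clamping matters because cosine similarity can be negative, while the other two scores are already bounded to [0, 1]; without it, one strongly negative embedding score could outweigh two good matches.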

	### 3. Code Block Recognition

	```
	OCR Text → LLM Processing → Pseudocode → Block Mapping → JSON Generation
	```

	LLM System Prompt:

	```python
	SYSTEM_PROMPT = """Your task is to process OCR-extracted text from images of Scratch 3.0 code blocks and produce precisely formatted pseudocode JSON.

	### Core Role
	- Treat this as an OCR refinement task: the input may contain typos or spacing issues.
	- Intelligently correct OCR mistakes to align with valid Scratch 3.0 block syntax.

	### Universal Rules
	1. Code Detection: If no Scratch blocks are detected, the `pseudocode` value must be "No Code-blocks".
	2. Script Ownership: Determine the target from "Script for:". If it matches a `Stage_costumes` name, set `name_variable` to "Stage".
	3. Pseudocode Structure: The pseudocode must be a single JSON string with `\n` for newlines.
	"""
	```
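Downstream code has to turn the model's reply back into individual pseudocode lines. The sketch below assumes the reply is a JSON object with the `name_variable` and `pseudocode` keys named in the prompt; `parse_llm_reply` and the sample reply are hypothetical.

```python
import json

NO_CODE = "No Code-blocks"  # sentinel defined by the system prompt


def parse_llm_reply(reply_text):
    """Return (target_name, pseudocode_lines) from the model's JSON reply."""
    data = json.loads(reply_text)
    pseudocode = data.get("pseudocode", NO_CODE)
    if pseudocode == NO_CODE:
        return data.get("name_variable"), []
    # The pseudocode is a single JSON string; "\n" separates individual blocks.
    lines = [line for line in pseudocode.split("\n") if line.strip()]
    return data.get("name_variable"), lines


# Hypothetical reply; "\\n" here is the two-character JSON escape for a newline.
reply = (
    '{"name_variable": "Stage", '
    '"pseudocode": "when green flag clicked\\nforever\\n  next backdrop\\nend"}'
)
target, lines = parse_llm_reply(reply)
print(target, lines)
```

Leading spaces are kept on purpose so that nesting inside C-blocks such as `forever` survives the split.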

	### 4. Project Generation

	```
	Pseudocode → Block Definitions → Relationship Building → .sb3 Assembly
	```
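The final ".sb3 Assembly" step is a packaging step: an `.sb3` file is a ZIP archive with `project.json` and the asset files at its root. A minimal sketch using only the standard library follows; `pack_sb3` and the demo file names are hypothetical, and the placeholder bytes are not a real image.

```python
import json
import tempfile
import zipfile
from pathlib import Path


def pack_sb3(project_dir, sb3_path):
    """Zip project.json and every asset in project_dir into a flat archive."""
    project_dir = Path(project_dir)
    with zipfile.ZipFile(sb3_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(project_dir.iterdir()):
            if path.is_file():
                # Store each file at the archive root, without folder prefixes.
                archive.write(path, arcname=path.name)
    return sb3_path


with tempfile.TemporaryDirectory() as tmp:
    project_dir = Path(tmp) / "project_demo"
    project_dir.mkdir()
    (project_dir / "project.json").write_text(json.dumps({"targets": []}))
    (project_dir / "costume.png").write_bytes(b"placeholder")
    sb3_file = pack_sb3(project_dir, Path(tmp) / "demo.sb3")
    with zipfile.ZipFile(sb3_file) as archive:
        names = sorted(archive.namelist())

print(names)  # ['costume.png', 'project.json']
```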

	## Libraries and Dependencies

	### Core Libraries

	#### Computer Vision & Image Processing

	- OpenCV (`cv2`): Image enhancement, filtering, and preprocessing
	- PIL/Pillow: Image manipulation and format conversion
	- imagededup: Perceptual hashing for duplicate detection
	- image-match: Visual similarity using Goldberg signatures

	#### Machine Learning & AI

	- transformers: Hugging Face models (DINOv2, SmolVLM)
	- torch: PyTorch for deep learning inference
	- sentence-transformers: Text and image embeddings
	- faiss-cpu: Fast similarity search and clustering
	- open_clip_torch: OpenCLIP, an open-source implementation of OpenAI's CLIP, used for image embeddings

	#### Language Models

	- langchain: LLM orchestration and chaining
	- langchain-groq: Groq API integration
	- langgraph: Graph-based agent workflows

	#### Document Processing

	- unstructured: PDF parsing and content extraction
	- pdf2image: PDF to image conversion
	- pytesseract: OCR text extraction
	- PyPDF2: PDF manipulation

	#### Web Framework

	- Flask: Web application framework
	- Flask-SocketIO: Real-time communication
	- gunicorn: WSGI HTTP server

	### Model Specifications

	#### Vision Models

	```python
	from transformers import AutoImageProcessor, AutoModel, AutoModelForVision2Seq, AutoProcessor

	# DINOv2 for semantic image understanding
	DINOV2_MODEL = "facebook/dinov2-small"
	dinov2_processor = AutoImageProcessor.from_pretrained(DINOV2_MODEL)
	dinov2_model = AutoModel.from_pretrained(DINOV2_MODEL)

	# SmolVLM for image captioning
	smolvlm256m_processor = AutoProcessor.from_pretrained("HuggingFaceTB/SmolVLM-256M-Instruct")
	smolvlm256m_model = AutoModelForVision2Seq.from_pretrained("HuggingFaceTB/SmolVLM-256M-Instruct")
	```

	#### Language Model

	```python
	from langchain_groq import ChatGroq

	# Groq LLaMA for code interpretation (reads GROQ_API_KEY from the environment)
	llm = ChatGroq(
	    model="meta-llama/llama-4-scout-17b-16e-instruct",
	    temperature=0,
	    max_tokens=None,
	)
	```

	## Technical Approaches

	### 1. Multi-Stage Image Enhancement

	OpenCV Pipeline:

	```python
	def process_image_cv2_from_pil(pil_img, scale=2):
	    bgr = pil_to_bgr_np(pil_img)
	    bgr = upscale_image_cv(bgr, scale=scale)  # Cubic interpolation
	    bgr = reduce_noise_cv(bgr)                # Non-local means denoising
	    bgr = sharpen_cv(bgr)                     # Kernel-based sharpening
	    bgr = enhance_contrast_cv(bgr)            # Contrast enhancement
	    return bgr_np_to_pil(bgr)
	```

	### 2. Hybrid Similarity Scoring

	Multi-Algorithm Consensus:

	```python
	def choose_top_candidates(embedding_results, phash_results, imgmatch_results):
	    # Abridged excerpt: normalization, ranking, and the loop over candidates p
	    # are omitted; a, b, c are the three scores of one candidate.

	    # Method A: Normalized weighted average
	    weighted_scores[p] = (w_emb * emb_norm[p] + w_ph * ph_norm[p] + w_im * im_norm[p])

	    # Method B: Rank-sum (Borda count)
	    rank_sum[p] = rank_emb[p] + rank_ph[p] + rank_im[p]

	    # Method C: Harmonic mean (penalizes a low or missing score)
	    harm = 3.0 / ((1.0/a) + (1.0/b) + (1.0/c))
	```
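The three consensus methods can be exercised end to end on toy data. The sketch below is illustrative only: the weights, the min-max normalization, the `eps` guard, and all helper names are assumptions rather than the project's actual choices.

```python
def min_max_normalize(scores):
    """Rescale a {candidate: score} map to [0, 1]."""
    lo, hi = min(scores.values()), max(scores.values())
    span = hi - lo
    return {k: ((v - lo) / span if span else 1.0) for k, v in scores.items()}


def ranks(scores):
    """Rank 1 is the best (highest) score."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {k: i + 1 for i, k in enumerate(ordered)}


def harmonic_mean(values, eps=1e-9):
    """Harmonic mean; a single near-zero value drags the result toward zero."""
    return len(values) / sum(1.0 / max(v, eps) for v in values)


def consensus(emb, ph, im, weights=(0.5, 0.3, 0.2)):
    """Return the winning candidate under each of the three methods."""
    w_emb, w_ph, w_im = weights
    emb_n, ph_n, im_n = (min_max_normalize(s) for s in (emb, ph, im))
    r_emb, r_ph, r_im = ranks(emb), ranks(ph), ranks(im)
    weighted = {p: w_emb * emb_n[p] + w_ph * ph_n[p] + w_im * im_n[p] for p in emb}
    rank_sum = {p: r_emb[p] + r_ph[p] + r_im[p] for p in emb}
    harmonic = {p: harmonic_mean([emb[p], ph[p], im[p]]) for p in emb}
    return {
        "weighted": max(weighted, key=weighted.get),
        "rank_sum": min(rank_sum, key=rank_sum.get),  # lowest rank total wins
        "harmonic": max(harmonic, key=harmonic.get),
    }


# Toy scores (hypothetical) for three candidate assets.
emb = {"cat": 0.92, "dog": 0.80, "ball": 0.35}
ph = {"cat": 0.70, "dog": 0.75, "ball": 0.40}
im = {"cat": 0.65, "dog": 0.60, "ball": 0.30}
winners = consensus(emb, ph, im)
print(winners)
```

When all three methods agree, as they do here, the match is comparatively trustworthy; disagreement is a useful signal that a candidate scores well on only one algorithm.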

	### 3. Block Relationship Building

	Scratch Block Catalog System:

	```python
	def generate_blocks_from_opcodes(opcode_counts, all_block_definitions):
	    """
	    Generates Scratch blocks with proper parent-child relationships
	    - Hat blocks: topLevel=True, parent=None
	    - Stack blocks: Linked via 'next' field
	    - C-blocks: Contains SUBSTACK inputs
	    - Shadow blocks: Linked as input values
	    """
	```
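The hat and stack rules in the docstring can be shown with a small linker. This is a sketch, not `generate_blocks_from_opcodes`: `link_script` and its ID scheme are hypothetical, C-blocks and shadow blocks are left out, and real project files also store `x`/`y` coordinates on top-level blocks.

```python
def link_script(opcodes, id_prefix="block"):
    """Chain opcodes into one script; the first opcode is treated as the hat."""
    ids = [f"{id_prefix}_{i}" for i in range(len(opcodes))]
    blocks = {}
    for i, (block_id, opcode) in enumerate(zip(ids, opcodes)):
        blocks[block_id] = {
            "opcode": opcode,
            "next": ids[i + 1] if i + 1 < len(ids) else None,
            "parent": ids[i - 1] if i > 0 else None,
            "inputs": {},
            "fields": {},
            "shadow": False,
            "topLevel": i == 0,  # only the hat block starts a script
        }
    return blocks


script = link_script(
    ["event_whenflagclicked", "motion_ifonedgebounce", "looks_nextcostume"]
)
for block_id, block in script.items():
    print(block_id, block["opcode"], "->", block["next"])
```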

	### 4. Project Assembly Pipeline

	JSON Structure Generation:

	```python
	final_project = {
	    "targets": [],     # Sprites and Stage
	    "monitors": [],    # Variable/list monitors
	    "extensions": [],  # Scratch extensions
	    "meta": {
	        "semver": "3.0.0",
	        "vm": "11.3.0",
	        "agent": "OpenAI ScratchVision Agent",
	    },
	}
	```
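How matched sprites and backdrops end up in `targets` can be sketched as follows, based on the Sprite Integration and Stage/Backdrop Integration steps described earlier. `assemble_project` is a hypothetical helper, the Stage it builds carries only a few of the fields a real Stage target has, and placing the Stage first follows the usual layout of Scratch project files rather than anything stated in this document.

```python
def assemble_project(project_data, backdrop_data):
    """Merge matched sprites and backdrops into one project dictionary."""
    final_project = {
        "targets": [],
        "monitors": [],
        "extensions": [],
        "meta": {"semver": "3.0.0", "vm": "11.3.0", "agent": "OpenAI ScratchVision Agent"},
    }
    sprites = [target for target in project_data if not target.get("isStage")]
    if backdrop_data:
        all_costumes = []
        for backdrop in backdrop_data:
            all_costumes.extend(backdrop.get("costumes", []))
        stage = {
            "isStage": True,
            "name": "Stage",
            "costumes": all_costumes,
            "sounds": backdrop_data[0].get("sounds", []),  # sounds from first backdrop
            "blocks": {},
        }
        final_project["targets"].append(stage)  # Stage listed before sprites
    final_project["targets"].extend(sprites)
    return final_project


# Toy inputs (hypothetical).
project_data = [{"name": "Cat", "isStage": False}, {"name": "Old Stage", "isStage": True}]
backdrop_data = [
    {"costumes": [{"name": "beach"}], "sounds": [{"name": "pop"}]},
    {"costumes": [{"name": "forest"}], "sounds": []},
]
project = assemble_project(project_data, backdrop_data)
print([target["name"] for target in project["targets"]])  # ['Stage', 'Cat']
```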

	## File System Architecture

	### Project Directory Structure

	```
	📁 scratch-vision-game/
	├── 🐍 app.py # Main Flask application (PRIMARY)
	├── 📋 requirements.txt # Python dependencies
	├── 🐳 Dockerfile # Container configuration
	├── 📖 README.md # Basic project info
	├── 📖 README2.md # Technical documentation
	│
	├── 📁 utils/ # Core processing utilities
	│   └── 🔧 block_relation_builder.py # Scratch block logic & JSON generation
	│
	├── 📁 blocks/ # Scratch block definitions & assets
	│   ├── 📊 blocks.json # Main block catalog
	│   ├── 📊 boolean_blocks.json # Boolean/condition blocks
	│   ├── 📊 cap_blocks.json # Terminal blocks (stop, delete clone)
	│   ├── 📊 c_blocks.json # Control flow blocks (if, repeat, forever)
	│   ├── 📊 control_blocks.json # Control category blocks
	│   ├── 📊 data_blocks.json # Variables and lists blocks
	│   ├── 📊 event_blocks.json # Event/trigger blocks
	│   ├── 📊 hat_blocks.json # Script starter blocks
	│   ├── 📊 looks_blocks.json # Appearance blocks
	│   ├── 📊 motion_blocks.json # Movement blocks
	│   ├── 📊 operator_blocks.json # Math and logic operators
	│   ├── 📊 reporter_blocks.json # Value reporter blocks
	│   ├── 📊 sensing_blocks.json # Sensor blocks
	│   ├── 📊 sound_blocks.json # Audio blocks
	│   ├── 📊 stack_blocks.json # Sequential action blocks
	│   │
	│   ├── 📁 sprites/ # Reference sprite assets
	│   │   ├── 📁 {sprite_name}/
	│   │   │   ├── 🖼️ {sprite_image}.png
	│   │   │   ├── 📊 sprite.json # Sprite definition
	│   │   │   └── 🎵 {sounds}.wav
	│   │   └── ...
	│   │
	│   ├── 📁 Backdrops/ # Reference backdrop assets
	│   │   ├── 📁 {backdrop_name}/
	│   │   │   ├── 🖼️ {backdrop_image}.png
	│   │   │   ├── 📊 project.json # Stage definition
	│   │   │   └── 🎵 {sounds}.wav
	│   │   └── ...
	│   │
	│   └── 📁 sound/ # Audio assets library
	│       └── 🎵 *.wav
	│
	├── 📁 templates/ # Flask HTML templates
	│   └── 🌐 *.html
	│
	├── 📁 static/ # Web static assets
	│   ├── 🎨 css/
	│   ├── 📜 js/
	│   └── 🖼️ images/
	│
	├── 📁 game_samples/ # Pre-built .sb3 files
	│   └── 🎮 *.sb3
	│
	├── 📁 generated_projects/ # Runtime generated projects
	│   └── 📁 project_{uuid}/
	│       ├── 📊 project.json
	│       ├── 🖼️ *.png
	│       └── 🎵 *.wav
	│
	└── 📁 outputs/ # Processing outputs (Runtime)
	    ├── 📁 DETECTED_IMAGE/ # Extracted & processed images
	    │   └── 📁 {pdf_name}/
	    │       └── 🖼️ Sprite_*.png
	    │
	    ├── 📁 SCANNED_IMAGE/ # Original scanned images
	    │
	    ├── 📁 EXTRACTED_JSON/ # Intermediate JSON data
	    │   └── 📁 {pdf_name}/
	    │       ├── 📊 extracted.json # Raw PDF extraction
	    │       └── 📊 extracted_sprites.json # AI-processed sprites
	    │
	    └── 📊 embeddings.json # Pre-computed embeddings cache
	```

	### Runtime Directory Creation Flow

	```
	🏗️ DYNAMIC DIRECTORY CREATION:

	User Upload →  PDF Processing →     Directory Structure
	│              │                    │
	├─ temp_dir ───┼─ pdf_filename ─────┼─ /outputs/DETECTED_IMAGE/{pdf_name}/
	│              │                    ├─ /outputs/EXTRACTED_JSON/{pdf_name}/
	│              │                    └─ /generated_projects/project_{uuid}/
	│              │
	└─ secure_filename() ──────────────────→ Sanitized paths
	```

	### Data Persistence Locations

	```
	💾 PERSISTENT DATA STORAGE:

	├── 🔄 Input Processing
	│   ├── /tmp/{random}/ - Temporary PDF storage
	│   ├── /outputs/DETECTED_IMAGE/ - Extracted sprite images
	│   ├── /outputs/EXTRACTED_JSON/ - Processing metadata
	│   └── /outputs/embeddings.json - Similarity search cache
	│
	├── 🎯 Asset Matching
	│   ├── /blocks/sprites/ - Reference sprite library
	│   ├── /blocks/Backdrops/ - Reference backdrop library
	│   └── /blocks/*.json - Block definition catalogs
	│
	└── 🎮 Final Output
	    ├── /generated_projects/project_{uuid}/ - Assembled project
	    ├── /game_samples/{project_id}.sb3 - Downloadable Scratch file
	    └── /logs/app.log - Application logs
	```

	## API Endpoints

	### `/process_pdf` (POST)

	Processes uploaded PDF files containing Scratch code blocks.

	Request:

	```
	Content-Type: multipart/form-data
	pdf_file: <PDF file>
	```

	Response:

	```json
	{
	  "message": "✅ PDF processed successfully",
	  "output_json": "path/to/extracted_sprites.json",
	  "sprites": {...},
	  "project_output_json": "path/to/project.json",
	  "test_url": "<download link for the generated .sb3>"
	}
	```

	### `/download_sb3/<project_id>` (GET)

	Downloads generated Scratch 3.0 project files.
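Because `project_id` comes straight from the URL, the lookup should reject anything that could escape the samples directory before checking that the file exists. The following is a standard-library sketch of that check; `resolve_sb3_path` and the ID pattern are assumptions, not the route's actual code.

```python
import re
import tempfile
from pathlib import Path

# Assumption: project IDs are simple tokens (letters, digits, "_" and "-").
PROJECT_ID_PATTERN = re.compile(r"[A-Za-z0-9_-]+")


def resolve_sb3_path(project_id, samples_dir):
    """Return the .sb3 path for project_id, or None if unsafe or missing."""
    if not PROJECT_ID_PATTERN.fullmatch(project_id):
        return None  # rejects "../" and other path tricks
    candidate = Path(samples_dir) / f"{project_id}.sb3"
    return candidate if candidate.is_file() else None


with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "abc123.sb3").write_bytes(b"PK")  # placeholder file
    found = resolve_sb3_path("abc123", tmp) is not None
    missing = resolve_sb3_path("nope", tmp)
    unsafe = resolve_sb3_path("../abc123", tmp)

print(found, missing, unsafe)  # True None None
```

A route handler would typically answer with HTTP 404 when this returns `None` and pass the validated file to `send_from_directory()` otherwise.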

	## Processing Timeline & Performance

	### Execution Timeline Tree

	```
	⏱️ PROCESSING TIMELINE (Typical PDF with 5 images):

	📤 User Upload (0.0s)
	│
	├── 🔍 PDF Validation (0.1s)
	│   └── File security & temp storage
	│
	├── 📄 PDF Extraction (2-5s)
	│   ├── partition_pdf() - Unstructured processing
	│   ├── Image extraction & base64 encoding
	│   └── extracted.json creation
	│
	├── 🤖 AI Processing (10-15s per image)
	│   ├── 📝 Description Generation (5-7s)
	│   │   ├── LangGraph agent initialization
	│   │   ├── Groq API call
	│   │   └── Response processing
	│   │
	│   ├── 🏷️ Name Generation (5-7s)
	│   │   ├── Second LangGraph agent call
	│   │   ├── Groq API call
	│   │   └── Response processing
	│   │
	│   └── 📋 Metadata Assembly (0.1s)
	│       └── JSON structure creation
	│
	├── 🔍 Similarity Matching (3-8s)
	│   ├── 🎯 Image Decoding (0.5s)
	│   ├── 🧠 CLIP Embeddings (2-3s)
	│   ├── 📈 Similarity Computation (0.5s)
	│   └── 🎨 Asset Matching (2-4s)
	│
	├── 🏗️ Project Assembly (1-2s)
	│   ├── JSON merging
	│   ├── Asset copying
	│   └── Final project creation
	│
	└── 📤 Response Generation (0.1s)
	    └── JSON response formatting

	TOTAL: ~60-90 seconds for 5-image PDF
	```
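The total follows from adding the stage figures, with only the AI step scaling per image. A small sketch of that arithmetic, assuming the stages run one after another and the fixed stages do not grow with the image count (`estimate_seconds` is a hypothetical helper):

```python
def estimate_seconds(num_images):
    """Return (low, high) wall-clock estimates from the per-stage figures."""
    # Validation, extraction, similarity matching, assembly, response.
    fixed_low = 0.1 + 2 + 3 + 1 + 0.1
    fixed_high = 0.1 + 5 + 8 + 2 + 0.1
    per_image_low, per_image_high = 10, 15  # AI processing per image
    return (
        fixed_low + per_image_low * num_images,
        fixed_high + per_image_high * num_images,
    )


low, high = estimate_seconds(5)
print(round(low, 1), round(high, 1))  # 56.2 90.2
```

For five images this gives roughly 56 to 90 seconds, in line with the stated total, and it shows that the per-image AI calls account for most of the runtime.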

	### Performance Bottlenecks & Optimizations

	```
	🚀 PERFORMANCE OPTIMIZATION STRATEGIES:

	├── 🧠 Model Loading (Startup Cost)
	│   ├── ✅ Pre-loaded global models
	│   │   ├── DINOv2: ~2GB VRAM
	│   │   ├── SmolVLM: ~1GB VRAM
	│   │   └── CLIP: ~500MB VRAM
	│   │
	│   ├── ✅ GPU Acceleration (when available)
	│   │   └── torch.device("cuda" if torch.cuda.is_available() else "cpu")
	│   │
	│   └── ✅ CPU Optimization
	│       └── torch.set_num_threads(4)
	│
	├── 🖼️ Image Processing Pipeline
	│   ├── ✅ Efficient NumPy Operations
	│   │   ├── Vectorized computations
	│   │   ├── In-place operations where possible
	│   │   └── Memory-mapped file access
	│   │
	│   ├── ✅ OpenCV Optimizations
	│   │   ├── Multi-threaded operations
	│   │   ├── SIMD instructions
	│   │   └── Optimized algorithms
	│   │
	│   └── ✅ Memory Management
	│       ├── Garbage collection hints
	│       ├── Temporary file cleanup
	│       └── Buffer reuse
	│
	├── 🔍 Similarity Search Acceleration
	│   ├── ✅ Pre-computed Embeddings Cache
	│   │   └── /outputs/embeddings.json (persistent)
	│   │
	│   ├── ✅ Normalized Embeddings
	│   │   ├── Cosine similarity via dot product
	│   │   └── L2 normalization preprocessing
	│   │
	│   └── ✅ Parallel Algorithm Execution
	│       ├── DINOv2, PHash, ImageMatch concurrent
	│       └── Multi-threaded similarity computation
	│
	└── 🌐 API & I/O Optimizations
	    ├── ✅ Async File Operations
	    ├── ✅ Streaming Responses
	    ├── ✅ Connection Pooling
	    └── ✅ Compression (gzip)
	```

	### Memory Usage Profile

	```
	💾 MEMORY CONSUMPTION BREAKDOWN:

	├── 🧠 AI Models (Peak: ~4GB)
	│   ├── DINOv2 Model: ~2GB
	│   ├── SmolVLM Model: ~1GB
	│   ├── CLIP Embeddings: ~500MB
	│   └── Groq API Client: ~100MB
	│
	├── 🖼️ Image Processing (Peak: ~500MB per image)
	│   ├── Original PIL Images: ~50MB each
	│   ├── Enhanced Images: ~100MB each
	│   ├── OpenCV Buffers: ~200MB each
	│   └── Embedding Vectors: ~2KB each
	│
	├── 📊 Data Structures (Peak: ~200MB)
	│   ├── Block Definitions: ~50MB
	│   ├── Asset Metadata: ~100MB
	│   ├── Similarity Matrices: ~50MB
	│   └── JSON Structures: ~10MB
	│
	└── 🌐 Web Framework (Baseline: ~100MB)
	    ├── Flask Application: ~50MB
	    ├── Request Buffers: ~30MB
	    └── Response Caching: ~20MB

	TOTAL PEAK: ~5GB (with GPU models loaded)
	TOTAL BASELINE: ~1GB (CPU-only, no active processing)
	```

	### Performance Optimizations

	### 1. Model Caching

	- Pre-loaded models with global variables
	- GPU acceleration when available
	- Batch processing for multiple images

	### 2. Image Processing

	- Efficient numpy operations
	- OpenCV optimizations
	- Memory management for large images

	### 3. Similarity Search

	- FAISS indexing for fast nearest neighbor search
	- Normalized embeddings for cosine similarity
	- Parallel processing of multiple algorithms

	## Error Handling

	### 1. Graceful Degradation

	```python
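	def process_image_cv2_from_pil(pil_img, scale=2):
	    try:
	        # OpenCV enhancement pipeline
	        bgr = pil_to_bgr_np(pil_img)
	        bgr = upscale_image_cv(bgr, scale=scale)
	        bgr = reduce_noise_cv(bgr)
	        bgr = sharpen_cv(bgr)
	        bgr = enhance_contrast_cv(bgr)
	        return bgr_np_to_pil(bgr)
	    except Exception as e:
	        print(f"Enhancement failed: {e}")
	        return pil_img  # Fallback to the original image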
	```

	### 2. JSON Validation

	```python
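	from langgraph.prebuilt import create_react_agent

	# The corrector agent uses no tools, but `tools` is a required argument.
	agent_json_resolver = create_react_agent(
	    model=llm,
	    tools=[],
	    prompt=SYSTEM_PROMPT_JSON_CORRECTOR,
	)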
	```
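A corrector agent like this is typically used in a parse, repair, retry loop. The sketch below shows that control flow with a stub in place of the LLM call; `parse_with_repair`, `toy_corrector`, and the retry limit are hypothetical.

```python
import json


def parse_with_repair(raw_text, corrector, max_attempts=2):
    """Parse JSON, asking `corrector` to fix the text whenever parsing fails."""
    text = raw_text
    for _ in range(max_attempts + 1):
        try:
            return json.loads(text)
        except json.JSONDecodeError as err:
            # In the real system this would be a call to the corrector agent.
            text = corrector(text, str(err))
    raise ValueError("could not repair JSON output")


def toy_corrector(text, error_message):
    # Hypothetical stand-in for the LLM: removes a trailing comma before "}".
    return text.replace(",}", "}")


repaired = parse_with_repair('{"pseudocode": "say hello",}', toy_corrector)
print(repaired)  # {'pseudocode': 'say hello'}
```

Capping the number of attempts keeps a persistently malformed reply from turning into an unbounded series of API calls.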

	## Deployment

	### Docker Configuration

	```dockerfile
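	# Illustrative sketch; the repository's Dockerfile may differ in detail
	FROM python:3.11-slim
	# System dependencies: tesseract-ocr, poppler-utils, libgl1
	RUN apt-get update && apt-get install -y --no-install-recommends tesseract-ocr poppler-utils libgl1
	WORKDIR /app
	# Python dependencies: requirements.txt
	COPY requirements.txt .
	RUN pip install --no-cache-dir -r requirements.txt
	COPY . .
	# Environment: Flask production mode
	EXPOSE 7860
	CMD ["python", "app.py"]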
	```

	### Environment Variables

	- `GROQ_API_KEY`: API key for Groq language model
	- `TRANSFORMERS_CACHE`: Model cache directory
	- `HF_HOME`: Hugging Face cache directory
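Reading these variables once at startup, and failing early when the API key is absent, avoids a confusing error on the first LLM call. This is a minimal sketch: `load_settings` is a hypothetical helper, and the fallback of `TRANSFORMERS_CACHE` to `HF_HOME` is an assumption made for the example.

```python
import os


def load_settings(environ=os.environ):
    """Read the documented variables, failing early if the API key is absent."""
    api_key = environ.get("GROQ_API_KEY")
    if not api_key:
        raise RuntimeError("GROQ_API_KEY is not set")
    hf_home = environ.get("HF_HOME", os.path.expanduser("~/.cache/huggingface"))
    return {
        "groq_api_key": api_key,
        "hf_home": hf_home,
        # Assumption for this sketch: fall back to HF_HOME when unset.
        "transformers_cache": environ.get("TRANSFORMERS_CACHE", hf_home),
    }


# A plain dict stands in for the process environment in this demo.
settings = load_settings({"GROQ_API_KEY": "demo-key", "HF_HOME": "/data/hf"})
print(settings["hf_home"])  # /data/hf
```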

	## Future Enhancements

	1. Real-time Processing: WebSocket integration for live feedback
	2. Advanced OCR: Custom trained models for Scratch block recognition
	3. Multi-language Support: International Scratch block recognition
	4. Collaborative Features: Multi-user project editing
	5. Performance Monitoring: Detailed analytics and optimization metrics

	## Contributing

	The system is designed with modularity in mind:

	- Add new block definitions in `blocks/` directory
	- Extend similarity algorithms in the matching pipeline
	- Enhance OCR accuracy with custom preprocessing
	- Improve LLM prompts for better code interpretation

	## License

	Apache 2.0 License - See project repository for full details.