# Multi-Language Chat Agent - Developer Guide

## Architecture Overview

The Multi-Language Chat Agent is built using a modular architecture with the following key components:

```
┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│    Frontend     │    │    WebSocket    │    │   Chat Agent    │
│    (HTML/JS)    │◄──►│     Handler     │◄──►│     Service     │
└─────────────────┘    └─────────────────┘    └─────────────────┘
                                │                      │
                                ▼                      ▼
┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│     Session     │    │    Language     │    │    Groq LLM     │
│     Manager     │    │     Context     │    │     Client      │
└─────────────────┘    └─────────────────┘    └─────────────────┘
                                │
                                ▼
                       ┌─────────────────┐
                       │  Chat History   │
                       │     Manager     │
                       └─────────────────┘
                                │
                       ┌─────────────────┐    ┌─────────────────┐
                       │   Redis Cache   │    │   PostgreSQL    │
                       │                 │    │    Database     │
                       └─────────────────┘    └─────────────────┘
```

## Core Components

### 1. Chat Agent Service (`chat_agent/services/chat_agent.py`)

The main orchestrator that coordinates all chat operations.

Key Methods:
- `process_message()`: Main message processing pipeline
- `switch_language()`: Handle language context switching
- `stream_response()`: Real-time response streaming

Usage Example:
```python
from chat_agent.services.chat_agent import ChatAgent

# Initialize chat agent
chat_agent = ChatAgent()

# Process a message
response = chat_agent.process_message(
    session_id="session-123",
    message="How do I create a Python list?",
    language="python",
)
```
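
A hypothetical, self-contained sketch of what a `process_message()` pipeline of this kind can look like follows. It is not the code in `chat_agent.py`: `ChatAgentSketch`, `InMemoryHistory`, the injected `llm` callable and the returned dict are illustrative assumptions that loosely mirror the component APIs described in this guide. The point is the orchestration order: resolve the language, load recent context, call the LLM, then persist both turns.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class InMemoryHistory:
    """Hypothetical stand-in for ChatHistoryManager."""
    messages: Dict[str, List[dict]] = field(default_factory=dict)

    def store_message(self, session_id: str, role: str, content: str, language: str) -> None:
        self.messages.setdefault(session_id, []).append(
            {"role": role, "content": content, "language": language}
        )

    def get_recent_history(self, session_id: str, limit: int = 10) -> List[dict]:
        return self.messages.get(session_id, [])[-limit:]


class ChatAgentSketch:
    """Hypothetical orchestrator; dependencies are injected, as in the unit tests."""

    def __init__(self, llm: Callable[[str, List[dict], str], str],
                 history: InMemoryHistory, default_language: str = "python") -> None:
        self.llm = llm  # e.g. a wrapper around GroqClient.generate_response
        self.history = history
        self.languages: Dict[str, str] = {}  # session_id -> current language
        self.default_language = default_language

    def process_message(self, session_id: str, message: str,
                        language: Optional[str] = None) -> dict:
        # 1. Resolve the language: an explicit argument switches the session's context.
        if language:
            self.languages[session_id] = language
        lang = self.languages.get(session_id, self.default_language)
        # 2. Load recent context *before* storing the new message.
        context = self.history.get_recent_history(session_id, limit=10)
        # 3. Ask the LLM.
        reply = self.llm(message, context, lang)
        # 4. Persist both turns.
        self.history.store_message(session_id, "user", message, lang)
        self.history.store_message(session_id, "assistant", reply, lang)
        return {"response": reply, "language": lang}


# Usage with a fake LLM
def fake_llm(prompt, chat_history, language_context):
    return f"[{language_context}] answer to: {prompt} (context: {len(chat_history)} msgs)"


agent = ChatAgentSketch(llm=fake_llm, history=InMemoryHistory())
agent.process_message("s1", "How do I create a list?", "python")
result = agent.process_message("s1", "And a dict?")
print(result["response"])  # [python] answer to: And a dict? (context: 2 msgs)
```

Loading context before storing the new user message keeps that message from appearing twice in the LLM input, since it is already passed as the prompt.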

### 2. Session Manager (`chat_agent/services/session_manager.py`)

Manages user sessions and chat state.

Key Methods:
- `create_session()`: Create new chat session
- `get_session()`: Retrieve session information
- `cleanup_inactive_sessions()`: Remove expired sessions

Usage Example:
```python
from chat_agent.services.session_manager import SessionManager

session_manager = SessionManager()

# Create new session
session = session_manager.create_session(
    user_id="user-123",
    language="python",
)

# Get session info
session_info = session_manager.get_session(session['session_id'])
```
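
To make the expiry semantics concrete, here is a hypothetical in-memory stand-in for the session contract above; the real manager is presumably backed by Redis or the database. `InMemorySessionManager` and its injectable `clock` are assumptions for illustration. A session expires after `SESSION_TIMEOUT`-style inactivity, and `cleanup_inactive_sessions()` is what a periodic job (for example one running every `CLEANUP_INTERVAL` seconds) would call.

```python
import time
import uuid
from typing import Callable, Dict, Optional


class InMemorySessionManager:
    """Hypothetical in-memory session store with inactivity expiry."""

    def __init__(self, timeout_seconds: int = 3600,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.timeout = timeout_seconds
        self.clock = clock  # injectable so expiry can be tested without sleeping
        self._sessions: Dict[str, dict] = {}

    def create_session(self, user_id: str, language: str = "python") -> dict:
        session = {
            "session_id": str(uuid.uuid4()),
            "user_id": user_id,
            "language": language,
            "last_active": self.clock(),
        }
        self._sessions[session["session_id"]] = session
        return session

    def get_session(self, session_id: str) -> Optional[dict]:
        session = self._sessions.get(session_id)
        if session is None:
            return None
        if self.clock() - session["last_active"] > self.timeout:
            # Expired: treat as missing even before the periodic cleanup runs.
            del self._sessions[session_id]
            return None
        session["last_active"] = self.clock()  # touching a session keeps it alive
        return session

    def cleanup_inactive_sessions(self) -> int:
        """Remove expired sessions; return how many were removed."""
        now = self.clock()
        expired = [sid for sid, s in self._sessions.items()
                   if now - s["last_active"] > self.timeout]
        for sid in expired:
            del self._sessions[sid]
        return len(expired)


# Usage with a fake clock
fake_now = [0.0]
manager = InMemorySessionManager(timeout_seconds=3600, clock=lambda: fake_now[0])
s = manager.create_session(user_id="user-123", language="python")
fake_now[0] = 1800.0
manager.get_session(s["session_id"])  # activity at t=1800 resets the timer
fake_now[0] = 1800.0 + 3601
print(manager.cleanup_inactive_sessions())  # 1
```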

### 3. Language Context Manager (`chat_agent/services/language_context.py`)

Handles programming language context and switching.

Key Methods:
- `set_language()`: Set current language for session
- `get_language()`: Get current language
- `get_language_prompt_template()`: Get language-specific prompts

Usage Example:
```python
from chat_agent.services.language_context import LanguageContextManager

lang_manager = LanguageContextManager()

# Set language context
lang_manager.set_language("session-123", "javascript")

# Get current language
current_lang = lang_manager.get_language("session-123")

# Get prompt template
template = lang_manager.get_language_prompt_template("python")
```
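
A minimal sketch of per-session language tracking, assuming a fixed table of supported languages. `LanguageContextSketch` and the `PROMPT_TEMPLATES` contents are hypothetical placeholders, not the project's actual templates. Unknown languages are rejected by `set_language()`, and sessions that never set one fall back to a default.

```python
from typing import Dict

# Hypothetical template table; the project's real prompts are not shown in this guide.
PROMPT_TEMPLATES: Dict[str, str] = {
    "python": "You are a helpful Python tutor. Prefer idiomatic, PEP 8 code.",
    "javascript": "You are a helpful JavaScript tutor. Prefer modern ES syntax.",
    "java": "You are a helpful Java tutor.",
}


class LanguageContextSketch:
    """Per-session language tracking with validation and a default fallback."""

    def __init__(self, default: str = "python") -> None:
        self.default = default
        self._by_session: Dict[str, str] = {}

    def set_language(self, session_id: str, language: str) -> None:
        lang = language.strip().lower()  # normalise "JavaScript" -> "javascript"
        if lang not in PROMPT_TEMPLATES:
            raise ValueError(f"Unsupported language: {language!r}")
        self._by_session[session_id] = lang

    def get_language(self, session_id: str) -> str:
        return self._by_session.get(session_id, self.default)

    def get_language_prompt_template(self, language: str) -> str:
        # Unknown languages get the default template instead of an error.
        return PROMPT_TEMPLATES.get(language.lower(), PROMPT_TEMPLATES[self.default])


ctx = LanguageContextSketch()
ctx.set_language("session-123", "JavaScript")
print(ctx.get_language("session-123"))    # javascript
print(ctx.get_language("other-session"))  # python
```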

### 4. Chat History Manager (`chat_agent/services/chat_history.py`)

Manages persistent and cached chat history.

Key Methods:
- `store_message()`: Store message in DB and cache
- `get_recent_history()`: Get recent messages for context
- `get_full_history()`: Get complete conversation history

Usage Example:
```python
from chat_agent.services.chat_history import ChatHistoryManager

history_manager = ChatHistoryManager()

# Store a message
message_id = history_manager.store_message(
    session_id="session-123",
    role="user",
    content="What is Python?",
    language="python",
)

# Get recent history
recent = history_manager.get_recent_history("session-123", limit=10)
```
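
Storing a message in both the DB and the cache is a write-through pattern. The sketch below is hypothetical: `sqlite3` stands in for PostgreSQL, a per-session `deque` stands in for Redis, and the table layout is assumed. Writes go to the database first and then to a bounded cache; reads are served from the cache when it holds enough messages and fall back to SQL otherwise (for example after a restart).

```python
import sqlite3
from collections import deque
from typing import Deque, Dict, List


class WriteThroughHistory:
    """Hypothetical sketch: persist every message to SQL, keep the last N in memory."""

    def __init__(self, conn: sqlite3.Connection, cache_size: int = 20) -> None:
        self.conn = conn
        self.cache_size = cache_size
        self.cache: Dict[str, Deque[dict]] = {}
        conn.execute(
            "CREATE TABLE IF NOT EXISTS messages ("
            " id INTEGER PRIMARY KEY AUTOINCREMENT,"
            " session_id TEXT NOT NULL, role TEXT NOT NULL,"
            " content TEXT NOT NULL, language TEXT)"
        )

    def store_message(self, session_id: str, role: str, content: str, language: str) -> int:
        # Write to the durable store first, so the cache never holds unsaved data.
        cur = self.conn.execute(
            "INSERT INTO messages (session_id, role, content, language) VALUES (?, ?, ?, ?)",
            (session_id, role, content, language),
        )
        self.conn.commit()
        msg = {"id": cur.lastrowid, "role": role, "content": content, "language": language}
        self.cache.setdefault(session_id, deque(maxlen=self.cache_size)).append(msg)
        return cur.lastrowid

    def get_recent_history(self, session_id: str, limit: int = 10) -> List[dict]:
        cached = self.cache.get(session_id)
        if cached is not None and len(cached) >= limit:
            return list(cached)[-limit:]  # cache hit
        # Cache miss or too short: read newest-first from SQL, then restore order.
        rows = self.conn.execute(
            "SELECT id, role, content, language FROM messages"
            " WHERE session_id = ? ORDER BY id DESC LIMIT ?",
            (session_id, limit),
        ).fetchall()
        return [dict(zip(("id", "role", "content", "language"), r)) for r in reversed(rows)]


history = WriteThroughHistory(sqlite3.connect(":memory:"), cache_size=3)
for i in range(5):
    history.store_message("session-123", "user", f"msg {i}", "python")
print([m["content"] for m in history.get_recent_history("session-123", limit=2)])
# ['msg 3', 'msg 4'] (served from the cache)
print([m["content"] for m in history.get_recent_history("session-123", limit=4)])
# ['msg 1', 'msg 2', 'msg 3', 'msg 4'] (cache holds only 3, so read from SQL)
```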

### 5. Groq Client (`chat_agent/services/groq_client.py`)

Handles integration with the Groq API through LangChain.

Key Methods:
- `generate_response()`: Generate an LLM response
- `stream_response()`: Stream response generation
- `handle_api_errors()`: Error handling and fallbacks

Usage Example:
```python
import os

from chat_agent.services.groq_client import GroqClient

# Read the key from the environment instead of hard-coding it
groq_client = GroqClient(api_key=os.environ["GROQ_API_KEY"])

# Recent context, e.g. from ChatHistoryManager.get_recent_history()
recent_messages = history_manager.get_recent_history("session-123", limit=10)

# Generate response
response = groq_client.generate_response(
    prompt="Explain Python functions",
    chat_history=recent_messages,
    language_context="python",
)
```
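
A common core for something like `handle_api_errors()` is retrying transient failures with exponential backoff and jitter before falling back. This is a hypothetical sketch, not the project's implementation: `call_with_retries` is an illustrative name, and `TimeoutError`/`ConnectionError` are placeholder exception types; map the Groq/LangChain client's own timeout and rate-limit exceptions into `retry_on`.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_retries(
    fn: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...] = (TimeoutError, ConnectionError),
    max_attempts: int = 4,
    base_delay: float = 0.5,
    max_delay: float = 8.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn(), retrying transient failures with exponential backoff and full jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be >= 1")
    attempt = 1
    while True:
        try:
            return fn()
        except retry_on:
            if attempt >= max_attempts:
                raise  # out of attempts: let the caller's fallback handle it
            # Delay cap grows 0.5s, 1s, 2s, ... up to max_delay; jitter spreads retries out.
            cap = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(random.uniform(0, cap))
            attempt += 1


# Usage: a stub that fails twice with a transient error, then succeeds
attempts = []


def flaky_llm() -> str:
    attempts.append(1)
    if len(attempts) < 3:
        raise ConnectionError("transient network error")
    return "ok"


print(call_with_retries(flaky_llm, sleep=lambda s: None))  # ok (on the 3rd attempt)
```

Non-transient errors (for example an invalid API key) are not in `retry_on`, so they propagate immediately instead of being retried.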

## Development Setup

### Prerequisites

- Python 3.8+
- PostgreSQL (for production) or SQLite (for development)
- Redis (for caching and session management)
- Groq API key

### Installation

1. Clone the repository:
```bash
git clone <repository-url>
cd multi-language-chat-agent
```

2. Create a virtual environment:
```bash
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
```

3. Install dependencies:
```bash
pip install -r requirements.txt
```

4. Set up environment variables:
```bash
cp .env.example .env
# Edit .env with your configuration
```

5. Initialize the database:
```bash
python init_db.py
```

6. Run the application:
```bash
python app.py
```

### Environment Configuration

Required Environment Variables:
```bash
# Groq API Configuration
GROQ_API_KEY=your-groq-api-key-here
GROQ_MODEL=mixtral-8x7b-32768

# Database Configuration
DATABASE_URL=postgresql://user:password@localhost/chatdb
# Or for SQLite: DATABASE_URL=sqlite:///instance/chat_agent.db

# Redis Configuration
REDIS_URL=redis://localhost:6379/0

# Flask Configuration
SECRET_KEY=your-secret-key-here
FLASK_ENV=development
```

Optional Configuration (comments are kept on their own lines, because some `.env` parsers read trailing text as part of the value):
```bash
# Rate Limiting
RATE_LIMIT_ENABLED=true
RATE_LIMIT_PER_MINUTE=30

# Session Management
# Session timeout in seconds (1 hour)
SESSION_TIMEOUT=3600
# Cleanup interval in seconds (5 minutes)
CLEANUP_INTERVAL=300

# Logging
LOG_LEVEL=INFO
LOG_FILE=logs/chat_agent.log
```
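
How the application reads these variables is not shown in this guide. As a hypothetical sketch, they can be loaded into a typed, immutable settings object that fails fast when a required key is missing. `Settings` and `load_settings` are illustrative names; the required keys follow the required block above (except `FLASK_ENV`), and the optional defaults are the example values from the optional block.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


def _env_bool(value: str) -> bool:
    return value.strip().lower() in {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class Settings:
    groq_api_key: str
    groq_model: str
    database_url: str
    redis_url: str
    secret_key: str
    rate_limit_enabled: bool
    rate_limit_per_minute: int
    session_timeout: int
    cleanup_interval: int
    log_level: str
    log_file: Optional[str]


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build Settings from environment variables, failing fast on missing required keys."""
    required = ("GROQ_API_KEY", "GROQ_MODEL", "DATABASE_URL", "REDIS_URL", "SECRET_KEY")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing required environment variables: {', '.join(missing)}")
    return Settings(
        groq_api_key=env["GROQ_API_KEY"],
        groq_model=env["GROQ_MODEL"],
        database_url=env["DATABASE_URL"],
        redis_url=env["REDIS_URL"],
        secret_key=env["SECRET_KEY"],
        rate_limit_enabled=_env_bool(env.get("RATE_LIMIT_ENABLED", "true")),
        rate_limit_per_minute=int(env.get("RATE_LIMIT_PER_MINUTE", "30")),
        session_timeout=int(env.get("SESSION_TIMEOUT", "3600")),
        cleanup_interval=int(env.get("CLEANUP_INTERVAL", "300")),
        log_level=env.get("LOG_LEVEL", "INFO").upper(),
        log_file=env.get("LOG_FILE") or None,
    )


settings = load_settings({
    "GROQ_API_KEY": "test-key",
    "GROQ_MODEL": "mixtral-8x7b-32768",
    "DATABASE_URL": "sqlite:///instance/chat_agent.db",
    "REDIS_URL": "redis://localhost:6379/0",
    "SECRET_KEY": "dev-secret",
    "RATE_LIMIT_ENABLED": "false",
})
print(settings.rate_limit_enabled, settings.session_timeout)  # False 3600
```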

## Testing

### Running Tests

All Tests:
```bash
pytest
```

Specific Test Categories:
```bash
# Unit tests
pytest tests/unit/

# Integration tests
pytest tests/integration/

# End-to-end tests
pytest tests/e2e/

# Performance tests
pytest tests/performance/
```

With Coverage:
```bash
# Requires the pytest-cov plugin
pytest --cov=chat_agent --cov-report=html
```

### Test Structure

```
tests/
├── unit/                  # Unit tests for individual components
│   ├── test_chat_agent.py
│   ├── test_session_manager.py
│   └── test_language_context.py
├── integration/           # Integration tests
│   ├── test_chat_api.py
│   └── test_websocket_integration.py
├── e2e/                   # End-to-end workflow tests
│   └── test_complete_chat_workflow.py
└── performance/           # Load and performance tests
    └── test_load_testing.py
```

### Writing Tests

Unit Test Example:
```python
import pytest
from unittest.mock import Mock

from chat_agent.services.chat_agent import ChatAgent


class TestChatAgent:
    @pytest.fixture
    def mock_dependencies(self):
        return {
            'groq_client': Mock(),
            'session_manager': Mock(),
            'language_context_manager': Mock(),
            'chat_history_manager': Mock(),
        }

    def test_process_message_success(self, mock_dependencies):
        # Arrange
        chat_agent = ChatAgent(**mock_dependencies)
        mock_dependencies['groq_client'].generate_response.return_value = "Test response"

        # Act
        result = chat_agent.process_message("session-123", "Test message", "python")

        # Assert
        assert result == "Test response"
        mock_dependencies['groq_client'].generate_response.assert_called_once()
```

Integration Test Example:
```python
import pytest

from chat_agent.services.chat_agent import ChatAgent


class TestChatIntegration:
    @pytest.fixture
    def integrated_system(self):
        # Set up real components with test configuration
        return ChatAgent()

    def test_complete_chat_flow(self, integrated_system):
        # Test complete workflow with real components
        session_id = "test-session"
        response = integrated_system.process_message(
            session_id, "What is Python?", "python"
        )
        assert response is not None
        assert len(response) > 0
```

## API Development

### Adding New Endpoints

1. Create route in `chat_agent/api/chat_routes.py`:
```python
import logging
from datetime import datetime, timezone

from flask import g, jsonify

# chat_bp, require_auth, rate_limit, session_manager and chat_history_manager
# are assumed to be defined or imported elsewhere in chat_routes.py
logger = logging.getLogger(__name__)


@chat_bp.route('/sessions/<session_id>/export', methods=['GET'])
@require_auth
@rate_limit(per_minute=10)
def export_chat_history(session_id):
    """Export chat history for a session."""
    try:
        # Validate session ownership (unknown and foreign sessions both get 404)
        session = session_manager.get_session(session_id)
        if not session or session['user_id'] != g.user_id:
            return jsonify({'error': 'Session not found'}), 404

        # Get full history
        history = chat_history_manager.get_full_history(session_id)

        return jsonify({
            'session_id': session_id,
            'messages': history,
            'exported_at': datetime.now(timezone.utc).isoformat(),
        })

    except Exception:
        logger.exception("Export error for session %s", session_id)
        return jsonify({'error': 'Export failed'}), 500
```

2. Add tests for the new endpoint (`client` and `auth_headers` are pytest fixtures):
```python
def test_export_chat_history(self, client, auth_headers):
    # Create a session
    session_response = client.post(
        '/api/v1/chat/sessions',
        headers=auth_headers,
        json={'language': 'python'},
    )
    session_id = session_response.json['session_id']

    # Test export
    response = client.get(
        f'/api/v1/chat/sessions/{session_id}/export',
        headers=auth_headers,
    )

    assert response.status_code == 200
    assert 'messages' in response.json
```

3. Update API documentation in `chat_agent/api/README.md`
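
The `rate_limit(per_minute=10)` decorator used above is not defined in this guide. A hypothetical core it could wrap is a sliding-window counter keyed by user ID; `SlidingWindowRateLimiter` is an illustrative name. This state lives in one process, so with several gunicorn workers (the Dockerfile in the Deployment section starts four) each worker counts separately; a shared store such as Redis would be needed for a global limit.

```python
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict


class SlidingWindowRateLimiter:
    """Allow at most `limit` requests per `window` seconds for each key (e.g. a user ID)."""

    def __init__(self, limit: int, window: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.limit = limit
        self.window = window
        self.clock = clock  # injectable for tests
        self._hits: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, key: str) -> bool:
        now = self.clock()
        hits = self._hits[key]
        # Drop timestamps that have slid out of the window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) >= self.limit:
            return False  # caller would answer HTTP 429
        hits.append(now)
        return True


t = [0.0]
limiter = SlidingWindowRateLimiter(limit=10, window=60, clock=lambda: t[0])
results = [limiter.allow("user-123") for _ in range(11)]
print(results.count(True), results[-1])  # 10 False
t[0] = 60.0
print(limiter.allow("user-123"))  # True: the earlier hits have left the window
```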

### WebSocket Event Handling

Adding New WebSocket Events:
```python
# In chat_agent/websocket/chat_websocket.py
import logging
from datetime import datetime, timezone

from flask_socketio import emit

# socketio, session_manager and process_custom_logic are assumed to be
# defined or imported elsewhere in this module
logger = logging.getLogger(__name__)


@socketio.on('custom_event')
def handle_custom_event(data):
    """Handle custom WebSocket event."""
    try:
        session_id = (data or {}).get('session_id')

        # Validate session
        if not session_id or not session_manager.get_session(session_id):
            emit('error', {'error': 'Invalid session'})
            return

        # Process custom logic
        result = process_custom_logic(data)

        # Emit response to the client that sent the event
        emit('custom_response', {
            'session_id': session_id,
            'result': result,
            'timestamp': datetime.now(timezone.utc).isoformat(),
        })

    except Exception:
        logger.exception("Custom event error")
        emit('error', {'error': 'Processing failed'})
```
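
For real-time streaming (`stream_response()`), the handler relays partial output to the browser as it is generated. Below is a hypothetical sketch of that relay loop: it takes any iterable of text chunks (for example, tokens from the LLM stream) and an `emit(event, payload)` callable such as Flask-SocketIO's `emit`. The event names `response_chunk` and `response_complete` and the payload fields are illustrative assumptions, not the project's actual protocol.

```python
from typing import Any, Callable, Dict, Iterable, List

Emit = Callable[[str, Dict[str, Any]], None]


def relay_stream(session_id: str, chunks: Iterable[str], emit: Emit) -> str:
    """Forward chunks as numbered events, then a completion event; return the full text."""
    parts: List[str] = []
    try:
        for chunk in chunks:
            if not chunk:
                continue  # skip empty chunks some streams send
            emit("response_chunk", {
                "session_id": session_id,
                "seq": len(parts),  # 0, 1, 2, ... lets the client detect gaps
                "content": chunk,
            })
            parts.append(chunk)
    except Exception:
        # Tell the client the stream broke so it can stop its "typing" indicator.
        emit("error", {"session_id": session_id, "error": "Streaming failed"})
        raise
    full_text = "".join(parts)
    emit("response_complete", {"session_id": session_id, "content": full_text})
    return full_text  # e.g. store this in the chat history


sent = []
text = relay_stream("session-123", iter(["def ", "", "f(): ", "pass"]),
                    lambda event, payload: sent.append((event, payload)))
print(text)  # def f(): pass
print([event for event, _ in sent])
# ['response_chunk', 'response_chunk', 'response_chunk', 'response_complete']
```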

## Database Management

### Schema Migrations

Creating Migrations:
```python
# migrations/003_add_new_feature.py
def upgrade(connection):
    """Add new feature to database."""
    connection.execute("""
        ALTER TABLE messages
        ADD COLUMN sentiment_score FLOAT DEFAULT 0.0
    """)

    connection.execute("""
        CREATE INDEX idx_messages_sentiment
        ON messages(sentiment_score)
    """)


def downgrade(connection):
    """Remove new feature from database."""
    connection.execute("DROP INDEX idx_messages_sentiment")
    connection.execute("ALTER TABLE messages DROP COLUMN sentiment_score")
```

Running Migrations:
```bash
python migrations/migrate.py
```
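
The contents of `migrations/migrate.py` are not shown here. A common design, sketched below as a hypothetical stand-in, records applied versions in a `schema_migrations` table and runs each pending `upgrade()` in its own transaction so a failure leaves no half-applied schema. It uses `sqlite3` to stay self-contained, and a dict of functions replaces discovery of migration files; both are assumptions.

```python
import sqlite3
from typing import Callable, Dict, List

Migration = Callable[[sqlite3.Connection], None]


def apply_migrations(conn: sqlite3.Connection, migrations: Dict[str, Migration]) -> List[str]:
    """Apply pending upgrade functions in version order; return the versions applied.

    Expects a connection opened with isolation_level=None, so the explicit
    BEGIN/COMMIT below fully control each migration's transaction.
    """
    conn.execute("CREATE TABLE IF NOT EXISTS schema_migrations (version TEXT PRIMARY KEY)")
    applied = {row[0] for row in conn.execute("SELECT version FROM schema_migrations")}
    newly_applied: List[str] = []
    # Zero-padded versions ("001", "002", ...) sort correctly as strings.
    for version in sorted(migrations):
        if version in applied:
            continue
        conn.execute("BEGIN")
        try:
            migrations[version](conn)
            conn.execute("INSERT INTO schema_migrations (version) VALUES (?)", (version,))
            conn.execute("COMMIT")
        except Exception:
            conn.execute("ROLLBACK")  # SQLite DDL is transactional, so the schema is restored
            raise
        newly_applied.append(version)
    return newly_applied


# Usage: two hypothetical migrations with the same upgrade(connection) shape
def m001(conn: sqlite3.Connection) -> None:
    conn.execute("CREATE TABLE messages (id INTEGER PRIMARY KEY, content TEXT)")


def m002(conn: sqlite3.Connection) -> None:
    conn.execute("ALTER TABLE messages ADD COLUMN sentiment_score FLOAT DEFAULT 0.0")


conn = sqlite3.connect(":memory:", isolation_level=None)
print(apply_migrations(conn, {"001": m001, "002": m002}))  # ['001', '002']
print(apply_migrations(conn, {"001": m001, "002": m002}))  # [] (already applied)
```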

### Database Optimization

Indexing Strategy:
```sql
-- Session-based queries
CREATE INDEX idx_messages_session_timestamp ON messages(session_id, timestamp);

-- User-based queries
CREATE INDEX idx_sessions_user_active ON chat_sessions(user_id, is_active);

-- Language-based queries
CREATE INDEX idx_messages_language ON messages(language);

-- Full-text search (PostgreSQL)
CREATE INDEX idx_messages_content_fts ON messages USING gin(to_tsvector('english', content));
```
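
To check that the composite `(session_id, timestamp)` index actually serves a recent-history query, ask the planner. The sketch below uses SQLite's `EXPLAIN QUERY PLAN` on a hypothetical minimal `messages` table (only the two indexed column names come from the DDL above); on PostgreSQL, `EXPLAIN` plays the same role. The plan should mention `idx_messages_session_timestamp` and should not need a separate sort step, because the index already orders rows by `timestamp` within each session.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical minimal schema; only session_id and timestamp matter for this check.
conn.execute(
    "CREATE TABLE messages (id INTEGER PRIMARY KEY, session_id TEXT, "
    "timestamp TEXT, language TEXT, content TEXT)"
)
conn.execute("CREATE INDEX idx_messages_session_timestamp ON messages(session_id, timestamp)")

# Typical recent-history query: equality on the leading column, ORDER BY on the second.
query = (
    "SELECT content FROM messages WHERE session_id = ? "
    "ORDER BY timestamp DESC LIMIT 10"
)
plan = conn.execute("EXPLAIN QUERY PLAN " + query, ("session-123",)).fetchall()
details = [row[-1] for row in plan]  # last column holds the human-readable step
for step in details:
    print(step)
```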

## Performance Optimization

### Caching Strategy

Redis Caching:
```python
import json

import redis


class CacheManager:
    def __init__(self, redis_url):
        self.redis_client = redis.from_url(redis_url)

    def cache_response(self, key, response, ttl=3600):
        """Cache LLM response for `ttl` seconds."""
        self.redis_client.setex(key, ttl, json.dumps(response))

    def get_cached_response(self, key):
        """Get cached response, or None on a cache miss."""
        cached = self.redis_client.get(key)
        return json.loads(cached) if cached is not None else None

    def cache_chat_history(self, session_id, messages):
        """Cache recent chat history."""
        key = f"history:{session_id}"
        self.redis_client.setex(
            key,
            1800,  # 30 minutes
            json.dumps(messages),
        )
```
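
The guide does not say how `key` is built for `cache_response()`. One hypothetical scheme hashes every input that affects the answer, so a different model, language, or prompt never returns a stale entry. `response_cache_key` and the `llm:` prefix are illustrative assumptions.

```python
import hashlib
import json
from typing import Optional, Sequence


def response_cache_key(model: str, language: str, prompt: str,
                       history: Optional[Sequence[dict]] = None) -> str:
    """Deterministic Redis key for an LLM response (hypothetical 'llm:' namespace)."""
    payload = json.dumps(
        {"model": model, "language": language, "prompt": prompt.strip(),
         "history": list(history or [])},
        sort_keys=True,  # stable key order, including inside history dicts
        separators=(",", ":"),
        ensure_ascii=False,
    )
    digest = hashlib.sha256(payload.encode("utf-8")).hexdigest()
    return f"llm:{model}:{digest}"


k1 = response_cache_key("mixtral-8x7b-32768", "python", "Explain Python functions")
k2 = response_cache_key("mixtral-8x7b-32768", "python", "  Explain Python functions ")
k3 = response_cache_key("mixtral-8x7b-32768", "javascript", "Explain Python functions")
print(k1 == k2, k1 == k3)  # True False
```

Including `history` means only identical conversations share an entry; for context-free, FAQ-style prompts, passing `history=None` gives a higher hit rate.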

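Application-Level Caching:
```python
from functools import lru_cache


class LanguageContextManager:
    # lru_cache on a method includes `self` in the cache key and keeps a
    # reference to the instance, so use it only on long-lived objects
    # (e.g. one manager per process). The _load_* helpers are not shown.

    @lru_cache(maxsize=128)
    def get_language_prompt_template(self, language):
        """Cache prompt templates in memory."""
        return self._load_prompt_template(language)

    @lru_cache(maxsize=64)
    def get_supported_languages(self):
        """Cache supported languages list."""
        return self._load_supported_languages()
```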

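### Database Connection Pooling

```python
import os

from sqlalchemy import create_engine
from sqlalchemy.pool import QueuePool

DATABASE_URL = os.environ["DATABASE_URL"]

# Configure connection pool. QueuePool is already the default for PostgreSQL;
# it is spelled out to make the settings explicit. Limits apply per process:
# 4 gunicorn workers can open up to 4 x (10 + 20) connections.
engine = create_engine(
    DATABASE_URL,
    poolclass=QueuePool,
    pool_size=10,        # connections kept open
    max_overflow=20,     # extra connections allowed under burst load
    pool_pre_ping=True,  # test connections before use to drop stale ones
    pool_recycle=3600,   # replace connections older than 1 hour
)
```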

## Monitoring and Logging

### Structured Logging

```python
import json
import logging
from datetime import datetime, timezone


class StructuredLogger:
    def __init__(self, name):
        self.logger = logging.getLogger(name)

    def log_chat_interaction(self, session_id, user_message, response, language):
        """Log chat interaction with structured data (lengths only, not content)."""
        log_data = {
            'event': 'chat_interaction',
            'session_id': session_id,
            'language': language,
            'user_message_length': len(user_message),
            'response_length': len(response),
            'timestamp': datetime.now(timezone.utc).isoformat(),
        }

        self.logger.info(json.dumps(log_data))

    def log_error(self, error, context=None):
        """Log error with context."""
        log_data = {
            'event': 'error',
            'error_type': type(error).__name__,
            'error_message': str(error),
            'context': context or {},
            'timestamp': datetime.now(timezone.utc).isoformat(),
        }

        # default=str keeps non-JSON-serializable context values from raising
        self.logger.error(json.dumps(log_data, default=str))
```
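
An alternative, sketched below on the assumption that ordinary `logger.info(...)` calls should also come out as JSON, is a `logging.Formatter` subclass attached to the handler, with structured fields passed through the standard `extra=` argument. `JsonFormatter` and the `fields` attribute name are hypothetical conventions, not part of the project.

```python
import json
import logging
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    """Render each LogRecord as one JSON object per line."""

    def format(self, record: logging.LogRecord) -> str:
        entry = {
            "timestamp": datetime.fromtimestamp(record.created, tz=timezone.utc).isoformat(),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Structured fields passed as logger.info(..., extra={"fields": {...}})
        entry.update(getattr(record, "fields", {}))
        if record.exc_info:
            entry["exception"] = self.formatException(record.exc_info)
        return json.dumps(entry, default=str)


handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logger = logging.getLogger("chat_agent")
logger.addHandler(handler)
logger.setLevel(logging.INFO)

logger.info("chat interaction",
            extra={"fields": {"session_id": "session-123", "language": "python"}})
```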

### Health Checks

```python
from datetime import datetime, timezone

from flask import Blueprint, jsonify
from sqlalchemy import text

# db, redis_client and groq_client are assumed to be the application's
# existing SQLAlchemy, Redis and Groq client instances
health_bp = Blueprint('health', __name__)


@health_bp.route('/health')
def health_check():
    """Comprehensive health check."""
    health_status = {
        'status': 'healthy',
        'timestamp': datetime.now(timezone.utc).isoformat(),
        'services': {},
    }

    # Check database (SQLAlchemy 2.x requires text() for raw SQL strings)
    try:
        db.session.execute(text('SELECT 1'))
        health_status['services']['database'] = 'healthy'
    except Exception as e:
        health_status['services']['database'] = f'unhealthy: {e}'
        health_status['status'] = 'unhealthy'

    # Check Redis
    try:
        redis_client.ping()
        health_status['services']['redis'] = 'healthy'
    except Exception as e:
        health_status['services']['redis'] = f'unhealthy: {e}'
        health_status['status'] = 'unhealthy'

    # Check Groq API (simple connectivity test)
    try:
        groq_client.test_connection()
        health_status['services']['groq_api'] = 'healthy'
    except Exception as e:
        health_status['services']['groq_api'] = f'unhealthy: {e}'
        health_status['status'] = 'unhealthy'

    status_code = 200 if health_status['status'] == 'healthy' else 503
    return jsonify(health_status), status_code
```

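## Deployment

### Docker Configuration

Dockerfile:
```dockerfile
FROM python:3.9-slim

WORKDIR /app

# Install system dependencies (curl is used by the HEALTHCHECK below;
# the slim image does not ship it)
RUN apt-get update && apt-get install -y --no-install-recommends \
    gcc \
    curl \
    && rm -rf /var/lib/apt/lists/*

# Install Python dependencies (gunicorn must be listed in requirements.txt)
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy application code
COPY . .

# Create non-root user
RUN useradd --create-home --shell /bin/bash app
USER app

# Expose port
EXPOSE 5000

# Health check
HEALTHCHECK --interval=30s --timeout=10s --start-period=5s --retries=3 \
    CMD curl -f http://localhost:5000/health || exit 1

# Start application. If WebSockets are served by Flask-SocketIO, its docs call
# for a single gunicorn worker (e.g. with an eventlet or gevent worker class);
# scaling out then needs sticky sessions and a message queue.
CMD ["gunicorn", "--bind", "0.0.0.0:5000", "--workers", "4", "app:app"]
```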

docker-compose.yml:
```yaml
version: '3.8'

services:
  chat-agent:
    build: .
    ports:
      - "5000:5000"
    environment:
      - DATABASE_URL=postgresql://postgres:password@db:5432/chatdb
      - REDIS_URL=redis://redis:6379/0
      - GROQ_API_KEY=${GROQ_API_KEY}
      - SECRET_KEY=${SECRET_KEY}
    depends_on:
      - db
      - redis
    volumes:
      - ./logs:/app/logs

  db:
    image: postgres:13
    environment:
      - POSTGRES_DB=chatdb
      - POSTGRES_USER=postgres
      - POSTGRES_PASSWORD=password
    volumes:
      - postgres_data:/var/lib/postgresql/data

  redis:
    image: redis:6-alpine
    volumes:
      - redis_data:/data

volumes:
  postgres_data:
  redis_data:
```

### Production Considerations

Security:
- Use environment variables for sensitive configuration
- Implement proper authentication and authorization
- Enable HTTPS/TLS encryption
- Regular security updates and vulnerability scanning

Scalability:
- Horizontal scaling with load balancers
- Database read replicas for heavy read workloads
- Redis clustering for high availability
- CDN for static assets

Monitoring:
- Application performance monitoring (APM)
- Log aggregation and analysis
- Metrics collection and alerting
- Health check endpoints

## Contributing

### Code Style

Python Code Style:
- Follow PEP 8 guidelines
- Use type hints where appropriate
- Maximum line length: 88 characters (Black formatter)
- Use meaningful variable and function names

Example:
```python
from datetime import datetime, timezone
from typing import Any, Dict, Optional


def process_chat_message(
    session_id: str,
    message: str,
    language: str,
    metadata: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """
    Process a chat message and return response.

    Args:
        session_id: Unique session identifier
        message: User's chat message
        language: Programming language context
        metadata: Optional message metadata

    Returns:
        Dictionary containing response and metadata

    Raises:
        ValueError: If session_id is invalid
        APIError: If LLM API call fails
    """
    if not session_id:
        raise ValueError("Session ID is required")

    # Implementation here (placeholder so the example runs)
    response_text = f"Echo: {message}"

    return {
        'response': response_text,
        'timestamp': datetime.now(timezone.utc).isoformat(),
        'language': language,
    }
```

### Pull Request Process

1. Fork the repository
2. Create feature branch: `git checkout -b feature/new-feature`
3. Make changes with tests
4. Run test suite: `pytest`
5. Update documentation
6. Submit pull request

### Code Review Checklist

- [ ] Code follows style guidelines
- [ ] Tests are included and passing
- [ ] Documentation is updated
- [ ] No security vulnerabilities
- [ ] Performance impact considered
- [ ] Backward compatibility maintained

---

This developer guide provides comprehensive information for contributing to and extending the Multi-Language Chat Agent. For specific implementation details, refer to the source code and inline documentation.