# Multi-Language Chat Agent - Developer Guide
## Architecture Overview
The Multi-Language Chat Agent is built using a modular architecture with the following key components:
```
┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│   Frontend      │    │   WebSocket     │    │   Chat Agent    │
│   (HTML/JS)     │◄──►│   Handler       │◄──►│   Service       │
└─────────────────┘    └─────────────────┘    └─────────────────┘
                                │                      │
                                ▼                      ▼
┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│   Session       │    │   Language      │    │   Groq LLM      │
│   Manager       │    │   Context       │    │   Client        │
└─────────────────┘    └─────────────────┘    └─────────────────┘

┌─────────────────┐
│   Chat History  │
│   Manager       │
└─────────────────┘

┌─────────────────┐    ┌─────────────────┐
│   Redis Cache   │    │   PostgreSQL    │
│                 │    │   Database      │
└─────────────────┘    └─────────────────┘
```
## Core Components
### 1. Chat Agent Service (`chat_agent/services/chat_agent.py`)
The main orchestrator that coordinates all chat operations.
**Key Methods:**
- `process_message()`: Main message processing pipeline
- `switch_language()`: Handle language context switching
- `stream_response()`: Real-time response streaming
**Usage Example:**
```python
from chat_agent.services.chat_agent import ChatAgent

# Initialize chat agent
chat_agent = ChatAgent()

# Process a message
response = chat_agent.process_message(
    session_id="session-123",
    message="How do I create a Python list?",
    language="python"
)
```
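The guide does not spell out the steps inside `process_message()`. As a rough sketch only, assuming the orchestrator loads recent history, stores the user message, asks the LLM client for a reply, and stores that reply, the flow might look like this. `SketchChatAgent`, `InMemoryHistory`, and `EchoLLM` are hypothetical stand-ins, not the project's classes.

```python
from typing import Dict, List


class InMemoryHistory:
    """Hypothetical stand-in for ChatHistoryManager (no database or Redis)."""

    def __init__(self) -> None:
        self._messages: Dict[str, List[dict]] = {}

    def store_message(self, session_id: str, role: str, content: str, language: str) -> None:
        self._messages.setdefault(session_id, []).append(
            {"role": role, "content": content, "language": language}
        )

    def get_recent_history(self, session_id: str, limit: int = 10) -> List[dict]:
        return self._messages.get(session_id, [])[-limit:]


class EchoLLM:
    """Hypothetical stand-in for GroqClient; returns a canned reply."""

    def generate_response(self, prompt: str, chat_history: List[dict], language_context: str) -> str:
        return f"[{language_context}] You asked: {prompt}"


class SketchChatAgent:
    """Assumed pipeline: load history -> store user message -> call LLM -> store reply."""

    def __init__(self, llm: EchoLLM, history: InMemoryHistory) -> None:
        self.llm = llm
        self.history = history

    def process_message(self, session_id: str, message: str, language: str) -> str:
        # Load context first so the new prompt is not duplicated in the history
        recent = self.history.get_recent_history(session_id)
        self.history.store_message(session_id, "user", message, language)
        reply = self.llm.generate_response(message, recent, language)
        self.history.store_message(session_id, "assistant", reply, language)
        return reply


agent = SketchChatAgent(EchoLLM(), InMemoryHistory())
reply = agent.process_message("session-123", "How do I create a Python list?", "python")
```

Passing the collaborators into the constructor mirrors the dependency-injection style used in the unit test example later in this guide.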
### 2. Session Manager (`chat_agent/services/session_manager.py`)
Manages user sessions and chat state.
**Key Methods:**
- `create_session()`: Create new chat session
- `get_session()`: Retrieve session information
- `cleanup_inactive_sessions()`: Remove expired sessions
**Usage Example:**
```python
from chat_agent.services.session_manager import SessionManager

session_manager = SessionManager()

# Create new session
session = session_manager.create_session(
    user_id="user-123",
    language="python"
)

# Get session info
session_info = session_manager.get_session(session['session_id'])
```
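How `cleanup_inactive_sessions()` decides that a session has expired is not shown here. The sketch below assumes each session records a last-activity timestamp and expires once it has been idle longer than a timeout (compare `SESSION_TIMEOUT` in the configuration section). `InMemorySessionStore` and its `now` parameter are hypothetical and exist only to make the example deterministic.

```python
import time
import uuid
from typing import Dict, Optional


class InMemorySessionStore:
    """Hypothetical session store that expires sessions after a period of inactivity."""

    def __init__(self, timeout_seconds: float = 3600) -> None:
        self.timeout_seconds = timeout_seconds
        self._sessions: Dict[str, dict] = {}

    def create_session(self, user_id: str, language: str, now: Optional[float] = None) -> dict:
        now = time.time() if now is None else now
        session = {
            "session_id": str(uuid.uuid4()),
            "user_id": user_id,
            "language": language,
            "last_active": now,
        }
        self._sessions[session["session_id"]] = session
        return session

    def get_session(self, session_id: str) -> Optional[dict]:
        return self._sessions.get(session_id)

    def cleanup_inactive_sessions(self, now: Optional[float] = None) -> int:
        """Delete sessions idle for longer than the timeout; return how many were removed."""
        now = time.time() if now is None else now
        expired = [
            sid for sid, session in self._sessions.items()
            if now - session["last_active"] > self.timeout_seconds
        ]
        for sid in expired:
            del self._sessions[sid]
        return len(expired)


store = InMemorySessionStore(timeout_seconds=3600)
old = store.create_session("user-123", "python", now=0)
fresh = store.create_session("user-456", "javascript", now=3000)
removed = store.cleanup_inactive_sessions(now=4000)
```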
### 3. Language Context Manager (`chat_agent/services/language_context.py`)
Handles programming language context and switching.
**Key Methods:**
- `set_language()`: Set current language for session
- `get_language()`: Get current language
- `get_language_prompt_template()`: Get language-specific prompts
**Usage Example:**
```python
from chat_agent.services.language_context import LanguageContextManager
lang_manager = LanguageContextManager()
# Set language context
lang_manager.set_language("session-123", "javascript")
# Get current language
current_lang = lang_manager.get_language("session-123")
# Get prompt template
template = lang_manager.get_language_prompt_template("python")
```
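For illustration, a per-session language context can be as small as a dictionary with a default. The sketch below is an assumption about the behaviour, not the project's implementation: the supported-language set, the default language, and the template wording are all placeholders.

```python
from typing import Dict

# Assumed values for illustration; the project's real list may differ
SUPPORTED_LANGUAGES = {"python", "javascript"}
DEFAULT_LANGUAGE = "python"


class SketchLanguageContext:
    """Hypothetical in-memory language context keyed by session ID."""

    def __init__(self) -> None:
        self._language_by_session: Dict[str, str] = {}

    def set_language(self, session_id: str, language: str) -> None:
        normalized = language.strip().lower()
        if normalized not in SUPPORTED_LANGUAGES:
            raise ValueError(f"Unsupported language: {language}")
        self._language_by_session[session_id] = normalized

    def get_language(self, session_id: str) -> str:
        # Sessions that never set a language fall back to the default
        return self._language_by_session.get(session_id, DEFAULT_LANGUAGE)

    def get_language_prompt_template(self, language: str) -> str:
        # Placeholder wording; real templates would be loaded from configuration
        return f"You are a helpful assistant for {language} programming questions."


context = SketchLanguageContext()
context.set_language("session-123", "JavaScript")
current = context.get_language("session-123")
fallback = context.get_language("unknown-session")
template = context.get_language_prompt_template(current)
```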
### 4. Chat History Manager (`chat_agent/services/chat_history.py`)
Manages persistent and cached chat history.
**Key Methods:**
- `store_message()`: Store message in DB and cache
- `get_recent_history()`: Get recent messages for context
- `get_full_history()`: Get complete conversation history
**Usage Example:**
```python
from chat_agent.services.chat_history import ChatHistoryManager

history_manager = ChatHistoryManager()

# Store a message
message_id = history_manager.store_message(
    session_id="session-123",
    role="user",
    content="What is Python?",
    language="python"
)

# Get recent history
recent = history_manager.get_recent_history("session-123", limit=10)
```
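"Store message in DB and cache" suggests a write-through pattern: write to the durable store, then update a bounded cache of recent messages. The sketch below illustrates that idea under stated assumptions. It uses an in-memory SQLite database in place of PostgreSQL and a dictionary of deques in place of Redis, and `SketchHistory` is a hypothetical name.

```python
import sqlite3
from collections import deque
from typing import Deque, Dict, List


class SketchHistory:
    """Hypothetical write-through history: SQLite as the store, a dict as the cache."""

    def __init__(self, cache_size: int = 10) -> None:
        self.db = sqlite3.connect(":memory:")
        self.db.execute(
            "CREATE TABLE messages ("
            "id INTEGER PRIMARY KEY AUTOINCREMENT, "
            "session_id TEXT, role TEXT, content TEXT, language TEXT)"
        )
        self.cache_size = cache_size
        self._recent: Dict[str, Deque[dict]] = {}

    def store_message(self, session_id: str, role: str, content: str, language: str) -> int:
        # Write to the durable store first, then update the bounded cache
        cursor = self.db.execute(
            "INSERT INTO messages (session_id, role, content, language) VALUES (?, ?, ?, ?)",
            (session_id, role, content, language),
        )
        self.db.commit()
        message = {"id": cursor.lastrowid, "role": role, "content": content}
        self._recent.setdefault(session_id, deque(maxlen=self.cache_size)).append(message)
        return cursor.lastrowid

    def get_recent_history(self, session_id: str, limit: int = 10) -> List[dict]:
        # Served from the cache only; at most cache_size messages are available
        return list(self._recent.get(session_id, []))[-limit:]

    def get_full_history(self, session_id: str) -> List[dict]:
        # The complete conversation always comes from the database
        rows = self.db.execute(
            "SELECT id, role, content FROM messages WHERE session_id = ? ORDER BY id",
            (session_id,),
        )
        return [{"id": r[0], "role": r[1], "content": r[2]} for r in rows]


history = SketchHistory(cache_size=10)
for i in range(12):
    history.store_message("session-123", "user", f"message {i}", "python")
recent = history.get_recent_history("session-123", limit=10)
full = history.get_full_history("session-123")
```

The deque's `maxlen` evicts the oldest cached message automatically, so the cache never grows past the window the LLM prompt needs.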
### 5. Groq Client (`chat_agent/services/groq_client.py`)
Handles integration with the Groq API through LangChain.
**Key Methods:**
- `generate_response()`: Generate LLM response
- `stream_response()`: Stream response generation
- `handle_api_errors()`: Error handling and fallbacks
**Usage Example:**
```python
from chat_agent.services.groq_client import GroqClient

groq_client = GroqClient(api_key="your-api-key")

# Recent messages for context, e.g. from ChatHistoryManager.get_recent_history()
recent_messages = []

# Generate response
response = groq_client.generate_response(
    prompt="Explain Python functions",
    chat_history=recent_messages,
    language_context="python"
)
```
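What `handle_api_errors()` does internally is not documented here. One common approach to "error handling and fallbacks" is to retry with exponential backoff and return a canned message if every attempt fails. The sketch below shows that approach and makes no claim about the project's behaviour. `generate_with_fallback` and its parameters are hypothetical, and a real client should catch the API library's specific exception types rather than `Exception`.

```python
import time
from typing import Callable, List


def generate_with_fallback(
    call: Callable[[], str],
    retries: int = 3,
    base_delay: float = 0.5,
    fallback: str = "Sorry, the assistant is temporarily unavailable.",
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Try `call` up to `retries` times, doubling the delay after each failure."""
    for attempt in range(retries):
        try:
            return call()
        except Exception:  # assumption: narrow this to the client's error types
            if attempt < retries - 1:
                sleep(base_delay * 2 ** attempt)
    return fallback


# Usage: a fake LLM call that fails twice and then succeeds
attempts = {"count": 0}
delays: List[float] = []


def flaky_call() -> str:
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise RuntimeError("simulated API failure")
    return "ok"


result = generate_with_fallback(flaky_call, sleep=delays.append)


def always_failing() -> str:
    raise RuntimeError("simulated outage")


degraded = generate_with_fallback(always_failing, fallback="fallback", sleep=lambda _: None)
```

Injecting `sleep` keeps the example fast to test and lets the caller record the backoff delays.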
## Development Setup
### Prerequisites
- Python 3.8+
- PostgreSQL (for production) or SQLite (for development)
- Redis (for caching and session management)
- Groq API key
### Installation
1. **Clone the repository:**
```bash
git clone <repository-url>
cd multi-language-chat-agent
```
2. **Create virtual environment:**
```bash
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
```
3. **Install dependencies:**
```bash
pip install -r requirements.txt
```
4. **Set up environment variables:**
```bash
cp .env.example .env
# Edit .env with your configuration
```
5. **Initialize database:**
```bash
python init_db.py
```
6. **Run the application:**
```bash
python app.py
```
### Environment Configuration
**Required Environment Variables:**
```bash
# Groq API Configuration
GROQ_API_KEY=your-groq-api-key-here
GROQ_MODEL=mixtral-8x7b-32768
# Database Configuration
DATABASE_URL=postgresql://user:password@localhost/chatdb
# Or for SQLite: DATABASE_URL=sqlite:///instance/chat_agent.db
# Redis Configuration
REDIS_URL=redis://localhost:6379/0
# Flask Configuration
SECRET_KEY=your-secret-key-here
FLASK_ENV=development
```
**Optional Configuration:**
```bash
# Rate Limiting
RATE_LIMIT_ENABLED=true
RATE_LIMIT_PER_MINUTE=30
# Session Management
SESSION_TIMEOUT=3600 # 1 hour in seconds
CLEANUP_INTERVAL=300 # 5 minutes
# Logging
LOG_LEVEL=INFO
LOG_FILE=logs/chat_agent.log
```
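To show how these variables might be consumed, here is a minimal loader sketch that fails fast on missing required variables and parses the optional ones. `Settings` and `load_settings` are hypothetical names, only a subset of the variables is covered, and the fallback values simply mirror the sample values above. They are not confirmed application defaults.

```python
import os
from dataclasses import dataclass
from typing import Mapping

REQUIRED = ("GROQ_API_KEY", "DATABASE_URL", "REDIS_URL", "SECRET_KEY")


@dataclass(frozen=True)
class Settings:
    groq_api_key: str
    database_url: str
    redis_url: str
    secret_key: str
    rate_limit_enabled: bool
    rate_limit_per_minute: int
    session_timeout: int
    log_level: str


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Validate required variables and parse optional ones with assumed defaults."""
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise RuntimeError("Missing required environment variables: " + ", ".join(missing))
    return Settings(
        groq_api_key=env["GROQ_API_KEY"],
        database_url=env["DATABASE_URL"],
        redis_url=env["REDIS_URL"],
        secret_key=env["SECRET_KEY"],
        rate_limit_enabled=env.get("RATE_LIMIT_ENABLED", "true").lower() == "true",
        rate_limit_per_minute=int(env.get("RATE_LIMIT_PER_MINUTE", "30")),
        session_timeout=int(env.get("SESSION_TIMEOUT", "3600")),
        log_level=env.get("LOG_LEVEL", "INFO").upper(),
    )


# Usage with an explicit mapping instead of the real process environment
example_env = {
    "GROQ_API_KEY": "test-key",
    "DATABASE_URL": "sqlite:///instance/chat_agent.db",
    "REDIS_URL": "redis://localhost:6379/0",
    "SECRET_KEY": "test-secret",
    "SESSION_TIMEOUT": "120",
}
settings = load_settings(example_env)
```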
## Testing
### Running Tests
**All Tests:**
```bash
pytest
```
**Specific Test Categories:**
```bash
# Unit tests
pytest tests/unit/
# Integration tests
pytest tests/integration/
# End-to-end tests
pytest tests/e2e/
# Performance tests
pytest tests/performance/
```
**With Coverage:**
```bash
pytest --cov=chat_agent --cov-report=html
```
### Test Structure
```
tests/
├── unit/                    # Unit tests for individual components
│   ├── test_chat_agent.py
│   ├── test_session_manager.py
│   └── test_language_context.py
├── integration/             # Integration tests
│   ├── test_chat_api.py
│   └── test_websocket_integration.py
├── e2e/                     # End-to-end workflow tests
│   └── test_complete_chat_workflow.py
└── performance/             # Load and performance tests
    └── test_load_testing.py
```
### Writing Tests
**Unit Test Example:**
```python
import pytest
from unittest.mock import Mock, patch

from chat_agent.services.chat_agent import ChatAgent


class TestChatAgent:
    @pytest.fixture
    def mock_dependencies(self):
        return {
            'groq_client': Mock(),
            'session_manager': Mock(),
            'language_context_manager': Mock(),
            'chat_history_manager': Mock()
        }

    def test_process_message_success(self, mock_dependencies):
        # Arrange
        chat_agent = ChatAgent(**mock_dependencies)
        mock_dependencies['groq_client'].generate_response.return_value = "Test response"

        # Act
        result = chat_agent.process_message("session-123", "Test message", "python")

        # Assert
        assert result == "Test response"
        mock_dependencies['groq_client'].generate_response.assert_called_once()
```
**Integration Test Example:**
```python
import pytest

from chat_agent.services.chat_agent import ChatAgent


class TestChatIntegration:
    @pytest.fixture
    def integrated_system(self):
        # Set up real components with test configuration
        return ChatAgent()

    def test_complete_chat_flow(self, integrated_system):
        # Test complete workflow with real components
        session_id = "test-session"
        response = integrated_system.process_message(
            session_id, "What is Python?", "python"
        )
        assert response is not None
        assert len(response) > 0
```
## API Development
### Adding New Endpoints
1. **Create route in `chat_agent/api/chat_routes.py`:**
```python
from datetime import datetime

from flask import g, jsonify

# chat_bp, require_auth, rate_limit, session_manager, chat_history_manager,
# and logger are assumed to be defined or imported elsewhere in chat_routes.py


@chat_bp.route('/sessions/<session_id>/export', methods=['GET'])
@require_auth
@rate_limit(per_minute=10)
def export_chat_history(session_id):
    """Export chat history for a session."""
    try:
        # Validate session ownership
        session = session_manager.get_session(session_id)
        if not session or session['user_id'] != g.user_id:
            return jsonify({'error': 'Session not found'}), 404

        # Get full history
        history = chat_history_manager.get_full_history(session_id)
        return jsonify({
            'session_id': session_id,
            'messages': history,
            'exported_at': datetime.utcnow().isoformat()
        })
    except Exception as e:
        logger.error(f"Export error: {e}")
        return jsonify({'error': 'Export failed'}), 500
```
2. **Add tests for the new endpoint:**
```python
def test_export_chat_history(self, client, auth_headers):
    # Create session and messages
    session_response = client.post('/api/v1/chat/sessions',
                                   headers=auth_headers,
                                   json={'language': 'python'})
    session_id = session_response.json['session_id']

    # Test export
    response = client.get(f'/api/v1/chat/sessions/{session_id}/export',
                          headers=auth_headers)
    assert response.status_code == 200
    assert 'messages' in response.json
```
3. **Update API documentation in `chat_agent/api/README.md`**
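The `@rate_limit(per_minute=10)` decorator used on the export route is not defined in this guide. As a hedged sketch of how such a decorator could work, the example below keeps a sliding one-minute window of call timestamps per key. The `key_func` and `clock` parameters and the `RateLimitExceeded` exception are assumptions added for illustration. In a Flask route the key would more likely come from the request context (for example `g.user_id`), and the limit would produce an HTTP 429 response.

```python
import time
from collections import defaultdict, deque
from functools import wraps
from typing import Callable, Deque, Dict


class RateLimitExceeded(Exception):
    """Raised when a key exceeds its per-minute allowance (hypothetical)."""


def rate_limit(per_minute: int, key_func: Callable[..., str],
               clock: Callable[[], float] = time.monotonic):
    """Allow at most `per_minute` calls per key in any 60-second window."""
    calls: Dict[str, Deque[float]] = defaultdict(deque)

    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            key = key_func(*args, **kwargs)
            now = clock()
            window = calls[key]
            # Drop timestamps that have left the 60-second window
            while window and now - window[0] >= 60:
                window.popleft()
            if len(window) >= per_minute:
                raise RateLimitExceeded(key)
            window.append(now)
            return func(*args, **kwargs)
        return wrapper
    return decorator


# Usage with a fake clock so the example is deterministic
fake_now = [0.0]


@rate_limit(per_minute=2, key_func=lambda user_id: user_id, clock=lambda: fake_now[0])
def export_history(user_id: str) -> str:
    return f"export for {user_id}"


first = export_history("user-123")
second = export_history("user-123")
try:
    export_history("user-123")
    blocked = False
except RateLimitExceeded:
    blocked = True

fake_now[0] = 60.0  # one minute later the window has emptied
after_window = export_history("user-123")
```

This in-process version only limits a single worker. With several Gunicorn workers, the counters would need to live in a shared store such as Redis.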
### WebSocket Event Handling
**Adding New WebSocket Events:**
```python
# In chat_agent/websocket/chat_websocket.py
from datetime import datetime

from flask_socketio import emit

# socketio, session_manager, logger, and process_custom_logic are assumed
# to be defined or imported elsewhere in this module


@socketio.on('custom_event')
def handle_custom_event(data):
    """Handle custom WebSocket event."""
    try:
        session_id = data.get('session_id')

        # Validate session
        if not session_manager.get_session(session_id):
            emit('error', {'error': 'Invalid session'})
            return

        # Process custom logic
        result = process_custom_logic(data)

        # Emit response
        emit('custom_response', {
            'session_id': session_id,
            'result': result,
            'timestamp': datetime.utcnow().isoformat()
        })
    except Exception as e:
        logger.error(f"Custom event error: {e}")
        emit('error', {'error': 'Processing failed'})
```
## Database Management
### Schema Migrations
**Creating Migrations:**
```python
# migrations/003_add_new_feature.py

def upgrade(connection):
    """Add new feature to database."""
    connection.execute("""
        ALTER TABLE messages
        ADD COLUMN sentiment_score FLOAT DEFAULT 0.0
    """)
    connection.execute("""
        CREATE INDEX idx_messages_sentiment
        ON messages(sentiment_score)
    """)


def downgrade(connection):
    """Remove new feature from database."""
    connection.execute("DROP INDEX idx_messages_sentiment")
    connection.execute("ALTER TABLE messages DROP COLUMN sentiment_score")
```
**Running Migrations:**
```bash
python migrations/migrate.py
```
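The internals of `migrations/migrate.py` are not shown in this guide. A common design for such a runner is to record applied versions in a bookkeeping table and skip them on later runs. The sketch below illustrates that design with SQLite. The `schema_migrations` table, `run_migrations`, and the in-memory database are assumptions for illustration, not the project's runner.

```python
import sqlite3
from typing import Callable, List, Tuple

Migration = Tuple[str, Callable[[sqlite3.Connection], None]]


def add_sentiment_score(connection: sqlite3.Connection) -> None:
    """Same change as the upgrade() example, reduced to the column itself."""
    connection.execute(
        "ALTER TABLE messages ADD COLUMN sentiment_score FLOAT DEFAULT 0.0"
    )


def run_migrations(connection: sqlite3.Connection, migrations: List[Migration]) -> List[str]:
    """Apply unapplied migrations in version order; return the versions applied."""
    connection.execute(
        "CREATE TABLE IF NOT EXISTS schema_migrations (version TEXT PRIMARY KEY)"
    )
    applied = {row[0] for row in connection.execute("SELECT version FROM schema_migrations")}
    newly_applied = []
    for version, upgrade in sorted(migrations, key=lambda m: m[0]):
        if version in applied:
            continue  # already applied on an earlier run
        upgrade(connection)
        connection.execute(
            "INSERT INTO schema_migrations (version) VALUES (?)", (version,)
        )
        connection.commit()
        newly_applied.append(version)
    return newly_applied


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE messages (id INTEGER PRIMARY KEY, content TEXT)")
migrations = [("003_add_new_feature", add_sentiment_score)]

first_run = run_migrations(conn, migrations)
second_run = run_migrations(conn, migrations)  # nothing left to apply
columns = [row[1] for row in conn.execute("PRAGMA table_info(messages)")]
```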
### Database Optimization
**Indexing Strategy:**
```sql
-- Session-based queries
CREATE INDEX idx_messages_session_timestamp ON messages(session_id, timestamp);
-- User-based queries
CREATE INDEX idx_sessions_user_active ON chat_sessions(user_id, is_active);
-- Language-based queries
CREATE INDEX idx_messages_language ON messages(language);
-- Full-text search (PostgreSQL)
CREATE INDEX idx_messages_content_fts ON messages USING gin(to_tsvector('english', content));
```
## Performance Optimization
### Caching Strategy
**Redis Caching:**
```python
import redis
import json
from datetime import timedelta


class CacheManager:
    def __init__(self, redis_url):
        self.redis_client = redis.from_url(redis_url)

    def cache_response(self, key, response, ttl=3600):
        """Cache LLM response."""
        self.redis_client.setex(
            key,
            ttl,
            json.dumps(response)
        )

    def get_cached_response(self, key):
        """Get cached response."""
        cached = self.redis_client.get(key)
        return json.loads(cached) if cached else None

    def cache_chat_history(self, session_id, messages):
        """Cache recent chat history."""
        key = f"history:{session_id}"
        self.redis_client.setex(
            key,
            1800,  # 30 minutes
            json.dumps(messages)
        )
```
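`cache_response()` takes a `key`, but the guide does not say how response keys are built. One possible scheme, sketched below as an assumption, hashes the normalized prompt together with the language and model so that identical questions map to the same entry. The `response:` prefix and the `make_cache_key` helper are hypothetical.

```python
import hashlib
import json


def make_cache_key(prompt: str, language: str, model: str) -> str:
    """Build a deterministic cache key from the inputs that shape the response."""
    payload = json.dumps(
        {"prompt": prompt.strip(), "language": language.lower(), "model": model},
        sort_keys=True,  # stable field order gives a stable hash
    )
    digest = hashlib.sha256(payload.encode("utf-8")).hexdigest()
    return f"response:{digest}"


key_a = make_cache_key("Explain Python functions", "python", "mixtral-8x7b-32768")
key_b = make_cache_key("  Explain Python functions ", "Python", "mixtral-8x7b-32768")
key_c = make_cache_key("Explain Python functions", "javascript", "mixtral-8x7b-32768")
```

Caching by prompt alone ignores chat history, so a scheme like this only suits context-free questions. Including a hash of the recent history in the payload is one way to extend it.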
**Application-Level Caching:**
```python
from functools import lru_cache


class LanguageContextManager:
    # Note: lru_cache on a method includes `self` in the cache key and keeps
    # the instance alive for as long as the entry stays cached.

    @lru_cache(maxsize=128)
    def get_language_prompt_template(self, language):
        """Cache prompt templates in memory."""
        return self._load_prompt_template(language)

    @lru_cache(maxsize=64)
    def get_supported_languages(self):
        """Cache supported languages list."""
        return self._load_supported_languages()
```
### Database Connection Pooling
```python
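import os

from sqlalchemy import create_engine
from sqlalchemy.pool import QueuePool

DATABASE_URL = os.environ["DATABASE_URL"]

# Configure connection pool
engine = create_engine(
    DATABASE_URL,
    poolclass=QueuePool,
    pool_size=10,         # connections kept open in the pool
    max_overflow=20,      # extra connections allowed under load
    pool_pre_ping=True,   # test each connection before using it
    pool_recycle=3600     # replace connections older than one hour
)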
```
## Monitoring and Logging
### Structured Logging
```python
import logging
import json
from datetime import datetime


class StructuredLogger:
    def __init__(self, name):
        self.logger = logging.getLogger(name)

    def log_chat_interaction(self, session_id, user_message, response, language):
        """Log chat interaction with structured data."""
        log_data = {
            'event': 'chat_interaction',
            'session_id': session_id,
            'language': language,
            'user_message_length': len(user_message),
            'response_length': len(response),
            'timestamp': datetime.utcnow().isoformat()
        }
        self.logger.info(json.dumps(log_data))

    def log_error(self, error, context=None):
        """Log error with context."""
        log_data = {
            'event': 'error',
            'error_type': type(error).__name__,
            'error_message': str(error),
            'context': context or {},
            'timestamp': datetime.utcnow().isoformat()
        }
        self.logger.error(json.dumps(log_data))
```
### Health Checks
```python
from datetime import datetime

from flask import Blueprint, jsonify
from sqlalchemy import text

# db, redis_client, and groq_client are assumed to be the application's
# shared instances, imported from wherever the app creates them

health_bp = Blueprint('health', __name__)


@health_bp.route('/health')
def health_check():
    """Comprehensive health check."""
    health_status = {
        'status': 'healthy',
        'timestamp': datetime.utcnow().isoformat(),
        'services': {}
    }

    # Check database (SQLAlchemy 1.4+ requires text() for raw SQL strings)
    try:
        db.session.execute(text('SELECT 1'))
        health_status['services']['database'] = 'healthy'
    except Exception as e:
        health_status['services']['database'] = f'unhealthy: {e}'
        health_status['status'] = 'unhealthy'

    # Check Redis
    try:
        redis_client.ping()
        health_status['services']['redis'] = 'healthy'
    except Exception as e:
        health_status['services']['redis'] = f'unhealthy: {e}'
        health_status['status'] = 'unhealthy'

    # Check Groq API
    try:
        # Simple API test
        groq_client.test_connection()
        health_status['services']['groq_api'] = 'healthy'
    except Exception as e:
        health_status['services']['groq_api'] = f'unhealthy: {e}'
        health_status['status'] = 'unhealthy'

    status_code = 200 if health_status['status'] == 'healthy' else 503
    return jsonify(health_status), status_code
```
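The three try/except blocks in the health check follow the same pattern, so they can be folded into a loop over named check functions. The sketch below shows one way to do that without Flask so that it can be tested in isolation. `run_health_checks` and the example checks are hypothetical, and the timestamp field is omitted for brevity.

```python
from typing import Callable, Dict, Tuple


def run_health_checks(checks: Dict[str, Callable[[], None]]) -> Tuple[dict, int]:
    """Run each check; a check passes if it returns without raising."""
    services = {}
    for name, check in checks.items():
        try:
            check()
            services[name] = "healthy"
        except Exception as exc:
            services[name] = f"unhealthy: {exc}"
    all_healthy = all(state == "healthy" for state in services.values())
    body = {"status": "healthy" if all_healthy else "unhealthy", "services": services}
    # 503 tells load balancers and container health checks to treat the instance as down
    return body, (200 if all_healthy else 503)


def failing_redis_check() -> None:
    raise ConnectionError("connection refused")


body, status_code = run_health_checks({
    "database": lambda: None,      # stand-in for a SELECT 1 query
    "redis": failing_redis_check,  # stand-in for redis_client.ping()
})
ok_body, ok_status = run_health_checks({"database": lambda: None})
```

Inside the Flask route, the returned pair could be passed on as `jsonify(body), status_code`.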
## Deployment
### Docker Configuration
**Dockerfile:**
```dockerfile
FROM python:3.9-slim

WORKDIR /app

# Install system dependencies (curl is required by the HEALTHCHECK below)
RUN apt-get update && apt-get install -y \
    gcc \
    curl \
    && rm -rf /var/lib/apt/lists/*

# Install Python dependencies
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy application code
COPY . .

# Create non-root user
RUN useradd --create-home --shell /bin/bash app
USER app

# Expose port
EXPOSE 5000

# Health check
HEALTHCHECK --interval=30s --timeout=10s --start-period=5s --retries=3 \
    CMD curl -f http://localhost:5000/health || exit 1

# Start application
CMD ["gunicorn", "--bind", "0.0.0.0:5000", "--workers", "4", "app:app"]
```
**docker-compose.yml:**
```yaml
version: '3.8'

services:
  chat-agent:
    build: .
    ports:
      - "5000:5000"
    environment:
      - DATABASE_URL=postgresql://postgres:password@db:5432/chatdb
      - REDIS_URL=redis://redis:6379/0
      - GROQ_API_KEY=${GROQ_API_KEY}
    depends_on:
      - db
      - redis
    volumes:
      - ./logs:/app/logs

  db:
    image: postgres:13
    environment:
      - POSTGRES_DB=chatdb
      - POSTGRES_USER=postgres
      - POSTGRES_PASSWORD=password
    volumes:
      - postgres_data:/var/lib/postgresql/data

  redis:
    image: redis:6-alpine
    volumes:
      - redis_data:/data

volumes:
  postgres_data:
  redis_data:
```
### Production Considerations
**Security:**
- Use environment variables for sensitive configuration
- Implement proper authentication and authorization
- Enable HTTPS/TLS encryption
- Regular security updates and vulnerability scanning
**Scalability:**
- Horizontal scaling with load balancers
- Database read replicas for heavy read workloads
- Redis clustering for high availability
- CDN for static assets
**Monitoring:**
- Application performance monitoring (APM)
- Log aggregation and analysis
- Metrics collection and alerting
- Health check endpoints
## Contributing
### Code Style
**Python Code Style:**
- Follow PEP 8 guidelines
- Use type hints where appropriate
- Maximum line length: 88 characters (Black formatter)
- Use meaningful variable and function names
**Example:**
```python
from datetime import datetime
from typing import Any, Dict, Optional


def process_chat_message(
    session_id: str,
    message: str,
    language: str,
    metadata: Optional[Dict] = None
) -> Dict[str, Any]:
    """
    Process a chat message and return response.

    Args:
        session_id: Unique session identifier
        message: User's chat message
        language: Programming language context
        metadata: Optional message metadata

    Returns:
        Dictionary containing response and metadata

    Raises:
        ValueError: If session_id is invalid
        APIError: If LLM API call fails
    """
    if not session_id:
        raise ValueError("Session ID is required")

    # Implementation here; response_text is a placeholder for the generated reply
    response_text = ""

    return {
        'response': response_text,
        'timestamp': datetime.utcnow().isoformat(),
        'language': language
    }
```
### Pull Request Process
1. **Fork the repository**
2. **Create feature branch:** `git checkout -b feature/new-feature`
3. **Make changes with tests**
4. **Run test suite:** `pytest`
5. **Update documentation**
6. **Submit pull request**
### Code Review Checklist
- [ ] Code follows style guidelines
- [ ] Tests are included and passing
- [ ] Documentation is updated
- [ ] No security vulnerabilities
- [ ] Performance impact considered
- [ ] Backward compatibility maintained
---
This developer guide provides comprehensive information for contributing to and extending the Multi-Language Chat Agent. For specific implementation details, refer to the source code and inline documentation.