# BankBot AI: Viva / Presentation Notes
## Elevator Pitch (30 seconds)
> "BankBot is a production-grade AI financial platform. It gives users a real-time intelligent view of their finances through a multi-provider AI engine (OpenAI, Groq, and local Ollama) with live WebSocket streaming, fraud detection, financial forecasting, and a premium glassmorphism UI. Every layer has a fallback so the system never fully fails. It's containerized, deployable to cloud platforms, and designed to feel like a real fintech product."
---
## Key Numbers (Memorize These)
| Metric | Value |
|--------|-------|
| API routes | 33 |
| Frontend pages | 14 |
| Database tables | 10 |
| Demo transactions | 301 |
| Dashboard cold | **65ms** |
| Dashboard cached | **10ms** |
| All endpoints (warm) | **< 20ms** |
| AI fallback levels | 4 |
| JWT access TTL | 60 minutes |
| JWT refresh TTL | 7 days |
| Cache TTLs | 2min / 10min / 1hr |
| Rate limit | 120 req/min per IP |
| Auth rate limit | 10 req/min per IP |
| WebSocket heartbeat | 25 seconds |
| bcrypt rounds | 12 |
| Injection patterns blocked | 9 regex patterns |
| CI/CD | GitHub Actions (3 jobs) |
---
## Demo Script (5 minutes)
### 1. Login (30s)
- Open http://localhost:3000
- Login: `alex@bankbot.dev` / `BankBot2026!`
- Point out: JWT stored in localStorage, auto-refresh on expiry
### 2. Dashboard (45s)
- Show: real balance $59,637, cash flow chart, AI briefing banner
- "This loads in 65ms cold: single optimized DB query, no AI blocking"
- Show fraud alert banner at bottom: "1 pending alert"
- Open DevTools Network tab and show the 65ms response
### 3. AI Chat (90s)
- Navigate to `/chat`
- Show animated AI orb, connection status badge (Live)
- Type: **"Analyze my spending this month"**
- Point out: character-by-character streaming via WebSocket
- Type: **"What's my biggest financial risk?"**
- "The AI has full context (balance, goals, investments, behavior patterns) injected into every prompt"
- Show the HTTP fallback: disconnect network → message still sends via POST
### 4. What-If Simulator (45s)
- Navigate to `/simulator`
- Show AI scenario cards (Conservative / Balanced / Optimistic): "from real backend forecasting"
- Move Savings Rate slider from 40% → 55%
- "Chart updates instantly: local projection engine, no API call needed"
- Show AI insight updating dynamically below
### 5. Analytics (30s)
- Navigate to `/analytics`
- Show cash flow bar chart (real DB data)
- Switch to Categories: spending breakdown with AI insights
- Switch to Net Worth: trajectory over time
### 6. Transactions (20s)
- Navigate to `/transactions`
- Show 301 transactions, pagination
- Filter by "Expenses" (instant filter)
- Search "Amazon" (client-side search)
### 7. System Status (30s)
- Navigate to `/status`
- Show live metrics: uptime, request count, cache hit ratio, route timings
- "This is a real observability dashboard, not mocked data"
- Point out: dashboard at 17ms avg, metrics at 2ms
---
## Technical Q&A (15 Most Likely Questions)
### Q1: Why FastAPI over Django/Flask?
**A:** FastAPI gives us async support, automatic OpenAPI docs at `/docs`, Pydantic validation, and native WebSocket support, all critical for a real-time AI platform. Because it is built on async/await, it typically handles I/O-bound workloads with higher throughput than synchronous Flask (often quoted as 2-3x, though the exact figure depends on the benchmark).
### Q2: How does the AI fallback chain work?
**A:** At startup, the system checks for API keys. If `OPENAI_API_KEY` exists → GPT-4o-mini. If not → Groq's llama-3.3-70b (free tier, very fast). If neither → local Ollama (fully offline). If Ollama isn't running → rule-based responses derived from the user's actual database records. The user always gets a response: a provider outage never surfaces as an error.
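
The selection order above can be sketched as a small pure function. This is a minimal illustration, not the project's actual code: the function name `select_provider`, the returned labels, the `GROQ_API_KEY` variable name, and the `ollama_is_running` probe are all assumptions.

```python
from typing import Callable, Mapping


def select_provider(
    env: Mapping[str, str],
    ollama_is_running: Callable[[], bool],
) -> str:
    """Return a label for the first usable provider, in priority order."""
    if env.get("OPENAI_API_KEY"):
        return "openai:gpt-4o-mini"
    # GROQ_API_KEY is an assumed variable name; the notes only name OPENAI_API_KEY.
    if env.get("GROQ_API_KEY"):
        return "groq:llama-3.3-70b"
    # Hypothetical probe, e.g. a health check against the local Ollama server.
    if ollama_is_running():
        return "ollama:local"
    # Last level: deterministic answers built from the user's own records.
    return "rule-based"
```

Passing the environment and the Ollama probe as arguments keeps the chain easy to test without real API keys, for example `select_provider(os.environ, probe)` in the app and a plain dict in tests.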
### Q3: Why WebSocket for chat instead of HTTP streaming?
**A:** WebSocket gives us bidirectional communication: the server can push fraud alerts, balance updates, and AI responses without the client polling. HTTP SSE is one-directional. WebSocket also supports our heartbeat/ping system (every 25s) for connection health monitoring, and enables future push notifications.
### Q4: How is the dashboard so fast (65ms)?
**A:** Three optimizations: (1) `Query.with_entities()`, which selects only the needed columns with no full ORM object hydration. (2) Single query for 6-month cash flow instead of 6 sequential queries. (3) Dashboard never calls AI inline and reads from cache only. Result: 65ms cold, 10ms cached.
### Q5: How does fraud detection work?
**A:** Rule-based scoring across 4 dimensions: amount spike (>3.5x average = +40pts), timing anomaly (11PM-4AM = +25pts), rapid-fire transactions (<3min gap = +20pts), duplicate detection (same merchant+amount within 10min = +30pts). Score ≥30 logs to `fraud_logs`, ≥50 flags as high-risk. No ML model needed: the scoring is deterministic and explainable.
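
The four rules and two thresholds can be written down directly. The sketch below uses the point values from the answer above; the `TxnSignals` container, its field names, and the treatment of the 11PM-4AM window as 23:00 to 03:59 are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class TxnSignals:
    """Hypothetical pre-computed inputs for one transaction."""
    amount: float
    avg_amount: float               # user's average transaction amount
    hour: int                       # local hour of day, 0-23
    minutes_since_previous: float   # gap to the user's previous transaction
    duplicate_within_10_min: bool   # same merchant + amount in the last 10 minutes


def fraud_score(t: TxnSignals) -> int:
    """Sum the points of every rule that fires."""
    score = 0
    if t.avg_amount > 0 and t.amount > 3.5 * t.avg_amount:
        score += 40  # amount spike
    if t.hour >= 23 or t.hour < 4:
        score += 25  # timing anomaly (assumed window: 23:00-03:59)
    if t.minutes_since_previous < 3:
        score += 20  # rapid-fire
    if t.duplicate_within_10_min:
        score += 30  # duplicate
    return score


def classify(score: int) -> str:
    """Map a score to the two thresholds described in the notes."""
    if score >= 50:
        return "high-risk"
    if score >= 30:
        return "logged"
    return "clear"
```

Because each rule contributes a fixed number of points, any flagged transaction can be explained by listing the rules that fired.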
### Q6: What's the financial health score?
**A:** A 100-point composite across 6 dimensions: Savings Consistency (20pts), Debt Ratio (20pts), Spending Discipline (20pts), Emergency Fund (20pts), Investment Index (10pts), Subscription Efficiency (10pts). Each calculated from real DB records. An AI explanation is generated and cached for 10 minutes to control LLM costs.
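
The weighting can be illustrated as a capped sum. The notes do not describe how each dimension is computed, so this sketch assumes every dimension has already been normalized to a fraction between 0.0 and 1.0; the dictionary keys and the `health_score` name are hypothetical.

```python
from typing import Mapping

# Maximum points per dimension, as listed in the answer above (sums to 100).
MAX_POINTS = {
    "savings_consistency": 20,
    "debt_ratio": 20,
    "spending_discipline": 20,
    "emergency_fund": 20,
    "investment_index": 10,
    "subscription_efficiency": 10,
}


def health_score(fractions: Mapping[str, float]) -> int:
    """Combine normalized dimension scores (0.0-1.0) into a 0-100 total."""
    total = 0.0
    for name, cap in MAX_POINTS.items():
        # Missing dimensions count as 0; out-of-range inputs are clamped.
        frac = min(max(fractions.get(name, 0.0), 0.0), 1.0)
        total += frac * cap
    return round(total)
```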
### Q7: How is authentication secured?
**A:** JWT with two tokens: a 60-minute access token and a 7-day refresh token. Passwords hashed with bcrypt (rounds=12) using the `bcrypt` library directly rather than passlib, which has a known incompatibility with bcrypt>=4. Rate limiting on auth: 10 requests/minute per IP. Auto-refresh on 401 in the frontend API client.
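
The per-IP limit on auth routes can be illustrated with a sliding-window counter. The notes do not say which limiting algorithm the backend uses, so this is a stand-in technique rather than the project's implementation; the class name and the explicit `now` argument are assumptions chosen to keep the sketch testable.

```python
from collections import defaultdict, deque
from typing import Deque, Dict


class SlidingWindowLimiter:
    """Allow at most `limit` requests per IP within the last `window_seconds`."""

    def __init__(self, limit: int, window_seconds: float = 60.0) -> None:
        self.limit = limit
        self.window = window_seconds
        self._hits: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, ip: str, now: float) -> bool:
        hits = self._hits[ip]
        # Drop timestamps that have fallen out of the window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) >= self.limit:
            return False  # caller would respond with HTTP 429
        hits.append(now)
        return True
```

With `SlidingWindowLimiter(limit=10)` the eleventh login attempt from one IP inside a minute is rejected, while other IPs are unaffected. In a real deployment `now` would come from `time.monotonic()`.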
### Q8: Why SQLite fallback?
**A:** For development and demo, so no PostgreSQL setup is required. The fallback is automatic: if PostgreSQL connection fails, the app switches to SQLite. Same ORM code, same queries, zero code changes. This makes the project runnable on any machine with just Python.
### Q9: How does the What-If Simulator work?
**A:** Two layers: (1) Local projection engine in the browser, giving instant feedback as sliders move, using compound interest math. No API call, no latency. (2) Real AI scenarios from `/api/ai/twin/scenarios`, where the backend runs forecasting algorithms on actual transaction history to generate conservative/expected/optimistic projections.
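
The compound interest math behind the local projection can be sketched as a monthly recurrence: each month the balance grows by one twelfth of the annual return, then the month's savings are added. The real engine runs in the browser; this Python version only illustrates the arithmetic, and the function name, parameters, and monthly compounding are assumptions.

```python
from typing import List


def project_balance(
    start_balance: float,
    monthly_income: float,
    savings_rate: float,    # fraction of income saved, e.g. 0.40 for 40%
    annual_return: float,   # e.g. 0.06 for 6% per year
    months: int,
) -> List[float]:
    """Return the projected balance at the end of each month."""
    monthly_rate = annual_return / 12
    contribution = monthly_income * savings_rate
    balance = start_balance
    path = []
    for _ in range(months):
        # Growth on the existing balance, then this month's savings.
        balance = balance * (1 + monthly_rate) + contribution
        path.append(round(balance, 2))
    return path
```

Moving the savings rate from 40% to 55% only changes `contribution`, which is why the chart can be recomputed on every slider movement without a network round trip.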
### Q10: What is prompt injection and how do you prevent it?
**A:** Prompt injection is when a user tries to override the AI's system prompt with instructions like "ignore all previous instructions". We prevent it with 9 regex patterns that detect common injection phrases. Flagged messages return a safe error response and are logged. The sanitizer also strips control characters and truncates to 2000 characters.
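
A minimal sketch of such a sanitizer follows. The notes mention 9 patterns without listing them, so the three patterns here are illustrative stand-ins, and the `sanitize` function and its return shape are assumptions. Pattern matching of this kind reduces common injection attempts but cannot catch every rephrasing.

```python
import re
from typing import Tuple

# Illustrative patterns only; not the project's actual list.
INJECTION_PATTERNS = [
    re.compile(r"ignore\s+(all\s+)?previous\s+instructions", re.IGNORECASE),
    re.compile(r"disregard\s+(the\s+)?system\s+prompt", re.IGNORECASE),
    re.compile(r"reveal\s+(your\s+)?system\s+prompt", re.IGNORECASE),
]

# ASCII control characters except tab, newline, and carriage return.
CONTROL_CHARS = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f\x7f]")
MAX_LENGTH = 2000


def sanitize(message: str) -> Tuple[str, bool]:
    """Return (cleaned_message, flagged) for one user chat message."""
    cleaned = CONTROL_CHARS.sub("", message)[:MAX_LENGTH]
    flagged = any(p.search(cleaned) for p in INJECTION_PATTERNS)
    return cleaned, flagged
```

A caller would log flagged messages and return the safe error response instead of forwarding the text to the AI provider.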
### Q11: How does the cache-aside pattern work?
**A:** On every request, check cache first. If hit → return cached data (10ms). If miss → query DB, compute result, store in cache with TTL, return result. Redis is primary; if Redis is unavailable, an in-memory dict with TTL tracking is used automatically. No configuration needed: the fallback is transparent.
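
The hit/miss flow can be sketched with the in-memory fallback alone. `MemoryTTLCache` and `get_or_compute` are hypothetical names, and the Redis-first path is left out; the sketch only shows the cache-aside logic and TTL tracking. One simplification to note: a computed value of `None` is treated as a miss and is never cached.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple


class MemoryTTLCache:
    """Dict-backed cache where every entry carries an expiry time."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._store: Dict[str, Tuple[Any, float]] = {}
        self._clock = clock

    def get(self, key: str) -> Optional[Any]:
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._store[key]  # expired: evict and report a miss
            return None
        return value

    def set(self, key: str, value: Any, ttl_seconds: float) -> None:
        self._store[key] = (value, self._clock() + ttl_seconds)


def get_or_compute(
    cache: MemoryTTLCache, key: str, ttl_seconds: float, compute: Callable[[], Any]
) -> Any:
    """Cache-aside: return the cached value, or compute, store, and return it."""
    cached = cache.get(key)
    if cached is not None:
        return cached
    value = compute()  # e.g. the dashboard DB query
    cache.set(key, value, ttl_seconds)
    return value
```

For example, `get_or_compute(cache, "dashboard:42", 120, load_dashboard)` would run the hypothetical `load_dashboard` once and serve the stored result for the next two minutes.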
### Q12: What makes this production-grade?
**A:** Multi-stage Docker builds with non-root users. Nginx reverse proxy with WebSocket support and rate limiting. Structured JSON logging with request tracing. Live observability dashboard. Security headers on all responses. CORS restricted to configured origins. Graceful degradation at every layer. GitHub Actions CI/CD. Health checks on all Docker services.
### Q13: How would this scale to 10,000 users?
**A:** Horizontal scaling via Docker Swarm or Kubernetes. Stateless JWT means any backend instance handles any request. Redis handles shared cache state across instances. PostgreSQL connection pooling via PgBouncer. The AI calls are already thread-safe with timeout guards. The WebSocket manager would need to move to Redis pub/sub for multi-instance support.
### Q14: What's the biggest technical challenge you solved?
**A:** The dashboard was timing out at 2+ seconds because it called the AI synchronously on every request. I fixed it by: (1) separating AI generation from data retrieval, so the dashboard reads from cache only, (2) replacing full ORM object loading with column-only queries using `with_entities()`, (3) caching the fraud count separately. This dropped response time from 2.1s to 65ms, roughly a 32x improvement.
### Q15: What would you add with more time?
**A:** Real-time WebSocket notifications for fraud alerts (currently polling). Token blacklisting on logout using Redis. End-to-end tests with Playwright. Plaid API integration for live bank transaction data. Mobile app with React Native. Multi-currency support. Budget planning with ML-based category prediction.
---
## Architecture Talking Points
### "Why is this impressive?"
1. **AI Provider Fallback Chain**: 4 levels, always returns a response
2. **Real-time WebSocket Streaming**: with reconnect, heartbeat, HTTP fallback
3. **Financial Twin**: AI has full user context injected per message
4. **Cache-Aside Pattern**: Redis + memory fallback, automatic
5. **Observability**: live metrics, structured logging, request tracing
6. **Security**: JWT rotation, bcrypt, rate limiting, prompt injection prevention
7. **Performance**: 65ms dashboard, 10ms cached, all endpoints < 20ms warm
### "What's the most technically advanced part?"
The AI orchestration layer: `stream_chat_response()` builds a personalized system prompt from the user's live database records (balance, goals, investments, behavior), then streams the response through whichever AI provider is available, with automatic fallback and in-memory chat history management.
### "How is this different from a CRUD app?"
A CRUD app stores and retrieves data. BankBot analyzes it, predicts from it, detects anomalies in it, and explains it in natural language, in real time and personalized to each user's actual financial profile.