Spaces: bayan10/bayan-api
youssefreda9 and Claude Opus 4.6 committed 1 day ago

Commit

5dcc696

1 Parent(s): 84cce80

chore: complete 100-finding audit + deep project cleanup

- Complete all 96 actionable audit findings across 6 categories
- Fix TD8 Grammrar typo, TD16 script bundling, TD18 CSS cleanup
- Delete dead code: punctuation/spelling dir, orphaned contextual_corrector
- Remove debug output, one-off scripts, stale docs
- Archive reports and 35 dev test scripts
- Remove unused deps: datasets, scikit-learn, pandas, rapidfuzz
- Update .gitignore for build output and debug artifacts

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

This view is limited to 50 files because it contains too many changes. See raw diff

Files changed (50)

.github/workflows/deploy.yml +3 -3
.gitignore +12 -1
BAYAN_COMPLETE_AUDIT.md +0 -366
Dockerfile +41 -0
PROJECT_DESCRIPTION.md +2 -2
QUICKSTART.md +0 -126
README_SETUP.md +0 -172
analyze_failures.py +0 -67
apply_locks.py +0 -77
archive/BAYAN_COMPLETE_AUDIT.md +510 -0
{reports → archive/benchmark_reports}/Phase10_Post_IVtoOOV_Audit.md +0 -0
{reports → archive/benchmark_reports}/benchmark_audit.md +0 -0
{reports → archive/benchmark_reports}/benchmark_samples.md +0 -0
{reports → archive/benchmark_reports}/regression_benchmark_audit.md +0 -0
debug_pc002.py → archive/dev_tests/debug_pc002.py +0 -0
debug_pc023.py → archive/dev_tests/debug_pc023.py +0 -0
debug_pipeline.py → archive/dev_tests/debug_pipeline.py +0 -0
debug_punctuation.py → archive/dev_tests/debug_punctuation.py +0 -0
extract_failures.py → archive/dev_tests/extract_failures.py +0 -0
extract_grammar_fails.py → archive/dev_tests/extract_grammar_fails.py +0 -0
extract_pc023.py → archive/dev_tests/extract_pc023.py +0 -0
test_camel.py → archive/dev_tests/test_camel.py +0 -0
test_colon.py → archive/dev_tests/test_colon.py +0 -0
test_failures.py → archive/dev_tests/test_failures.py +0 -0
test_grammar_fast.py → archive/dev_tests/test_grammar_fast.py +0 -0
test_grammar_fixes.py → archive/dev_tests/test_grammar_fixes.py +0 -0
test_grammar_logic.py → archive/dev_tests/test_grammar_logic.py +0 -0
test_grammar_only.py → archive/dev_tests/test_grammar_only.py +0 -0
test_grammar_rules.py → archive/dev_tests/test_grammar_rules.py +0 -0
test_kana.py → archive/dev_tests/test_kana.py +0 -0
test_local.py → archive/dev_tests/test_local.py +0 -0
test_mapper.py → archive/dev_tests/test_mapper.py +0 -0
test_mapper_isolated.py → archive/dev_tests/test_mapper_isolated.py +0 -0
test_mlm.py → archive/dev_tests/test_mlm.py +0 -0
test_models.py → archive/dev_tests/test_models.py +0 -0
test_pc.py → archive/dev_tests/test_pc.py +0 -0
test_pc001.py → archive/dev_tests/test_pc001.py +0 -0
test_pc002.py → archive/dev_tests/test_pc002.py +0 -0
test_pc002_api.py → archive/dev_tests/test_pc002_api.py +0 -0
test_pc023.py → archive/dev_tests/test_pc023.py +0 -0
test_pc027.py → archive/dev_tests/test_pc027.py +0 -0
test_pc034.py → archive/dev_tests/test_pc034.py +0 -0
test_pc044.py → archive/dev_tests/test_pc044.py +0 -0
test_pos.py → archive/dev_tests/test_pos.py +0 -0
test_punc.py → archive/dev_tests/test_punc.py +0 -0
test_punc_rules.py → archive/dev_tests/test_punc_rules.py +0 -0
test_punctuation.py → archive/dev_tests/test_punctuation.py +0 -0
test_raw_punc.py → archive/dev_tests/test_raw_punc.py +0 -0
test_sv.py → archive/dev_tests/test_sv.py +0 -0
extension/IMPLEMENTATION_CHANGELOG.md → archive/phase_reports/extension_changelog.md +0 -0

.github/workflows/deploy.yml CHANGED

@@ -28,7 +28,7 @@ jobs:
       - name: Verify critical files exist
         run: |
-          for f in src/app.py src/model_loader.py src/hf_inference.py src/index.html \
+          for f in src/app.py src/model_loader.py src/index.html \
                    src/nlp/__init__.py src/nlp/spelling/araspell_service.py \
                    src/nlp/grammar/grammar_service.py src/nlp/punctuation/punctuation_service.py \
                    Dockerfile Procfile requirements.txt; do
@@ -36,11 +36,11 @@ jobs:
           done
           echo "✅ All critical files present"
-      - name: Verify API routes defined in app.py
+      - name: Verify API routes defined
         run: |
           for route in "/api/health" "/api/analyze" "/api/summarize" "/api/spelling" \
                        "/api/grammar" "/api/punctuation" "/api/quran"; do
-            grep -q "$route" src/app.py && echo "  ✅ $route" || { echo "  ❌ MISSING ROUTE: $route"; exit 1; }
+            grep -rq "$route" src/routes/ src/app.py && echo "  ✅ $route" || { echo "  ❌ MISSING ROUTE: $route"; exit 1; }
           done
           echo "✅ All API routes defined"
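
Review note: the updated workflow step searches `src/routes/` recursively in addition to `src/app.py`, so a route passes the check if its literal string appears in any file under either path. A rough Python equivalent for running the same check locally might look like the sketch below. The function name `find_missing_routes` is hypothetical and not part of this commit, and the sketch assumes routes appear as plain string literals in the source files.

```python
from pathlib import Path

# Route list copied from the workflow step above.
ROUTES = [
    "/api/health", "/api/analyze", "/api/summarize", "/api/spelling",
    "/api/grammar", "/api/punctuation", "/api/quran",
]


def find_missing_routes(routes, search_paths):
    """Return the routes whose literal string appears in none of the search paths."""
    texts = []
    for raw in search_paths:
        path = Path(raw)
        if path.is_dir():
            # Recurse into directories, similar in spirit to `grep -r`.
            files = [p for p in path.rglob("*") if p.is_file()]
        elif path.is_file():
            files = [path]
        else:
            files = []  # a path that does not exist contributes no text
        for file in files:
            texts.append(file.read_text(encoding="utf-8", errors="ignore"))
    return [route for route in routes if not any(route in text for text in texts)]
```

Calling `find_missing_routes(ROUTES, ["src/routes", "src/app.py"])` from the repository root would return an empty list when every route is found. Like the `grep` version, this is a substring match, so `/api/health` would also be satisfied by a longer route that merely starts with the same text.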

.gitignore CHANGED

@@ -38,4 +38,15 @@ node_modules/
 # Test artifacts
 .pytest_cache/
 test-results/
-extension/assets/icons/*.png
+extension/assets/icons/*.png
+# Build output
+dist/
+src/js/bayan.bundle.js
+# Debug/temp output
+out*.txt
+local_debug.txt
+pc_data.txt
+camel_test_out.json
+grammar_fails_output.md

BAYAN_COMPLETE_AUDIT.md DELETED

@@ -1,366 +0,0 @@
-# BAYAN — Complete Product, Codebase & Extension Deep Audit
-> **Audit Date:** 2026-06-26
-> **Auditor Perspective:** Product Manager + Senior Frontend + Backend Architect + Extension Engineer + SaaS Reviewer
----
-## 1. Current System Overview
-### Architecture Map
-```
-┌──────────────────────────────────────────────────────┐
-│                   BAYAN ECOSYSTEM                     │
-│                                                       │
-│  ┌─────────┐    ┌──────────┐    ┌─────────────────┐  │
-│  │ Website  │───▶│ Flask API │───▶│  NLP Pipeline   │  │
-│  │ (SPA)    │    │ (app.py) │    │ Spell/Gram/Punct│  │
-│  └─────────┘    └──────────┘    └─────────────────┘  │
-│       │              │                    │           │
-│       │              │          ┌─────────────────┐  │
-│       │              ├─────────▶│  HF Models      │  │
-│       │              │          │  Summarization   │  │
-│       │              │          │  Grammar (Gradio)│  │
-│       │              │          └─────────────────┘  │
-│       │              │                               │
-│  ┌─────────┐    ┌──────────┐    ┌─────────────────┐  │
-│  │Supabase │◀───│  Auth    │───▶│  Documents DB   │  │
-│  │ (Cloud) │    │  Module  │    │  Settings Sync  │  │
-│  └─────────┘    └──────────┘    └─────────────────┘  │
-│                                                       │
-│  ┌────────────────────────────────────────────────┐   │
-│  │           Chrome Extension (MV3)               │   │
-│  │  ┌──────────┐ ┌──────────┐ ┌───────────────┐  │   │
-│  │  │ Content  │ │Background│ │  Side Panel   │  │   │
-│  │  │ Script   │ │  Worker  │ │  + Popup      │  │   │
-│  │  └──────────┘ └──────────┘ └───────────────┘  │   │
-│  └────────────────────────────────────────────────┘   │
-└──────────────────────────────────────────────────────┘
-```
-### Technology Stack
-| Layer | Technology | Notes |
-|-------|-----------|-------|
-| **Frontend** | Vanilla JS, HTML, CSS (Tailwind CDN) | Custom `contenteditable` editor engine |
-| **Backend** | Flask (Python) | Single monolith `app.py` — 2,844 lines |
-| **NLP Pipeline** | Custom Python modules | Spelling, Grammar, Punctuation, Autocomplete, Dialect |
-| **AI Models** | Transformer-based | Summarization (local), Grammar (Gradio proxy), Spelling (CAMeL + custom) |
-| **Database** | Supabase (PostgreSQL) | Documents, profiles, user settings |
-| **Auth** | Supabase Auth | Guest (anonymous), Google OAuth |
-| **Deployment** | HuggingFace Spaces (Docker) | CPU-only free tier |
-| **Extension** | Chrome MV3 | Background SW, Content Script, Side Panel, Popup |
-### File Structure Summary
-| Directory | Files | Purpose |
-|-----------|-------|---------|
-| `src/` | 6 core files | Backend + HTML + CSS |
-| `src/js/` | 8 JS files + 7 subdirs | Frontend logic |
-| `src/js/auth/` | 5 files | Supabase auth (client, session, UI) |
-| `src/js/documents/` | 4 files | Local doc management + export |
-| `src/js/documents-cloud/` | 3 files | Supabase CRUD for documents |
-| `src/js/sync/` | 3 files | Offline queue + conflict resolution |
-| `src/js/settings-sync/` | 2 files | User settings cloud persistence |
-| `src/nlp/` | 6 subdirs | All NLP processing modules |
-| `extension/` | 8 files + 4 subdirs | Chrome Extension |
-| `extension/shared/` | 9 files | Shared utilities (api, renderer, patches) |
-| `extension/sidepanel/` | 3 files | Side panel UI |
-| `tests/` | 16 test files | Backend unit tests |
-| `extension/tests/` | 8 files | Extension integration tests |
----
-## 2. Feature Inventory
-### Core AI Features
-| Feature | Backend API | Website Frontend | Extension | Files |
-|---------|------------|-----------------|-----------|-------|
-| **Spelling Correction** | ✅ `/api/spelling` + `/api/analyze` | ✅ Full (highlights, suggestions, apply) | ✅ Inline overlay + Popup + SidePanel | `nlp/spelling/`, `editor.js`, `renderer.js` |
-| **Grammar Correction** | ✅ `/api/grammar` + `/api/analyze` | ✅ Full (via Gradio proxy to HF model) | ✅ Inline overlay + Popup + SidePanel | `nlp/grammar/`, `hf_inference.py` |
-| **Punctuation** | ✅ `/api/punctuation` + `/api/analyze` | ✅ Full (PuncAra-v1 model) | ✅ Inline overlay + Popup + SidePanel | `nlp/punctuation/` |
-| **Summarization** | ✅ `/api/summarize` | ✅ Full (tab in editor, length control) | ✅ Popup tab + SidePanel tab | `model_loader.py`, `summaries-api.js` |
-| **AutoComplete** | ✅ `/api/autocomplete` | ✅ Ghost text + dropdown in editor | ⚠️ SidePanel text-box only, NO inline ghost text | `autocomplete.js`, sidepanel `btnAutocomplete` |
-| **Dialect→MSA** | ✅ `/api/dialect` | ✅ Dedicated editor tab | ✅ SidePanel tab (basic text→text) | `nlp/dialect/` |
-| **Quran Verification** | ✅ `/api/quran` | ✅ Dedicated editor tab | ✅ SidePanel tab (basic text→text) | `quran.py`, `quran_master.db` |
-### Platform Features
-| Feature | Website | Extension (Popup) | Extension (SidePanel) | Extension (Content Script) |
-|---------|---------|-------------------|----------------------|--------------------------|
-| **Authentication** | ✅ Guest + Google | ❌ None | ⚠️ Partial (`initExtensionAuth()` exists but requires web page auth sync) | ⚠️ Listens for `BAYAN_AUTH_SYNC` message from web |
-| **Document Save** | ✅ Supabase CRUD | ❌ None | ⚠️ UI exists (`btnNewDocument`, `btnSaveSelection`) but depends on auth | ❌ None |
-| **Document Load/History** | ✅ Full panel | ❌ None | ⚠️ UI exists (`documentsList`, `historyList`) but depends on auth | ❌ None |
-| **Export (PDF/DOCX/TXT)** | ✅ Full (mammoth.js, docx.js) | ❌ None | ❌ None | ❌ None |
-| **Import (TXT/DOCX)** | ✅ Full | ❌ None | ❌ None | ❌ None |
-| **Settings Sync** | ✅ Supabase | ❌ None | ⚠️ Placeholder (`syncExtensionSettings()`) | ❌ None |
-| **Theme Toggle** | ✅ Full dark/light | ❌ Hardcoded dark | ✅ Dark only | N/A |
-| **Focus Mode** | ✅ Full | N/A | ❌ None | N/A |
-| **Score Ring** | ✅ Animated SVG | ✅ Simplified | ✅ Simplified | ❌ None |
-| **Writing Score History** | ✅ Sparkline chart | ❌ None | ❌ None | ❌ None |
-| **Error Donut Chart** | ✅ SVG donut | ❌ None | ❌ None | ❌ None |
-| **Offline Mode** | ✅ Graceful degradation | ❌ No offline handling | ❌ No offline handling | ❌ No offline handling |
-| **Keyboard Shortcuts** | ✅ Extensive (Alt+1-3, Ctrl+S, etc.) | ❌ None | ❌ None | ❌ None |
----
-## 3. Website vs Extension Comparison
-### Authentication Flow
-| Aspect | Website | Extension | Gap |
-|--------|---------|-----------|-----|
-| Guest login | ✅ `signInAnonymously()` | ❌ | **Critical** — extension users can't persist anything |
-| Google OAuth | ✅ `signInWithOAuth()` | ❌ | **High** |
-| Session restore | ✅ `restoreSession()` via Supabase | ❌ | **High** |
-| Auth state sync | ✅ `onAuthStateChange()` | ⚠️ Listens for `BAYAN_AUTH_SYNC` postMessage but only works when user visits Bayan website with extension installed | **High** — unreliable |
-| Auth-gated features | ✅ Documents, sync, settings | ⚠️ UI elements exist but non-functional without auth | **High** |
-### AI Feature Comparison
-| Feature | Website UX | Extension UX | Parity? |
-|---------|-----------|-------------|---------|
-| Analyze (S+G+P) | Rich editor with inline highlights, suggestion sidebar, popover tooltip, apply/dismiss per-suggestion | **Content Script:** Overlay marks + tooltip. **Popup/SidePanel:** Textarea + suggestion cards | ⚠️ Functional but UX gap |
-| Summarize | Editor tab with radio buttons (short/medium/long) | Popup/SidePanel textarea with radio buttons | ✅ Near parity |
-| AutoComplete | **Ghost text** inside editor (Tab to accept) | SidePanel has a text box with "إكمال" button but NO inline ghost text on 3rd party sites | **Medium** — missing the core UX |
-| Dialect | Dedicated editor tab with "Convert" button | SidePanel tab with text box and "Convert" button | ✅ Near parity |
-| Quran | Dedicated editor tab with search | SidePanel tab with text box and search | ✅ Near parity |
-### Documents
-| Aspect | Website | Extension | Gap |
-|--------|---------|-----------|-----|
-| Create document | ✅ `createDocument()` | ⚠️ Button exists in SidePanel but blocked by no auth | **High** |
-| List documents | ✅ Desktop sidebar panel | ⚠️ `documentsList` in SidePanel workspace tab, blocked by no auth | **High** |
-| Save/auto-save | ✅ Debounced sync via `SyncManager` | ❌ | **High** |
-| Export PDF/DOCX | ✅ `export.js` | ❌ | **Medium** |
-| Import | ✅ `import.js` (TXT, DOCX) | ❌ | **Low** |
----
-## 4. Missing Features
-### Critical (Blocks Production)
-| # | Issue | Impact | Solution |
-|---|-------|--------|----------|
-| C1 | **`.env` file committed to Git** | Supabase URL and anon key are in the repo. While anon key is safe for client use, this is a security anti-pattern and may expose the project URL. | Remove `.env` from Git history, use HF Spaces secrets exclusively. `.gitignore` has `.env` but it was committed before the rule was added. |
-| C2 | **CORS wildcard `origins: "*"`** | Any website can call `/api/analyze`, `/api/summarize`, etc. directly. Abusers can drain compute. | Restrict CORS to `bayan10-bayan-api.hf.space` + extension origin `chrome-extension://<id>`. |
-| C3 | **No rate limiting on API** | No throttle on any endpoint. A single user can overwhelm the free-tier HF Space. | Add Flask-Limiter or simple in-memory token bucket. |
-### High (Important Feature Gap)
-| # | Issue | Impact | Solution |
-|---|-------|--------|----------|
-| H1 | Extension has no auth | Users cannot access cloud docs, settings, or history from extension | Implement Supabase auth in extension via `chrome.identity` or shared session from Bayan website |
-| H2 | Extension content script lacks AutoComplete ghost text | The flagship "ghost text" feature doesn't work on 3rd-party sites | Port `autocomplete.js` logic into `content-inline.js` with `/api/autocomplete` calls |
-| H3 | Extension popup/sidepanel have no export | Users cannot export corrected text as PDF/DOCX | Add "Copy as formatted text" or lightweight export |
-| H4 | No `documents` table migration | `supabase/migrations/001_profiles.sql` exists but no migration creates the `documents` table that `documents-api.js` uses | Create `002_documents.sql` migration |
-| H5 | Backend monolith: `app.py` is 2,844 lines | Extremely difficult to maintain, test, or extend | Split into `routes/`, `services/`, `middleware/` modules |
-### Medium (Improvement Needed)
-| # | Issue | Impact | Solution |
-|---|-------|--------|----------|
-| M1 | `src/js/api.js` uses ES module `export` syntax but is loaded via `<script>` tag (not `type="module"`) | The `api.js` exports are **never importable** — the website uses inline `fetch()` calls instead | Either convert to `type="module"` or remove the dead `export` statements |
-| M2 | Extension content script overlay doesn't handle `<iframe>` editors | Rich text editors in iframes (e.g., WordPress Gutenberg, TinyMCE) are invisible to the content script | Use `all_frames: true` in manifest or detect iframe editors |
-| M3 | Duplicated suggestion rendering logic | `ui.js` (website) and `bayan-ui.js` (extension) implement the same card HTML generation | Extract to shared package |
-| M4 | Extension `popup.js` (498 lines) and `sidepanel.js` (702 lines) share ~60% identical code | Maintenance nightmare — fixing a bug requires changes in 2+ files | Refactor into shared modules with UI-specific wrappers |
-| M5 | Grammar model uses Gradio proxy with SSE streaming | Creates a hard dependency on external `mohammedahmedezz2004-bayan-arabic-grammarly-correction.hf.space`. If that Space goes down, grammar breaks. | Host the grammar model directly on the Bayan Space, or add fallback |
-| M6 | No i18n framework on website | All strings are hardcoded in Arabic HTML. Adding English support requires rewriting HTML | Add simple i18n JSON loader (extension already has `_locales/ar/`) |
-### Low (Nice to Have)
-| # | Issue | Impact | Solution |
-|---|-------|--------|----------|
-| L1 | Extension only has Arabic locale | Cannot be published on Chrome Web Store for non-Arabic users | Add `_locales/en/messages.json` |
-| L2 | No analytics or telemetry | No visibility into usage patterns, error rates, or feature adoption | Add lightweight event tracking (privacy-respecting) |
-| L3 | Heavy vendor libraries loaded synchronously | `mammoth.browser.min.js`, `docx.umd.js`, `html2canvas.min.js` block initial render | Lazy-load on first export action |
-| L4 | No service worker for website | No offline caching for the web app | Add basic SW for static assets |
----
-## 5. Bugs Found
-| # | Bug | Severity | Location | Status |
-|---|-----|----------|----------|--------|
-| B1 | `ENABLE_AUTOCOMPLETE_MODEL = False` in `app.py:62` | Medium | `app.py` line 62 | AutoComplete model disabled by default — `/api/autocomplete` still works via lazy-loading, but the flag is misleading |
-| B2 | `src/js/api.js` uses `export` keyword but is not loaded as ES module | Low | `api.js` | Dead code — never actually imported anywhere |
-| B3 | Extension `bayan-api.js` missing functions `bayanAutocomplete`, `bayanDialect`, `bayanQuran` | High | `bayan-api.js` only defines `bayanAnalyze`, `bayanSummarize`, `bayanHealthCheck` | SidePanel calls these undefined functions — will throw `ReferenceError` |
-| B4 | Extension content script overlay position breaks on page scroll (absolute vs fixed positioning) | Medium | `content-inline.js:191` | Overlay uses `window.scrollY` but doesn't update on window resize |
-| B5 | Score sparkline renders with only 2 data points creating a meaningless line | Low | `format.js` | ✅ Fixed (raised minimum to 3 points) |
-| B6 | `dismissAllFiltered()` only removed DOM elements without updating `window.currentSuggestions` | Medium | `format.js` | ✅ Fixed |
----
-## 6. Security Issues
-| # | Issue | Severity | Details |
-|---|-------|----------|---------|
-| S1 | **`.env` committed to repo** | **Critical** | Supabase URL + anon key visible in Git history. While anon keys are designed for client-side use, the URL+key combo allows anyone to make Supabase API calls. |
-| S2 | **CORS `origins: "*"`** | **Critical** | `app.py:94` — allows any origin to call all API endpoints. Enables: (a) compute theft, (b) DDoS via free proxy, (c) third-party scraping. |
-| S3 | **No API authentication** | **High** | No JWT, API key, or session check on any endpoint. Extension uses only `host_permissions` scoping. |
-| S4 | **XSS risk in editor** | **Medium** | `setEditorHTML()` injects HTML directly into contenteditable. While `renderer.js` escapes text, any upstream bug in suggestion rendering could inject arbitrary HTML. |
-| S5 | **Supabase RLS incomplete** | **Medium** | Only `profiles` has RLS policies. The `documents` table (if exists) needs RLS to prevent cross-user data access. |
-| S6 | **Extension Trusted Types partial** | **Low** | `content-inline.js` implements `trustedTypes.createPolicy()` with identity transform (`input => input`), which passes the CSP check but provides no actual sanitization. |
-| S7 | **Debug endpoint exposed** | **Low** | `/api/debug-models` is accessible in production and leaks internal model status, memory usage, and startup errors. |
----
-## 7. Performance Issues
-| # | Issue | Severity | Details |
-|---|-------|----------|---------|
-| P1 | **`app.py` is 2,844 lines** | High | Single-file monolith. Every request loads all imports. Cold start on HF Spaces free tier takes ~60s. |
-| P2 | **Vendor JS loaded synchronously** | Medium | `mammoth.browser.min.js` (340KB), `docx.umd.js` (1.2MB), `html2canvas.min.js` (210KB) all load on page start even if never used. |
-| P3 | **Extension content script injected on ALL sites** | Medium | `matches: ["https://*/*", "http://*/*"]` — runs on every page. The `BayanController` module loads even on sites where user never types Arabic. |
-| P4 | **No API response caching on website** | Medium | Every keystroke after debounce triggers a full `/api/analyze` call. Extension has background worker caching, but website doesn't. |
-| P5 | **Grammar Gradio SSE dependency** | Medium | Grammar correction requires streaming from external HF Space. Average latency: 3-8 seconds. Adds significant delay to the analysis pipeline. |
-| P6 | **Quran DB is 23MB** | Low | `quran_master.db` (SQLite, 23MB) is loaded into the Docker container. Fine for now, but limits scaling. |
-| P7 | **No CSS/JS minification** | Low | All assets served unminified. `components.css` alone is 4,125+ lines (~90KB). |
----
-## 8. UX Problems
-| # | Issue | Severity | Details |
-|---|-------|----------|---------|
-| U1 | **Extension content script tooltip clips at viewport edge** | Medium | Tooltip for highlighted errors can overflow off-screen on narrow viewports. No boundary detection. |
-| U2 | **No loading skeleton on website** | Medium | Editor page shows blank white space during model initialization. No skeleton/shimmer to indicate loading. |
-| U3 | **Extension popup has no dialect/quran/autocomplete** | Medium | Only "تصحيح" and "تلخيص" tabs. SidePanel has all features, but popup is the first surface users see. |
-| U4 | **Inconsistent branding between popup and sidepanel** | Low | Popup uses `.bayan-*` class prefix, SidePanel uses `.sp-*` prefix. Different color palettes. |
-| U5 | **No onboarding flow** | Low | First-time users see an empty editor with no guidance. No tooltips, walkthrough, or sample text. |
-| U6 | **Mobile responsiveness incomplete** | Low | Website has responsive breakpoints but bottom-sheet for suggestions lacks smooth gestures. |
----
-## 9. Technical Debt
-### Backend
-| Item | Severity | Details |
-|------|----------|---------|
-| **Monolith `app.py`** | High | 2,844 lines. Contains routes, NLP logic, model loading, diffing algorithms, offset mapping, pipeline orchestration, Quran search integration, and CORS — all in one file. |
-| **Duplicated directional blocks** | Medium | `_DIRECTIONAL_BLOCKS` in `app.py` duplicates logic that also exists in `araspell_rules.py`. |
-| **12+ test files at project root** | Low | `test_proof.py`, `test_sv.py`, `test_pc.py`, etc. scattered in root instead of `tests/`. |
-| **Dead code** | Low | `ENABLE_DIALECT_MODEL = False`, `ENABLE_AUTOCOMPLETE_MODEL = False` flags in `app.py` — no code path checks them for these features since they use lazy-loading. |
-| **Archive directory** | Low | `archive/legacy_scripts/` contains old code that shouldn't ship in Docker image. |
-### Frontend (Website)
-| Item | Severity | Details |
-|------|----------|---------|
-| **`api.js` dead exports** | Medium | `export async function analyzeText()` — never imported. Website uses inline `fetch()` in `editor.js`. |
-| **Tight coupling in `editor.js`** | Medium | DOM manipulation, API calls, suggestion management, and UI updates all in one 29KB file. |
-| **No build system** | Low | No bundler, no tree-shaking, no code-splitting. All JS loaded via `<script>` tags. |
-| **CSS structure** | Low | Single `components.css` at 4,125+ lines. No CSS modules, no scoping. |
-### Extension
-| Item | Severity | Details |
-|------|----------|---------|
-| **`popup.js` and `sidepanel.js` code duplication** | High | ~60% identical code: `updateCounts()`, `markStale()`, `setLoading()`, `updateScore()`, `renderSuggestions()`, `showToast()`. |
-| **Missing API functions in `bayan-api.js`** | High | SidePanel calls `bayanAutocomplete()`, `bayanDialect()`, `bayanQuran()` which are not defined in `bayan-api.js`. These must be defined elsewhere or will throw. |
-| **No TypeScript / JSDoc validation** | Low | All extension code is plain JS with no compile-time checking. |
----
-## 10. Recommended Roadmap
-### Phase 1: Security Hardening ⚡ (Critical — Before Any Growth)
-**Timeline: 1-2 days**
-1. **Remove `.env` from Git history** — `git filter-branch` or BFG Repo Cleaner
-2. **Restrict CORS** — Change `origins: "*"` to allowlist `["https://bayan10-bayan-api.hf.space", "chrome-extension://<ext-id>"]`
-3. **Add rate limiting** — Flask-Limiter: 30 req/min per IP for `/api/analyze`, 10 req/min for `/api/summarize`
-4. **Disable debug endpoint in production** — Guard `/api/debug-models` behind `app.debug` flag
-5. **Add Supabase RLS for `documents` table** — `CREATE POLICY ... USING (auth.uid() = user_id)`
-### Phase 2: Extension Auth Unification 🔐 (High)
-**Timeline: 3-5 days**
-1. **Implement Supabase client in extension** — Add `@supabase/supabase-js` as UMD bundle in `shared/`
-2. **Auth flow**: Use `chrome.identity.launchWebAuthFlow()` for Google OAuth → receive tokens → init Supabase session
-3. **Session persistence**: Store refresh token in `chrome.storage.local`
-4. **Auth sync**: When user logs in on website, broadcast via `postMessage` → content script → `chrome.storage`
-5. **Result**: Extension users can access their documents, settings, and history
-### Phase 3: Extension Feature Parity 🔧 (High)
-**Timeline: 3-5 days**
-1. **Add missing API functions** to `bayan-api.js`: `bayanAutocomplete()`, `bayanDialect()`, `bayanQuran()`
-2. **Add autocomplete/dialect/quran tabs to popup** (currently SidePanel-only)
-3. **Inline ghost text for content script** — Port `autocomplete.js` logic for textareas on 3rd-party sites
-4. **Add basic export** — "Copy corrected text" button already exists; add "Download as TXT"
-### Phase 4: Backend Refactoring 🏗️ (Medium)
-**Timeline: 5-7 days**
-1. **Split `app.py`** into:
-   - `routes/analyze.py`, `routes/summarize.py`, `routes/dialect.py`, `routes/quran.py`
-   - `services/pipeline.py` (orchestration)
-   - `middleware/cors.py`, `middleware/rate_limit.py`
-2. **Create `002_documents.sql` migration** with proper RLS
-3. **Move root-level test files** into `tests/`
-4. **Remove `archive/` from Docker build** (add to `.dockerignore`)
-### Phase 5: Extension Code Quality 🧹 (Medium)
-**Timeline: 3-4 days**
-1. **Extract shared logic** from `popup.js` and `sidepanel.js` into `shared/bayan-core.js`
-2. **Add English locale** `_locales/en/messages.json`
-3. **Add `all_frames: true`** to manifest for iframe editor support
-4. **Add theme toggle** to popup and sidepanel
-### Phase 6: Performance & Polish ✨ (Low)
-**Timeline: 2-3 days**
-1. **Lazy-load vendor libs** (mammoth, docx, html2canvas) on first use
-2. **Add website-side API caching** (localStorage TTL cache like extension has)
-3. **Add CSS/JS minification** to Docker build
-4. **Add loading skeletons** for editor page
-5. **Add onboarding flow** — sample text + guided tooltips
----
-## Summary Matrix
-| Category | Critical | High | Medium | Low | Total |
-|----------|---------|------|--------|-----|-------|
-| **Security** | 2 (S1, S2) | 1 (S3) | 2 (S4, S5) | 2 (S6, S7) | 7 |
-| **Missing Features** | 0 | 5 (H1-H5) | 6 (M1-M6) | 4 (L1-L4) | 15 |
-| **Bugs** | 0 | 1 (B3) | 2 (B1, B4) | 1 (B2) | 4 (+2 fixed) |
-| **Performance** | 0 | 1 (P1) | 4 (P2-P5) | 2 (P6, P7) | 7 |
-| **UX** | 0 | 0 | 3 (U1-U3) | 3 (U4-U6) | 6 |
-| **Tech Debt** | 0 | 3 | 5 | 5 | 13 |
-| **TOTAL** | **2** | **11** | **22** | **17** | **52** |
----
-## Final Verdict
-Bayan is a technically impressive product with a solid NLP pipeline, a mature editor engine, and a well-architected extension. The core correction features (Spelling → Grammar → Punctuation) work end-to-end across both surfaces.
-**What Bayan does well:**
-- ✅ Custom contenteditable editor with proper cursor handling
-- ✅ Multi-stage NLP pipeline with offset mapping
-- ✅ Extension uses overlay-only rendering (never modifies user DOM)
-- ✅ Supabase integration for cloud persistence
-- ✅ Comprehensive test coverage (16 backend test files)
-- ✅ Extension follows MV3 best practices (service worker, side panel)
-**What must be fixed before growth:**
-1. 🔴 **Security**: CORS wildcard + no rate limiting = anyone can abuse the API
-2. 🔴 **Auth gap**: Extension users can't persist anything — breaks the SaaS value proposition
-3. 🟡 **Extension missing API functions**: `bayanAutocomplete/Dialect/Quran` will throw `ReferenceError`
-4. 🟡 **Backend monolith**: 2,844-line `app.py` is a maintenance bottleneck
-**Bottom line:** Bayan is 80% of the way to a production-grade SaaS product. The remaining 20% is security hardening, extension auth, and code architecture — all achievable in 2-3 focused weeks.
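
Review note: finding C3 in the deleted audit proposes "Flask-Limiter or simple in-memory token bucket", and the roadmap mentions 30 requests per minute per IP for `/api/analyze`. The sketch below illustrates the token-bucket option only. The class names `TokenBucket` and `RateLimiter` are hypothetical, this is not Flask-Limiter's API, and nothing in this commit confirms how rate limiting was actually implemented.

```python
import time
from dataclasses import dataclass, field


@dataclass
class TokenBucket:
    """Allow bursts of up to `capacity` requests, refilled at `rate` tokens per second."""
    capacity: float
    rate: float
    tokens: float = field(init=False)
    updated: float = field(init=False)

    def __post_init__(self):
        self.tokens = self.capacity      # start with a full bucket
        self.updated = time.monotonic()

    def allow(self, now=None):
        """Consume one token if available; `now` can be injected for testing."""
        now = time.monotonic() if now is None else now
        elapsed = max(0.0, now - self.updated)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


class RateLimiter:
    """Keep one bucket per client key, for example an IP address."""

    def __init__(self, per_minute):
        self.per_minute = per_minute
        self.buckets = {}

    def allow(self, key, now=None):
        bucket = self.buckets.get(key)
        if bucket is None:
            bucket = TokenBucket(capacity=self.per_minute, rate=self.per_minute / 60.0)
            self.buckets[key] = bucket
        return bucket.allow(now)
```

A request handler could call `limiter.allow(client_ip)` and return HTTP 429 when it is `False`. Two limits of this sketch are worth keeping in mind: the bucket dictionary grows without bound unless old keys are evicted, and state held in process memory is not shared between multiple server workers.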

Dockerfile CHANGED Viewed

@@ -77,6 +77,47 @@ COPY quran.py ./
 COPY quran_master.db ./
 COPY .env* ./
 # Set environment variables
 ENV PORT=7860
 ENV DEBUG=False

 COPY quran_master.db ./
 COPY .env* ./
+# Minify JS/CSS for production
+RUN pip install --no-cache-dir rjsmin rcssmin && \
+    python -c "\
+import os, rjsmin, rcssmin; \
+for root, dirs, files in os.walk('src'): \
+    for f in files: \
+        p = os.path.join(root, f); \
+        if f.endswith('.js'): \
+            with open(p) as fh: src = fh.read(); \
+            with open(p, 'w') as fh: fh.write(rjsmin.jsmin(src)); \
+        elif f.endswith('.css'): \
+            with open(p) as fh: src = fh.read(); \
+            with open(p, 'w') as fh: fh.write(rcssmin.cssmin(src)); \
+"
+# Bundle JS files in dependency order (replaces 33 script tags)
+RUN python -c "\
+import os; \
+js_order = [ \
+    'js/vendor/supabase.min.js', 'js/auth/config.js', 'js/vendor-loader.js', \
+    'js/auth/client.js', 'js/auth/session.js', 'js/auth/auth.js', 'js/auth/auth-ui.js', \
+    'js/theme.js', 'js/vendor/FileSaver.min.js', 'js/dialogs.js', 'js/i18n.js', \
+    'js/analytics.js', 'js/onboarding.js', 'js/renderer.js', 'js/selection.js', \
+    'js/ui.js', 'js/documents/doc-utils.js', 'js/editor.js', 'js/autocomplete.js', \
+    'js/format.js', 'js/documents/import.js', 'js/documents/export.js', \
+    'js/documents/documents.js', 'js/sync/sync-queue.js', 'js/sync/sync-resolver.js', \
+    'js/sync/sync-manager.js', 'js/documents-cloud/documents-api.js', \
+    'js/documents-cloud/documents-state.js', 'js/documents-cloud/documents-ui.js', \
+    'js/summaries/summaries-api.js', 'js/summaries/summaries-ui.js', \
+    'js/settings-sync/settings-api.js', 'js/settings-sync/settings-sync.js', \
+    'js/app.js', \
+]; \
+bundle = ''; \
+for f in js_order: \
+    p = os.path.join('src', f); \
+    if os.path.exists(p): \
+        with open(p) as fh: bundle += fh.read() + '\n'; \
+with open('src/js/bayan.bundle.js', 'w') as fh: fh.write(bundle); \
+print(f'Bundled {len(js_order)} JS files'); \
+"
 # Set environment variables
 ENV PORT=7860
 ENV DEBUG=False
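
The bundling step in this hunk can also be sketched as a standalone script rather than an inline `python -c` string. Python does not accept a compound statement such as `for` after `;` on a single logical line, so a script file avoids depending on how the continuation lines are joined. This is a minimal sketch, not the committed implementation: the function name `bundle_js` and the temporary-directory demo are assumptions for illustration, and unlike the hunk it reports only the files it actually found.

```python
import tempfile
from pathlib import Path


def bundle_js(src_dir, js_order, out_name="js/bayan.bundle.js"):
    """Concatenate the files listed in js_order (in that order) into one bundle.

    Files that do not exist are skipped. Returns the relative paths that
    were actually bundled, so the caller can report an accurate count.
    """
    src = Path(src_dir)
    bundled = []
    parts = []
    for rel in js_order:
        path = src / rel
        if path.is_file():
            parts.append(path.read_text(encoding="utf-8"))
            bundled.append(rel)
    out = src / out_name
    out.parent.mkdir(parents=True, exist_ok=True)
    # One trailing newline per file keeps adjacent scripts from fusing.
    out.write_text("".join(part + "\n" for part in parts), encoding="utf-8")
    return bundled


# Demo on a throwaway directory (hypothetical file names).
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "js").mkdir()
    (Path(tmp) / "js" / "a.js").write_text("var a = 1;", encoding="utf-8")
    (Path(tmp) / "js" / "b.js").write_text("var b = 2;", encoding="utf-8")
    bundled = bundle_js(tmp, ["js/a.js", "js/missing.js", "js/b.js"])
    bundle_text = (Path(tmp) / "js" / "bayan.bundle.js").read_text(encoding="utf-8")

print(f"Bundled {len(bundled)} of 3 JS files")
```

In a Dockerfile, such a script would be copied in and run with a plain `RUN python <script>` step; the script name and location are left open here.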

PROJECT_DESCRIPTION.md CHANGED

@@ -11,7 +11,7 @@ Bayan/
 ├── data/                       # Directory for raw and processed datasets (empty by default)
 ├── models/                     # Deep learning models directory (organized by task)
 │   ├── Autocomplete/           # GPT-2 autocomplete model
-│   ├── Grammrar/               # Gemma-based grammar correction model
 │   ├── Punctuation/            # Seq2Seq punctuation correction model
 │   ├── Spelling/               # BERT-based spelling corrector checkpoint
 │   └── Summarization/          # mBART summarization model checkpoint
@@ -199,7 +199,7 @@ Verify that you have placed the model files under the `models/` directory:
 - Summarization: `models/Summarization/Model/`
 - Spelling: `models/Spelling/Model/`
 - Autocomplete: `models/Autocomplete/Model/`
-- Grammar: `models/Grammrar/Model/`
 - Punctuation: `models/Punctuation/Model/`
 ### 3. Run the Server

 ├── data/                       # Directory for raw and processed datasets (empty by default)
 ├── models/                     # Deep learning models directory (organized by task)
 │   ├── Autocomplete/           # GPT-2 autocomplete model
+│   ├── Grammar/                # Gemma-based grammar correction model
 │   ├── Punctuation/            # Seq2Seq punctuation correction model
 │   ├── Spelling/               # BERT-based spelling corrector checkpoint
 │   └── Summarization/          # mBART summarization model checkpoint
 - Summarization: `models/Summarization/Model/`
 - Spelling: `models/Spelling/Model/`
 - Autocomplete: `models/Autocomplete/Model/`
+- Grammar: `models/Grammar/Model/`
 - Punctuation: `models/Punctuation/Model/`
 ### 3. Run the Server

QUICKSTART.md DELETED

@@ -1,126 +0,0 @@
-# Bayan - Quick Start Guide
-## 🚀 Quick Start
-### 1. Install Dependencies
-```bash
-pip install -r requirements.txt
-```
-**Note:** If you have issues, install PyTorch separately:
-- CPU: `pip install torch --index-url https://download.pytorch.org/whl/cpu`
-- GPU: Visit https://pytorch.org/get-started/locally/
-### 2. Run the Application
-```bash
-python run_app.py
-```
-### 3. Open in Browser
-Navigate to: **http://localhost:5000**
-## 📁 Project Structure
-```
-Bayan/
-├── src/
-│   ├── app.py              # Flask backend server
-│   ├── model_loader.py     # Model loading and inference
-│   └── index.html          # Web interface
-├── models/
-│   └── arabic_summarization_model/
-│       └── content/drive/MyDrive/arabic_summarization_model/
-│           ├── config.json
-│           ├── model.safetensors
-│           └── ... (other model files)
-├── run_app.py              # Application launcher
-├── requirements.txt         # Python dependencies
-└── README_SETUP.md         # Detailed setup guide
-```
-## 🔧 Features
-✅ **Robust Error Handling**
-- Path validation for model files
-- Graceful fallbacks if model loading fails
-- Input validation and sanitization
-- Clear error messages
-✅ **Security**
-- Input length limits (max 5000 characters)
-- CORS enabled for web interface
-- Safe model loading
-- Error logging
-✅ **User Experience**
-- Loading indicators
-- Real-time feedback
-- Arabic language support
-- Responsive design
-## 🧪 Testing
-### Test API Health
-```bash
-curl http://localhost:5000/api/health
-```
-### Test Summarization
-```bash
-curl -X POST http://localhost:5000/api/summarize \
-  -H "Content-Type: application/json" \
-  -d '{"text": "نص تجريبي للاختبار", "length": 2, "full_text": true}'
-```
-## 🐛 Troubleshooting
-### Model Not Found
-- Verify model path: `models/arabic_summarization_model/content/drive/MyDrive/arabic_summarization_model/`
-- Check that `config.json` exists
-- The app will search multiple possible locations automatically
-### Dependencies Missing
-```bash
-python check_dependencies.py
-pip install -r requirements.txt
-```
-### Port Already in Use
-```bash
-set PORT=5001
-python run_app.py
-```
-## 📝 API Documentation
-### POST /api/summarize
-Summarize Arabic text.
-**Request:**
-```json
-{
-  "text": "النص العربي...",
-  "length": 2,  // 1=short, 2=medium, 3=long
-  "full_text": true
-}
-```
-**Response:**
-```json
-{
-  "status": "success",
-  "summary": "الملخص...",
-  "original_length": 500,
-  "summary_length": 150
-}
-```
-## 🎯 Next Steps
-1. Install dependencies: `pip install -r requirements.txt`
-2. Run the app: `python run_app.py`
-3. Open browser: http://localhost:5000
-4. Write Arabic text and click "توليد الملخص"
-For detailed information, see `README_SETUP.md`.

README_SETUP.md DELETED

@@ -1,172 +0,0 @@
-# Bayan - Arabic Text Summarization Setup Guide
-## Overview
-Bayan is an Arabic text summarization application with a web interface. This guide will help you set up and run the application.
-## Prerequisites
-- Python 3.8 or higher
-- pip (Python package manager)
-- At least 4GB RAM (8GB+ recommended for better performance)
-- Model files in the correct location (see below)
-## Installation Steps
-### 1. Install Dependencies
-```bash
-pip install -r requirements.txt
-```
-**Note:** If you encounter issues installing PyTorch, you may need to install it separately:
-- For CPU: `pip install torch --index-url https://download.pytorch.org/whl/cpu`
-- For CUDA: Visit https://pytorch.org/get-started/locally/ for the appropriate command
-### 2. Verify Model Location
-The model should be located at:
-```
-models/arabic_summarization_model/content/drive/MyDrive/arabic_summarization_model/
-```
-Required files:
-- `config.json`
-- `tokenizer.json`
-- `model.safetensors`
-- `sentencepiece.bpe.model`
-- Other tokenizer/model files
-### 3. Run the Application
-#### Option A: Using the run script (Recommended)
-```bash
-python run_app.py
-```
-#### Option B: Direct Flask run
-```bash
-cd src
-python app.py
-```
-#### Option C: Using Flask CLI
-```bash
-cd src
-export FLASK_APP=app.py
-flask run
-```
-### 4. Access the Application
-Open your browser and navigate to:
-```
-http://localhost:5000
-```
-## Configuration
-### Environment Variables
-- `PORT`: Server port (default: 5000)
-- `DEBUG`: Enable debug mode (default: False)
-  ```bash
-  export DEBUG=True
-  export PORT=8080
-  ```
-### Supabase Authentication (Phase 5)
-See `.env.example` and `PHASE_5_IMPLEMENTATION_PLAN.md`.
-1. Create a Supabase project and enable **Anonymous** + **Google** auth.
-2. Run `supabase/migrations/001_profiles.sql` in the SQL Editor.
-3. Set meta tags in `src/index.html`:
-   ```html
-   <meta name="supabase-url" content="https://YOUR_PROJECT.supabase.co">
-   <meta name="supabase-anon-key" content="YOUR_ANON_KEY">
-   ```
-4. Add redirect URL: `http://localhost:5000/**`
-If Supabase is not configured, the editor still works in offline auth mode.
-### Model Not Found Error
-If you see "Model not found" error:
-1. Verify the model path exists
-2. Check that all required files are present
-3. The application will search multiple possible paths automatically
-### Out of Memory Error
-If you encounter memory issues:
-1. Close other applications
-2. Use CPU mode (it will automatically use CPU if CUDA is not available)
-3. Reduce the `MAX_TEXT_LENGTH` in `src/app.py` if needed
-### Port Already in Use
-If port 5000 is already in use:
-```bash
-export PORT=5001
-python run_app.py
-```
-### Slow Performance
-- First run will be slower as the model loads
-- Subsequent requests will be faster
-- Using GPU (CUDA) significantly improves performance
-## API Endpoints
-### Health Check
-```
-GET /api/health
-```
-Returns server status and model loading state.
-### Summarize Text
-```
-POST /api/summarize
-Content-Type: application/json
-{
-  "text": "النص العربي المراد تلخيصه...",
-  "length": 2,  // 1=short, 2=medium, 3=long
-  "full_text": true
-}
-```
-Response:
-```json
-{
-  "status": "success",
-  "summary": "الملخص المولد...",
-  "original_length": 500,
-  "summary_length": 150
-}
-```
-## Security Features
-- Input validation (text length limits)
-- CORS enabled for web interface
-- Error handling and logging
-- Path validation for model files
-- Safe model loading with fallbacks
-## Development
-### Running in Debug Mode
-```bash
-export DEBUG=True
-python run_app.py
-```
-### Testing the API
-```bash
-curl -X POST http://localhost:5000/api/summarize \
-  -H "Content-Type: application/json" \
-  -d '{"text": "نص تجريبي للاختبار", "length": 2, "full_text": true}'
-```
-## Support
-For issues or questions:
-1. Check the logs in the terminal
-2. Verify model files are correct
-3. Ensure all dependencies are installed
-4. Check Python version compatibility

analyze_failures.py DELETED

@@ -1,67 +0,0 @@
-"""Analyze remaining 24 failures after Layer 1/2/3 fixes."""
-import json, re
-with open('tests/phase10/reports/collision_benchmark_results.json', 'r', encoding='utf-8') as f:
-    data = json.load(f)
-def norm(t):
-    t = re.sub(r'[\u064B-\u065F\u0670]', '', t)
-    t = t.rstrip('.،؛؟!?!')
-    return re.sub(r'\s+', ' ', t).strip()
-categories = {}
-for r in data['results']:
-    if r['pipeline_verdict'] != 'FN':
-        continue
-    rid = r['id']
-    exp = r['expected'].strip()
-    act = r['pipeline_output'].strip()
-    inp = r['input'].strip()
-    inp_w = inp.split()
-    exp_w = exp.split()
-    act_w = act.split()
-    issues = []
-    for i in range(min(len(exp_w), len(act_w))):
-        aw = act_w[i].rstrip('.،؛؟!?!')
-        ew = exp_w[i].rstrip('.،؛؟!?!')
-        iw = inp_w[i] if i < len(inp_w) else '—'
-        aw_n = re.sub(r'[\u064B-\u065F]', '', aw)
-        ew_n = re.sub(r'[\u064B-\u065F]', '', ew)
-        if aw_n == ew_n:
-            continue  # tanween/diacritic only diff
-        if aw != ew:
-            if iw == aw:
-                cause = "MODEL_MISS"
-            elif iw == ew:
-                cause = "CORRUPTED"
-            else:
-                cause = "WRONG_FIX"
-            issues.append(f"    [{i}] '{iw}'→'{aw}' (exp:'{ew}') {cause}")
-    if len(exp_w) != len(act_w):
-        issues.append(f"    word count: {len(act_w)} vs {len(exp_w)}")
-    # Classify
-    has_junk = any('وومن' in a or '.و' in a or 'ةل' in a for a in act_w)
-    has_trailing_و = any(a.endswith('و') and not e.endswith('و') and not e.endswith('وا')
-                         for a, e in zip(act_w, exp_w) if a != e)
-    cat = r['category']
-    print(f"\n{rid} [{cat}]")
-    print(f"  IN:  {inp[:60]}")
-    print(f"  EXP: {exp[:60]}")
-    print(f"  ACT: {act[:60]}")
-    for iss in issues:
-        print(iss)
-    if has_junk:
-        print("  >>> TRAILING JUNK")
-# Summary of what each failure needs
-print("\n" + "="*60)
-print("FIXABILITY ANALYSIS")
-print("="*60)
-print(f"\nTotal failures: 24")
-print(f"Need: 17 more passes to reach 85% (43/50)")

apply_locks.py DELETED

@@ -1,77 +0,0 @@
-import os
-def apply_lock_to_file(filepath, var_name, engine_name, func_name):
-    with open(filepath, 'r', encoding='utf-8') as f:
-        lines = f.readlines()
-    out_lines = []
-    in_imports = False
-    added_threading = False
-    in_globals = False
-    added_lock_var = False
-    in_func = False
-    for line in lines:
-        if line.startswith('import ') and not added_threading:
-            out_lines.append(line)
-            out_lines.append("import threading\n")
-            added_threading = True
-            continue
-        if line.startswith(f'_{var_name} = None') and not added_lock_var:
-            out_lines.append(line)
-            out_lines.append(f"_load_lock = threading.Lock()\n")
-            added_lock_var = True
-            continue
-        if line.startswith(f'def {func_name}('):
-            in_func = True
-            out_lines.append(line)
-            continue
-        if in_func:
-            if line.startswith(f'    global '):
-                out_lines.append(line.replace('\n', f', _load_lock\n'))
-                continue
-            if line.startswith(f'    try:'):
-                # The start of the old try block. We wrap everything from here.
-                out_lines.append(f'    with _load_lock:\n')
-                out_lines.append(f'        if _{var_name} is not None:\n')
-                out_lines.append(f'            return _{var_name}\n\n')
-                out_lines.append(f'        try:\n')
-                continue
-            # If we are inside the function and past the global declaration,
-            # and it's indented with at least 4 spaces, we need to add 4 more spaces
-            # for the lines that were inside the old `try:` and `except:`
-            # EXCEPT for `if _xxx is not None: return _xxx` which comes before the try
-            if line.startswith('    if _') or line.startswith('        return _'):
-                # This is the old `if checker is not None:` logic before try. Leave it alone.
-                out_lines.append(line)
-                continue
-            if line.startswith('    '):
-                # Shift everything that was inside try/except right by 4 spaces
-                if line.strip() == '':
-                    out_lines.append('\n')
-                else:
-                    out_lines.append('    ' + line)
-                if line.startswith('    return _') or line.startswith('    raise RuntimeError'):
-                    # End of function
-                    in_func = False
-                continue
-        out_lines.append(line)
-    with open(filepath, 'w', encoding='utf-8') as f:
-        f.writelines(out_lines)
-apply_lock_to_file(r'src/nlp/spelling/araspell_service.py', 'spell_checker', 'AraSpell', 'get_spelling_model')
-apply_lock_to_file(r'src/nlp/punctuation/punctuation_service.py', 'punctuation_checker', 'PuncAra', 'get_punctuation_model')
-apply_lock_to_file(r'src/nlp/grammar/grammar_service.py', 'grammar_checker', 'Grammar', 'get_grammar_model')
-apply_lock_to_file(r'src/nlp/autocomplete/autocomplete_service.py', 'autocomplete_engine', 'Autocomplete', 'get_autocomplete_model')
-print("Locks applied perfectly with correct indentation!")

archive/BAYAN_COMPLETE_AUDIT.md ADDED

	@@ -0,0 +1,510 @@

+# BAYAN — Complete Product, Codebase & Extension Deep Audit
+> **Audit Date:** 2026-06-27
+> **Auditor Perspective:** Product Manager + Senior Frontend + Backend Architect + Extension Engineer + SaaS Reviewer
+> **Scope:** Website, Backend API, Chrome Extension, Auth/Database, AI Models, UX, Security, Performance, Code Quality
+---
+## 1. Current System Overview
+### Architecture Map
+```
+┌────────────────────────────────────────────────────────────────┐
+│                       BAYAN ECOSYSTEM                          │
+│                                                                │
+│  ┌──────────────┐    ┌──────────────┐    ┌─────────────────┐  │
+│  │  Website SPA  │───▶│  Flask API   │───▶│  NLP Pipeline   │  │
+│  │ (index.html)  │    │  (app.py)    │    │ Spell→Gram→Punct│  │
+│  │ 33 JS files   │    │  2,844 lines │    │ PipelineContext  │  │
+│  └──────┬───────┘    └──────┬───────┘    │ PatchSet/Locker  │  │
+│         │                   │            └─────────────────┘  │
+│         │                   │                                  │
+│         │            ┌──────┴───────┐    ┌─────────────────┐  │
+│         │            │ Local Models  │    │ Remote Grammar  │  │
+│         │            │ Spelling      │    │ (Gradio Space)  │  │
+│         │            │ Punctuation   │    │ Latency: 3-8s   │  │
+│         │            │ Summarization │    └─────────────────┘  │
+│         │            │ Dialect (mT5) │                         │
+│         │            │ Autocomplete  │                         │
+│         │            └──────────────┘                          │
+│         │                                                      │
+│  ┌──────┴───────┐    ┌──────────────┐    ┌─────────────────┐  │
+│  │   Supabase    │◀──│   Auth Module │──▶│ Documents DB     │  │
+│  │   (Cloud)     │   │ Guest+Google  │   │ Settings Sync    │  │
+│  │   Client-side │   │ PKCE OAuth    │   │ Summaries        │  │
+│  └──────────────┘    └──────────────┘    └─────────────────┘  │
+│                                                                │
+│  ┌────────────────────────────────────────────────────────┐    │
+│  │              Chrome Extension (MV3 v2.1.0)             │    │
+│  │  ┌───────────┐ ┌────────────┐ ┌─────────────────────┐ │    │
+│  │  │ Content   │ │ Background │ │  Side Panel + Popup  │ │    │
+│  │  │ Script    │ │  Worker    │ │  5 tabs each         │ │    │
+│  │  │ Overlay+  │ │  Cache+    │ │  Correct/Summarize/  │ │    │
+│  │  │ Ghost txt │ │  Retry     │ │  Dialect/Quran/Auto  │ │    │
+│  │  └───────────┘ └────────────┘ └─────────────────────┘ │    │
+│  │  NO AUTH │ NO DOCUMENTS │ NO SYNC │ NO EXPORT          │    │
+│  └────────────────────────────────────────────────────────┘    │
+└────────────────────────────────────────────────────────────────┘
+```
+### Technology Stack
+| Layer | Technology | Notes |
+|-------|-----------|-------|
+| **Frontend** | Vanilla JS, HTML, CSS (Tailwind CDN dev mode) | Custom `contenteditable` editor, 33 script tags, no bundler |
+| **Backend** | Flask (Python) | Single monolith `app.py` — 2,844 lines |
+| **NLP Pipeline** | Custom Python modules | 3-stage: Spelling → Grammar → Punctuation with PipelineContext/PatchSet/StageLocker |
+| **AI Models** | 6 transformer-based models | Spelling (AraSpell), Grammar (remote Gradio), Punctuation (PuncAra-v1), Summarization (mBART), Dialect (mT5-300M), Autocomplete (AraBERT + AraGPT2) |
+| **Database** | Supabase (PostgreSQL) | Documents, profiles, settings, summaries — all client-side only |
+| **Auth** | Supabase Auth (PKCE) | Guest (anonymous) + Google OAuth, 8s timeout + offline fallback |
+| **Deployment** | HuggingFace Spaces (Docker) | CPU-only free tier, ~60s cold start |
+| **Extension** | Chrome MV3 | Background SW, Content Script (all sites), Side Panel, Popup |
+### File Structure Summary
+| Directory | Files | Purpose |
+|-----------|-------|---------|
+| `src/` | `app.py`, `hf_inference.py`, `model_loader.py` + HTML/CSS | Backend + serving |
+| `src/js/` | 8 core JS files | Editor, renderer, selection, UI, theme, format, autocomplete, api |
+| `src/js/auth/` | 5 files | Supabase auth (config, client, session, auth, auth-ui) |
+| `src/js/documents/` | 4 files | Local doc management (documents, doc-utils, export, import) |
+| `src/js/documents-cloud/` | 3 files | Supabase CRUD (api, state, ui) |
+| `src/js/sync/` | 3 files | Offline queue (manager, queue, resolver) |
+| `src/js/settings-sync/` | 2 files | User settings cloud persistence |
+| `src/js/summaries/` | 2 files | Cloud summaries (api, ui) |
+| `src/nlp/` | 6 subdirs | All NLP processing modules |
+| `extension/` | 8 files + 4 subdirs | Chrome Extension |
+| `extension/shared/` | 9 files | Shared utilities (api, renderer, patches, state, hash, ui, config, constants, analysis-controller) |
+| `extension/sidepanel/` | 3 files | Side panel (HTML, JS, CSS) |
+| `tests/` | 16+ test files | Backend unit/integration tests |
+| `extension/tests/` | 8 files | Extension integration tests |
+### NLP Pipeline Architecture
+```
+User Input → PipelineContext(text)
+    │
+    ├─[1] SPELLING  (if text ≤ 1000 chars && not religious && not URLs/hashtags)
+    │   AraSpell seq2seq + beam search (5 beams)
+    │   10-step postprocessing: hybrid alignment, MLM validation, bidirectional check
+    │   20+ safety guards (edit distance, length ratio, first-letter, numeral, pronoun suffix...)
+    │   ctx.mutate_text() → OffsetMapper chain
+    │
+    ├─[2] GRAMMAR   (if not religious text)
+    │   Remote Gradio API → mohammedahmedezz2004/bayan_arabic_grammarly_correction
+    │   ArabicGrammarGuard: 14 rule-based post-passes (camel-tools MLE disambiguator)
+    │   Jaccard hallucination filter, directional blocks, 10+ safety guards
+    │   StageLocker hierarchy: grammar(3) > spelling(2) > punctuation(1)
+    │   ctx.mutate_text() → OffsetMapper chain
+    │
+    ├─[3] PUNCTUATION (if not religious && spelling+grammar made corrections)
+    │   PuncAra-v1 local model (50-word chunks, beam=3)
+    │   validate_punctuation_diff() safety layer
+    │   Max 3 punctuation patches cap
+    │   ctx.mutate_text() → OffsetMapper chain
+    │
+    └─ PatchSet.resolve_overlaps() → API Response
+       Deterministic greedy: priority DESC, confidence DESC, start ASC
+```
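
The final `PatchSet.resolve_overlaps()` step in the diagram above (deterministic greedy: priority DESC, confidence DESC, start ASC) can be sketched as follows. This is an illustrative sketch under stated assumptions, not the project's code: the `Patch` fields, the half-open `[start, end)` offsets, and the function signature are hypothetical. The priority values mirror the StageLocker hierarchy from the diagram (grammar 3, spelling 2, punctuation 1).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Patch:
    start: int         # inclusive character offset (assumed convention)
    end: int           # exclusive character offset (assumed convention)
    priority: int      # grammar=3, spelling=2, punctuation=1
    confidence: float
    replacement: str


def resolve_overlaps(patches):
    """Greedy selection: visit patches by priority DESC, confidence DESC,
    start ASC, and keep each one only if it overlaps nothing kept so far."""
    ordered = sorted(patches, key=lambda p: (-p.priority, -p.confidence, p.start))
    kept = []
    for p in ordered:
        if all(p.end <= k.start or p.start >= k.end for k in kept):
            kept.append(p)
    # Return in document order so the patches can be applied left to right.
    return sorted(kept, key=lambda p: p.start)


patches = [
    Patch(3, 8, priority=2, confidence=0.9, replacement="spelling-fix"),
    Patch(0, 5, priority=3, confidence=0.8, replacement="grammar-fix"),
    Patch(8, 9, priority=1, confidence=0.7, replacement="punct-fix"),
]
resolved = resolve_overlaps(patches)
print([p.replacement for p in resolved])
```

Because the sort key is total for distinct patches and the scan is a single pass, the same input always yields the same output, which is what makes the greedy choice deterministic. Here the spelling patch is dropped because it overlaps the higher-priority grammar patch on offsets 3 to 5.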
+---
+## 2. Feature Inventory
+### Core AI Features
+| Feature | Backend API | Website | Extension | Key Files |
+|---------|------------|---------|-----------|-----------|
+| **Spelling** | ✅ `/api/spelling` + `/api/analyze` | ✅ Inline highlights, suggestions, apply/dismiss | ✅ Content script overlay + Popup + SidePanel | `nlp/spelling/araspell_service.py`, `araspell_rules.py` |
+| **Grammar** | ✅ `/api/grammar` + `/api/analyze` | ✅ Via remote Gradio proxy + 14 rule-based postprocessors | ✅ Content script overlay + Popup + SidePanel | `nlp/grammar/grammar_service.py`, `grammar_rules.py` |
+| **Punctuation** | ✅ `/api/punctuation` + `/api/analyze` | ✅ PuncAra-v1 local model | ✅ Content script overlay + Popup + SidePanel | `nlp/punctuation/punctuation_service.py` |
+| **Summarization** | ✅ `/api/summarize` | ✅ Editor tab with length slider + paragraph/bullets mode | ✅ Popup tab + SidePanel tab | `model_loader.py`, `summaries-api.js` |
+| **Autocomplete** | ✅ `/api/autocomplete` | ✅ Ghost text + dropdown, word-boundary triggered | ⚠️ Ghost text for textarea/input only, button-click in popup/sidepanel | `autocomplete.js`, `content-inline.js` |
+| **Dialect→MSA** | ✅ `/api/dialect` | ✅ Dedicated editor tab | ✅ Popup + SidePanel tabs | `nlp/dialect/dialect_service.py` |
+| **Quran Verification** | ✅ `/api/quran` | ✅ Dedicated editor tab + 13-language translation | ✅ SidePanel (with translation), Popup | `quran.py`, `quran_master.db` |
+### Platform Features
+| Feature | Website | Extension Popup | Extension SidePanel | Extension Content Script |
+|---------|---------|----------------|---------------------|-------------------------|
+| **Authentication** | ✅ Guest + Google OAuth + linking | ❌ None | ❌ None | ❌ None |
+| **Cloud Documents** | ✅ Full CRUD (create/load/save/rename/delete) | ❌ None | ❌ None | ❌ None |
+| **Cloud Summaries** | ✅ Save/load/delete (Supabase) | ❌ None | ❌ None | ❌ None |
+| **Offline Sync** | ✅ LocalStorage queue + auto-flush | ❌ None | ❌ None | ❌ None |
+| **Settings Sync** | ✅ Theme synced to cloud | ❌ None | ❌ None | ❌ None |
+| **Export** | ✅ TXT + DOCX + PDF | ✅ TXT only | ✅ TXT only | ❌ None |
+| **Import** | ✅ TXT + DOCX (mammoth.js) | ❌ None | ❌ None | ❌ None |
+| **Undo/Redo** | ✅ Custom 50-level stack | ❌ Browser default only | ❌ Browser default only | ❌ N/A |
+| **Word Count Goal** | ✅ Configurable progress indicator | ❌ None | ❌ None | ❌ N/A |
+| **Score Ring** | ✅ Animated SVG | ✅ Simplified SVG | ✅ Simplified SVG | ❌ None |
+| **Dismissed Words** | ✅ Persisted in localStorage | ❌ None | ❌ None | ❌ None |
+| **Theme Toggle** | ✅ Dark/Light + sync | ❌ Dark only | ❌ Dark only | N/A |
+| **Keyboard Shortcuts** | ✅ Extensive (Alt+1-3, Ctrl+S, Ctrl+Q) | ❌ None | ❌ None | Tab for autocomplete only |
+| **Rich Text Formatting** | ✅ Full toolbar (bold, italic, lists, links, etc.) | ❌ None | ❌ None | ❌ N/A |
+| **Suggestion Feedback** | ✅ Thumbs up/down | ❌ None | ❌ None | ❌ None |
+| **Draft Auto-save** | ✅ localStorage on every keystroke | ❌ Lost on close | ✅ chrome.storage.session | ❌ N/A |
+| **Write-back to Page** | N/A | ❌ None | ✅ Selection-aware splice | ✅ Via background relay |
+| **Quran Translation** | ✅ 13 languages | ❌ None | ✅ 13 languages | ❌ None |
+---
+## 3. Website vs Extension Comparison
+### Authentication
+| Aspect | Website | Extension | Gap |
+|--------|---------|-----------|-----|
+| Guest login | ✅ `signInAnonymously()` with 8s timeout | ❌ Zero auth code | **Critical** |
+| Google OAuth | ✅ PKCE flow via Supabase | ❌ | **Critical** |
+| Session restore | ✅ `getSession()` from localStorage | ❌ | **Critical** |
+| Identity linking | ✅ Guest → Google upgrade | ❌ | **High** |
+| Offline fallback | ✅ `enableOfflineAuthMode()` | ❌ | **High** |
+| Auth-gated features | ✅ Documents, sync, settings | ❌ All features work without auth | **Critical** |
+### AI Feature UX Comparison
+| Feature | Website UX | Extension UX | Parity |
+|---------|-----------|-------------|--------|
+| Analyze (S+G+P) | Rich editor with inline colored highlights, suggestion sidebar with cards, popover tooltips, apply/dismiss per-suggestion, apply-all, score ring, error donut | **Content Script:** Transparent overlay with colored marks + tooltip on hover. **Popup/SidePanel:** Textarea input + suggestion cards + score ring | ⚠️ Functional but significant UX gap |
+| Summarize | Editor tab with length slider, paragraph/bullets toggle, copy/export/save-to-cloud | Popup/SidePanel: textarea + radio buttons (short/medium/long) + copy + TXT download | ✅ Near parity |
+| Autocomplete | Ghost text inside editor + dropdown, word-boundary triggered, 400ms debounce, Tab to accept | **Content Script:** Ghost text for textarea/input only (NOT contenteditable). **Popup/SidePanel:** Button-click only | ⚠️ Missing core inline UX on most web editors |
+| Dialect | Dedicated tab, convert + copy + apply-to-editor | Popup/SidePanel: textarea + convert + copy | ✅ Near parity |
+| Quran | Dedicated tab, verify + 13-language translation + modal + apply-to-editor (protected spans) | Popup: basic verify. SidePanel: full verify + 13-language translation + apply-to-page | ✅ SidePanel has full parity |
+### Documents & Data
+| Aspect | Website | Extension | Gap |
+|--------|---------|-----------|-----|
+| Create document | ✅ `createDocument()` via Supabase | ❌ No Supabase integration | **Critical** |
+| List/search documents | ✅ Sidebar panel with search | ❌ | **Critical** |
+| Auto-save + sync | ✅ 2.5s debounced via SyncManager | ❌ | **Critical** |
+| Offline queue | ✅ LocalStorage persistence, auto-flush on reconnect | ❌ | **High** |
+| Export PDF/DOCX | ✅ docx.js + html2pdf | ❌ TXT download only | **Medium** |
+| Import TXT/DOCX | ✅ FileReader + mammoth.js | ❌ | **Low** |
+| Conflict resolution | ✅ Last-write-wins timestamp comparison | ❌ N/A | N/A |
+---
+## 4. Missing Features
+### Critical (Blocks Production Use)
+| # | Feature | Impact | Recommended Solution |
+|---|---------|--------|---------------------|
+| C1 | **No API rate limiting** | Any client can overwhelm the free-tier HF Space with unlimited requests to compute-intensive NLP endpoints | Add Flask-Limiter: 30 req/min/IP for `/api/analyze`, 10/min for `/api/summarize` |
+| C2 | **CORS wildcard `origins: "*"`** (`app.py:94`) | Any website can proxy through Bayan's API, enabling compute theft and abuse | Restrict to `["https://bayan10-bayan-api.hf.space", "chrome-extension://<ext-id>"]` |
+| C3 | **Extension has zero authentication** | Extension users cannot access cloud documents, settings, or history — breaks SaaS value proposition | Implement Supabase auth via `chrome.identity.launchWebAuthFlow()` for Google OAuth |
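
The per-IP limit recommended in C1 (30 requests per minute for `/api/analyze`) can be illustrated with a sliding-window counter. This sketch uses only the standard library to show the mechanism. It is not Flask-Limiter, which the audit recommends and which would be configured differently. The class name, the injected `clock` parameter, and the sample IP addresses are assumptions for illustration.

```python
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most `limit` requests per `window_seconds` for each key."""

    def __init__(self, limit, window_seconds=60.0, clock=time.monotonic):
        self.limit = limit
        self.window = window_seconds
        self.clock = clock                 # injectable so tests need no sleeping
        self.hits = defaultdict(deque)     # key -> timestamps of accepted requests

    def allow(self, key):
        now = self.clock()
        q = self.hits[key]
        # Discard timestamps that have fallen out of the window.
        while q and now - q[0] >= self.window:
            q.popleft()
        if len(q) >= self.limit:
            return False
        q.append(now)
        return True


# Demo with a fake clock: 31 requests from one IP in the same instant.
fake_now = [0.0]
limiter = SlidingWindowLimiter(limit=30, window_seconds=60.0, clock=lambda: fake_now[0])
results = [limiter.allow("203.0.113.7") for _ in range(31)]
other_ip_ok = limiter.allow("198.51.100.2")   # a different key has its own budget
fake_now[0] = 60.0
after_window = limiter.allow("203.0.113.7")   # the window has elapsed
```

An in-memory counter like this is per-process only; a deployment with several workers would need shared storage, which is one reason to prefer an established library.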
+### High (Important Feature Gap)
+| # | Feature | Impact | Recommended Solution |
+|---|---------|--------|---------------------|
+| H1 | **Missing Supabase migration files** for `documents`, `summaries`, `settings` tables | Only `001_profiles.sql` exists. RLS policies are documented but not version-controlled. Database cannot be recreated from migrations. | Create `002_documents.sql`, `003_summaries.sql`, `004_settings.sql` with RLS |
+| H2 | **Extension content script lacks autocomplete ghost text on contenteditable** | The flagship ghost-text feature only works on `<textarea>`/`<input>`, not on contenteditable elements (which most web editors use) | Port autocomplete logic to work with contenteditable in `content-inline.js` |
+| H3 | **No document versioning or history** | Each cloud save overwrites previous content. Hard delete with no recovery. No revision history. | Add `document_versions` table or soft-delete with `deleted_at` column |
+| H4 | **Backend monolith: `app.py` is 2,844 lines** | `analyze_text()` alone is 1,224 lines. Extremely difficult to maintain, test, or extend. | Split into `routes/`, `services/`, `middleware/` modules |
+| H5 | **Extension popup/sidepanel have no DOCX/PDF export** | Users can only download as TXT from extension | Add at minimum "Copy as formatted text"; ideally add DOCX export |
+### Medium (Improvement Needed)
+| # | Feature | Impact | Recommended Solution |
+|---|---------|--------|---------------------|
+| M1 | **Grammar model depends on external Gradio Space** | Hard dependency on `mohammedahmedezz2004/bayan_arabic_grammarly_correction`. If Space sleeps (HF free tier), first request has 10-30s cold start. If down, grammar breaks entirely. | Host grammar model directly on Bayan Space, or add rule-only fallback |
+| M2 | **No Content Security Policy** | Neither the website nor extension manifest declares a CSP. Website serves no CSP headers from Flask. | Add CSP headers in Flask and explicit CSP in extension manifest |
+| M3 | **Extension dismissed-words whitelist missing** | Users must dismiss the same false-positive words repeatedly across sessions | Persist dismissed words in `chrome.storage.local` |
+| M4 | **No i18n framework on website** | All strings hardcoded in Arabic HTML. Adding English support requires rewriting HTML. | Add simple i18n JSON loader (extension already has `_locales/ar/`) |
+| M5 | **Sync conflict resolution is lossy** | Last-write-wins silently discards the losing version with no user notification, no merge attempt. Clock skew between client `Date.now()` and server `updated_at` can cause wrong winner. | Show conflict notification to user, or implement operational transform |
+| M6 | **Only theme is synced in settings** | `settings-sync.js` only syncs `theme`. Other potential settings (font size, word goal, autocomplete toggle) are not synced. | Extend `preferences` JSONB column to include all user settings |
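
The conflict handling described in M5 can be sketched as last-write-wins that returns the losing version instead of discarding it silently, and flags results that fall within a clock-skew tolerance. This is a sketch under assumptions, not the project's sync resolver: `Version`, `Resolution`, `resolve`, and the 5-second `skew_tolerance` are hypothetical names and values.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Version:
    content: str
    updated_at: float   # seconds since epoch; client and server clocks may differ


@dataclass(frozen=True)
class Resolution:
    winner: Version
    loser: Optional[Version]   # None when both sides hold the same content
    ambiguous: bool            # True when the timestamps are too close to trust


def resolve(local, remote, skew_tolerance=5.0):
    """Last-write-wins, but keep the loser so the UI can notify the user."""
    if local.content == remote.content:
        return Resolution(winner=remote, loser=None, ambiguous=False)
    if local.updated_at > remote.updated_at:
        winner, loser = local, remote
    else:
        winner, loser = remote, local
    ambiguous = abs(local.updated_at - remote.updated_at) <= skew_tolerance
    return Resolution(winner=winner, loser=loser, ambiguous=ambiguous)


clear = resolve(Version("draft A", 1000.0), Version("draft B", 1100.0))
close = resolve(Version("draft A", 1000.0), Version("draft B", 1002.0))
same = resolve(Version("draft A", 1000.0), Version("draft A", 1100.0))
```

A caller could show a conflict notification whenever `loser` is not `None`, and ask the user to choose when `ambiguous` is true.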
+### Low (Nice to Have)
+| # | Feature | Impact | Recommended Solution |
+|---|---------|--------|---------------------|
+| L1 | Extension only has Arabic locale | Cannot target non-Arabic Chrome Web Store users | Add `_locales/en/messages.json` |
+| L2 | No analytics or telemetry | No visibility into usage patterns, error rates, or feature adoption | Add lightweight privacy-respecting event tracking |
+| L3 | Vendor libraries loaded synchronously | `mammoth.browser.min.js` (340KB), `docx.umd.js` (1.2MB), `html2canvas.min.js` (210KB) block initial render even if never used | Lazy-load on first export action |
+| L4 | No service worker for website | No offline caching for static assets | Add basic SW for asset caching |
+| L5 | No onboarding flow | First-time users see empty editor with no guidance | Add sample text + guided tooltips |
+---
+## 5. Bugs Found
+### Active Bugs
+| # | Bug | Severity | Location | Details |
+|---|-----|----------|----------|---------|
+| B1 | **`/api/punctuation` has no `MAX_TEXT_LENGTH` check** | **High** | `app.py:596-647` | All other text endpoints enforce `MAX_TEXT_LENGTH = 5000`. Punctuation endpoint accepts unlimited input, allowing resource exhaustion via a single large request. |
+| B2 | **Race condition in `_isApplyingSuggestion` timing** | **High** | `editor.js` | Guard resets after 400ms but `analyzeText()` is called after 300ms. 100ms window where a suggestion application triggers recursive analysis, corrupting state. |
+| B3 | **Undo stack captures error overlay HTML** | **Medium** | `editor.js` | `pushUndoState()` saves `editor.innerHTML` including colored suggestion `<span>` elements. Undoing restores stale suggestion markup that doesn't correspond to current analysis. |
+| B4 | **`getEditorText()` clones entire DOM on every keystroke** | **Medium** | `selection.js` | `editor.cloneNode(true)` called on every `input` event via `updateEditorStats()`. For large documents, this is a significant performance hit. |
+| B5 | **Zero-width space from `formatFontSize` causes offset errors** | **Medium** | `format.js:126` | Inserts a zero-width space (`U+200B`) when the selection is collapsed. This invisible character is counted in text offsets, causing off-by-one errors in suggestion positions. |
+| B6 | **`restoreSelection` broken for non-collapsed selections** | **Medium** | `selection.js` | For range selections, the start Range is created but never added to the Selection object. `getRangeAt(0)` then operates on the browser's stale selection state. |
+| B7 | **Color picker reset removes ALL formatting** | **Medium** | `format.js:335` | Reset button calls `removeFormat` which strips ALL formatting (bold, italic, etc.), not just the color. |
+| B8 | **`overlaySuggestions` skips `.quran-applied` check on rebuilds** | **Medium** | `renderer.js:349-351` | The initial text node walk (lines 253-256) skips `.quran-applied` nodes, but the per-suggestion rebuild at line 349 does not, so protected Quran text can be modified. |
+| B9 | **`/api/quran` bypasses Content-Type check** | **Low** | `app.py` | Uses `request.get_json(force=True)` which accepts any Content-Type. All other endpoints properly check `request.is_json` first. |
+| B10 | **`/api/quran` inconsistent response format** | **Low** | `app.py` | Returns bare `jsonify(result)` without wrapping in `{'status': 'success', ...}` format used by all other endpoints. |
+| B11 | **`/api/autocomplete` `n` parameter unbounded** | **Low** | `app.py` | `n` is cast to int without bounds checking. `n=1000000` would attempt to generate a million suggestions. |
+| B12 | **`updateSummaryLength()` is a no-op** | **Low** | `index.html:~1920` | Empty function body — the summary length slider label never updates to reflect the selected value. |
+| B13 | **Extension overlay position breaks in scrollable containers** | **Medium** | `content-inline.js` | Overlay positioned with `getBoundingClientRect() + window.scrollY` (absolute). Breaks when text field is inside a scrollable `<div>` rather than the window. Tracks window scroll but not ancestor scroll. |
+| B14 | **Infinite retry loop in autocomplete init** | **Low** | `autocomplete.js:31` | `setTimeout(init, 500)` with no retry limit if `#editor-container` is not found. |
+| B15 | **Settings sync circular write** | **Low** | `settings-sync.js` | When cloud settings are loaded, `setTheme()` dispatches `bayan:themechange`, which triggers `onSettingsChanged()`, which saves the same theme back to cloud — wasteful round-trip. |
+| B16 | **Sync queue not cleared on logout** | **Low** | `auth.js:128-156` | `signOut()` does not call `SyncQueue.clear()`. Pending queue entries (containing document content) persist for the next user. |
+| B17 | **`_escapeSummaryAttr()` incomplete HTML escaping** | **Medium** | `summaries-ui.js` | Only escapes `"`, not `&`, `<`, `>`. Potential stored XSS vector if summary text contains HTML characters. |
+| B18 | **`summaries-ui.js` null crash risk** | **Low** | `summaries-ui.js:87` | `item.summary_text.length` will throw TypeError if `summary_text` is null/undefined. |
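
B1 and B11 both come down to missing bounds on request input. The following is a minimal sketch of the two checks, not the project's actual code. The helper names `check_text_length` and `clamp_suggestion_count`, the cap of 10, and the default of 5 are assumptions for illustration; only `MAX_TEXT_LENGTH = 5000` comes from the audit.

```python
MAX_TEXT_LENGTH = 5000   # limit the audit says the other text endpoints enforce
MAX_SUGGESTIONS = 10     # assumed cap for the autocomplete `n` parameter


def check_text_length(text):
    """Return an error message if `text` is unusable, else None."""
    if not isinstance(text, str) or not text.strip():
        return "text is required"
    if len(text) > MAX_TEXT_LENGTH:
        return f"text exceeds {MAX_TEXT_LENGTH} characters"
    return None


def clamp_suggestion_count(raw_n, default=5):
    """Parse `n` from a request and clamp it to the range 1..MAX_SUGGESTIONS."""
    try:
        n = int(raw_n)
    except (TypeError, ValueError):
        # Missing or non-numeric input falls back to the (assumed) default.
        return default
    return max(1, min(n, MAX_SUGGESTIONS))
```

A route handler could call `check_text_length` first and return HTTP 400 with the message when it is not `None`, which would keep the punctuation endpoint consistent with the others.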
+### Previously Fixed Bugs
+| # | Bug | Status |
+|---|-----|--------|
+| B-F1 | Score sparkline renders with only 2 data points | ✅ Fixed |
+| B-F2 | `dismissAllFiltered()` only removed DOM without updating `window.currentSuggestions` | ✅ Fixed |
+---
+## 6. Security Issues
+| # | Issue | Severity | Location | Details |
+|---|-------|----------|----------|---------|
+| S1 | **CORS wildcard `origins: "*"`** | **Critical** | `app.py:94` | `CORS(app, resources={r"/api/*": {"origins": "*"}})` allows any origin to call all API endpoints. Enables compute theft, DDoS via free proxy, third-party scraping of NLP capabilities. |
+| S2 | **No API authentication on any endpoint** | **Critical** | `app.py`, all `/api/*` routes | No JWT, API key, session check, or rate limiting on any endpoint. Combined with wildcard CORS, any HTTP client can consume compute resources without limits. |
+| S3 | **Debug endpoint publicly accessible** | **High** | `app.py:243-277` | `/api/debug-models` requires no authentication. Exposes: model load status, startup error messages, system memory usage (`/proc/meminfo` contents), HF_API_TOKEN existence. |
+| S4 | **`trust_remote_code=True` for grammar model** | **High** | `model_loader.py:706` | Grammar model loaded with `trust_remote_code=True`, allowing arbitrary code execution from the HF model repository. All other models correctly use `False`. |
+| S5 | **Unsafe pickle deserialization** | **High** | `autocomplete_service.py:100` | `pickle.load(f)` on a file downloaded from HuggingFace Hub. Pickle can execute arbitrary code during deserialization. |
+| S6 | **Unsafe torch checkpoint loading** | **High** | `araspell_service.py:72` | `torch.load(model_path, weights_only=False)` disables PyTorch's safe loading, allowing arbitrary code execution via crafted checkpoint files. |
+| S7 | **Missing RLS migration files for core tables** | **High** | `supabase/migrations/` | Only `001_profiles.sql` exists. `documents`, `summaries`, `settings` tables have RLS documented but not version-controlled. Cannot verify RLS is enabled in production from codebase. |
+| S8 | **XSS risk in document content** | **Medium** | `documents-ui.js:196` | Document content stored as HTML and loaded into the editor. If `loadDocumentText()` uses `innerHTML` without sanitization, stored XSS is possible. `_escapeHtml()` helper exists but is only used for document list rendering, not content loading. |
+| S9 | **Document CRUD relies solely on RLS** | **Medium** | `documents-api.js:68-148` | `loadDocument()`, `saveDocument()`, `renameDocument()`, `deleteDocument()` filter only by document `id`, not by `user_id`. If RLS were misconfigured, any authenticated user could access any user's documents. |
+| S10 | **HTML injection risk in meta tag injection** | **Medium** | `app.py:189` | `f'<meta name="supabase-url" content="{SUPABASE_URL}">'` — if `SUPABASE_URL` contains `">`, it could break the HTML structure. No HTML escaping applied. |
+| S11 | **Telemetry data leaked to clients** | **Medium** | `app.py:~2745` | `_tel_events` list containing internal pipeline diagnostics (filter rejections, grammar diffs, Jaccard scores) is returned in the API response. Exposes internal processing details. |
+| S12 | **Extension Trusted Types passthrough** | **Low** | `content-inline.js:32-39` | `trustedTypes.createPolicy()` uses identity transform `(input) => input` — passes CSP enforcement but provides zero sanitization. All callers must ensure safety independently. |
+| S13 | **Auth tokens in localStorage (no CSP)** | **Low** | `auth/client.js:27` | Supabase tokens stored in localStorage, vulnerable to XSS. No Content Security Policy configured in Flask to mitigate XSS risks. Standard Supabase pattern, but defense-in-depth gap. |
+| S14 | **`DEBUG_TRACE = True` hardcoded** | **Low** | `app.py:90` | Verbose trace logging enabled unconditionally in production. May expose sensitive processing details in log aggregators. |
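
For S10, the standard library's `html.escape` is enough to keep a configuration value from breaking out of an attribute. A small sketch, assuming a hypothetical helper named `build_meta_tag` (the audit only shows the unescaped f-string):

```python
import html


def build_meta_tag(name, content):
    """Render a <meta> tag with both attribute values HTML-escaped."""
    # quote=True also escapes double and single quotes, which is what
    # matters inside an attribute value.
    safe_name = html.escape(name, quote=True)
    safe_content = html.escape(content, quote=True)
    return f'<meta name="{safe_name}" content="{safe_content}">'
```

With this helper, a value containing `">` is rendered as `&quot;&gt;` and stays inside the `content` attribute.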
+---
+## 7. Performance Issues
+| # | Issue | Severity | Location | Details |
+|---|-------|----------|----------|---------|
+| P1 | **Grammar model is a remote API call** | **High** | `grammar_service.py:97-100` | Every grammar correction requires a round-trip to an external Gradio Space. If the Space sleeps (HF free tier), first request has 10-30s cold start. 3 retries with exponential backoff, but latency is fundamentally unpredictable (3-8s typical). |
+| P2 | **Duplicate morphological analysis in grammar rules** | **High** | `grammar_rules.py` | 7 separate grammar rule functions each call `self.mle.disambiguate(tokens)` independently: `fix_number_and_gender_agreement`, `fix_verbs_nasb_and_jazm`, `fix_subject_verb_agreement`, `fix_conditional_sentences`, `fix_demonstrative_agreement`, `fix_noun_adjective_agreement_advanced`, `fix_kana_and_inna`. For a 50-word sentence, this is 7 full morphological analysis passes that could be done once. |
+| P3 | **MLM scoring per word in spelling** | **High** | `araspell_rules.py`, ContextualCorrector | `score_with_mlm` runs a full AraBERT forward pass for each OOV word. `refine_sentence_with_mask` calls `score_with_mlm` twice + `predict_masked_token` per OOV word. For a 20-word sentence with 5 OOV words, this is ~15 BERT forward passes. |
+| P4 | **Tailwind CDN dev mode in production** | **Medium** | `index.html` | Full Tailwind CSS (~3MB uncompressed) downloaded via CDN development script on every page load. Should use a production build with purged CSS. |
+| P5 | **`analyze_text()` is a 1,224-line function** | **Medium** | `app.py:1534-2758` | Contains entire 3-stage pipeline with all guards, filters, and telemetry inline. Cold start loads all imports. `_is_small_spelling_change()` is 513 lines. |
+| P6 | **12+ `import re as _re_*` statements inside function body** | **Medium** | `app.py` | 12 separate `import re as _re_spell_guard`, `import re as _re_strip`, `import re as _re_emoji`, etc. inside `analyze_text()`. While Python caches modules, these are called on every request. Should be module-level. |
+| P7 | **`getEditorText()` clones entire DOM per keystroke** | **Medium** | `selection.js` | Called on every `input` event via `updateEditorStats()`. `editor.cloneNode(true)` for large documents is expensive. |
+| P8 | **Vendor JS loaded synchronously** | **Medium** | `index.html` | mammoth (340KB), docx.js (1.2MB), html2canvas (210KB) all block initial render even if never used. |
+| P9 | **`overlaySuggestions` is O(N×M)** | **Medium** | `renderer.js:349` | Rebuilds text node map after EVERY suggestion application, where N = suggestions, M = text nodes. |
+| P10 | **No API response caching on website** | **Medium** | `editor.js` | Every keystroke after 1s debounce triggers a full `/api/analyze` call. Extension background worker has LRU cache (20 entries, 5min TTL), but website doesn't cache at all. |
+| P11 | **Extension content script injected on ALL sites** | **Medium** | `manifest.json:43-55` | `matches: ["https://*/*", "http://*/*"]` — content script loads on every page, even non-Arabic sites. |
+| P12 | **Undo stack stores 50 full innerHTML snapshots** | **Low** | `editor.js` | For large documents with formatting, each snapshot can be 100KB+. 50 snapshots = 5MB+ of memory. |
+| P13 | **CSS not minified** | **Low** | `components.css` | Single file at 3,639+ lines (~90KB). No CSS modules, no scoping, no minification. |
+| P14 | **Draft auto-save serializes full editor HTML per keystroke** | **Low** | `editor.js` | `localStorage.setItem('bayan_editor_draft', editor.innerHTML)` on every input event. |
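
P2 can be addressed by running the disambiguator once and handing the result to every rule. The sketch below illustrates that idea only. `GrammarRulePipeline` and the `(tokens, analyses) -> tokens` rule signature are assumptions; the real rule functions in `grammar_rules.py` may have a different shape.

```python
class GrammarRulePipeline:
    """Share one morphological analysis across several grammar rules."""

    def __init__(self, disambiguator, rules):
        self.disambiguator = disambiguator  # object exposing disambiguate(tokens)
        self.rules = rules                  # callables: (tokens, analyses) -> tokens

    def apply(self, tokens):
        # One analysis pass up front instead of one per rule.
        analyses = self.disambiguator.disambiguate(tokens)
        for rule in self.rules:
            new_tokens = rule(tokens, analyses)
            if new_tokens != tokens:
                # A rule changed the sentence, so the cached analysis is stale.
                tokens = new_tokens
                analyses = self.disambiguator.disambiguate(tokens)
        return tokens
```

When no rule changes the sentence, this performs a single analysis pass. Each rule that does rewrite tokens costs one extra pass, so the worst case matches the current behaviour rather than exceeding it.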
+---
+## 8. UX Problems
+| # | Issue | Severity | Details |
+|---|-------|----------|---------|
+| U1 | **Native `prompt()`/`confirm()` dialogs mixed with custom UI** | **Medium** | `insertLink()` uses `prompt()`, `clearEditor()` uses `confirm()`, `_createNewDocument()`/`_startRename()` use `prompt()`, `setWordGoalUI()` uses `prompt()`. These break visual consistency and cannot be styled. `_confirmDelete()` correctly uses custom `showConfirmDialog`. |
+| U2 | **Extension content script tooltip clips at viewport edge** | **Medium** | Tooltip for highlighted errors can overflow off-screen on narrow viewports. No boundary detection or repositioning logic. |
+| U3 | **No loading skeleton on initial editor page** | **Medium** | Editor page shows blank white space during model initialization (~60s cold start on HF Spaces). No skeleton/shimmer to indicate loading state. |
+| U4 | **Extension popup loses all state on close** | **Medium** | Popup has no state persistence. Clicking away destroys all analysis results. SidePanel correctly persists via `chrome.storage.session`. |
+| U5 | **Extension ghost-text autocomplete only works on textarea/input** | **Medium** | Most web editors (Gmail compose, WordPress, Medium, Discourse, Slack) use contenteditable. Ghost text autocomplete is disabled on all of these. |
+| U6 | **Inconsistent branding between popup and sidepanel** | **Low** | Popup uses `.bayan-*` class prefix, SidePanel uses `.sp-*`. Different color palettes and CSS variable naming (`--bayan-*` vs `--sp-*`). |
+| U7 | **Mobile bottom-sheet for suggestions lacks smooth gestures** | **Low** | Website has responsive breakpoints but the suggestion panel bottom-sheet on mobile has no drag-to-dismiss or smooth gesture handling. |
+| U8 | **Summary length slider label never updates** | **Low** | `updateSummaryLength()` is an empty function. Slider works but the label always shows "medium" regardless of position. |
+| U9 | **Missing accessibility features** | **Low** | No skip navigation link, no focus trap in Quran modal, no keyboard navigation for suggestion cards (only Enter key), no `aria-live` regions for dynamic score updates. |
+| U10 | **Protected sites disable contenteditable analysis entirely** | **Low** | Gmail, Google Docs, Notion, Sheets, Slides — contenteditable is disabled by protection list. Only `<textarea>`/`<input>` elements work on these sites. Expected but not communicated to users. |
+---
+## 9. Technical Debt
+### Backend
+| # | Item | Severity | Details |
+|---|------|----------|---------|
+| TD1 | **`analyze_text()`: 1,224-line function** | **High** | Contains entire 3-stage pipeline with all guards, filters, offset mapping, telemetry, and error handling. Should be decomposed into per-stage functions. |
+| TD2 | **`_is_small_spelling_change()`: 513-line function** | **High** | Single function with deeply nested conditionals implementing 20+ safety guards. |
+| TD3 | **Dead code: `SpellingModel`/`AutocompleteModel`/`GrammarModel`/`PunctuationModel` classes** | **Medium** | `model_loader.py:385-903`: Imported in `app.py:45-56` but NEVER instantiated. All models loaded through their respective service modules. The globals `spelling_model`, `autocomplete_model`, `grammar_model`, `punctuation_model` (lines 102-106) are always `None`. |
+| TD4 | **Dead code: `hf_inference.py`** | **Medium** | All functions are stubs that return input unchanged or empty lists. Imported in `app.py:65` but functions are never called in the pipeline. |
+| TD5 | **Two `RulesBasedCorrector` class definitions** | **Medium** | `araspell_rules.py`: First class at line ~38 with `KEYBOARD_NEIGHBORS`, second class at line ~540 with identical `KEYBOARD_NEIGHBORS`. Second class overwrites the first. |
+| TD6 | **Question mark cue words defined 5 times** | **Medium** | `_EXCL_CUES = {'هل', 'أين', ...}` defined at 5 separate locations in `punctuation_service.py` and `punctuation_rules.py`. |
+| TD7 | **12+ `import re` aliased inside function body** | **Medium** | `import re as _re_spell_guard`, `import re as _re_strip`, `import re as _re_emoji`, etc. — 12 aliased re imports inside `analyze_text()` instead of one module-level import. |
+| TD8 | **`Grammrar` typo in path** | **Low** | `model_loader.py:36`: `GRAMMAR_PATH = MODEL_BASE_PATH / "Grammrar" / "Model"` — misspelled directory name. Works only because the actual directory has the same typo. |
+| TD9 | **`ENABLE_*_MODEL` flags never checked** | **Low** | `app.py:59-63`: `ENABLE_DIALECT_MODEL`, `ENABLE_PUNCTUATION_MODEL`, etc. declared but never referenced. Features use lazy-loading regardless. |
+| TD10 | **12+ test files at project root** | **Low** | `test_camel.py`, `test_colon.py`, `test_grammar_fast.py`, `test_mapper.py`, `debug_pc002.py`, etc. scattered in root instead of `tests/`. |
+| TD11 | **`import json as _tel_json` and `import re as _re_struct` inside function** | **Low** | `app.py:2209, 2186`: Imports inside `analyze_text()` function body instead of module level. |
+### Frontend (Website)
+| # | Item | Severity | Details |
+|---|------|----------|---------|
+| TD12 | **`src/js/api.js` is dead code** | **Medium** | Uses ES6 `export` syntax but loaded via `<script>` tag (not `type="module"`). Exports are never imported. Website uses inline `fetch()` calls in `editor.js`. |
+| TD13 | **`applySuggestionAtOffsets` and `applyAlternativeCorrection` ~90% identical** | **Medium** | `editor.js`: Nearly identical DOM manipulation, filtering, and count-updating code. Should be a single function with a correction text parameter. |
+| TD14 | **`_sendFeedback()` defined but never called** | **Low** | `editor.js`: Feedback function exists but no UI element invokes it. |
+| TD15 | **`renderer.js` `createSegments()` first pass unused** | **Low** | Lines 42-93: Event timeline with `events`/`activeSuggestions` produces `segments` that are never used. Only `finalSegments` from the second pass (lines 96-131) is returned. |
+| TD16 | **33 script tags with implicit load-order dependency** | **Medium** | No module system, no dependency declaration. Mixed patterns: `api.js` uses ES6 `export`, `renderer.js`/`selection.js` use CommonJS guards, everything else is plain globals. |
+| TD17 | **~1,124 lines of inline JavaScript in `index.html`** | **Medium** | Page navigation, tab switching, Quran/dialect/summarization logic, Element SDK integration, DOMContentLoaded init — all inline instead of in separate files. |
+| TD18 | **CSS duplication and inconsistency** | **Low** | Multiple duplicate declarations in `components.css`: `.skeleton`, `input[type="range"]`, `.empty-state`, `.editor-stats`, `.footer-bar`, `.card-hover:hover`, `@keyframes fadeIn`. Legacy `--primary-color` coexists with canonical `--color-primary`. Undefined variables `--font-size-sm` and `--font-size-base` referenced. |
+| TD19 | **No build system** | **Low** | No bundler, no tree-shaking, no code-splitting. All JS loaded via `<script>` tags. No asset hashing for cache busting. |
+### Extension
+| # | Item | Severity | Details |
+|---|------|----------|---------|
+| TD20 | **60-70% code duplication between `popup.js` and `sidepanel.js`** | **High** | `updateCounts()`, `showToast()`, `setLoading()`, `downloadTxt()`, tab switching, `renderSuggestions()`, summarize/dialect/quran/autocomplete handlers — all nearly identical in both files. Any bug fix must be applied in both places. |
+| TD21 | **Dead code: `content.js`** | **Low** | 12-line stub file, not loaded by manifest. |
+| TD22 | **Dead code: `bayan-state.js`** | **Low** | 127-line WeakRef-based field tracking module, not loaded by manifest or any HTML file. Content script uses local variables instead. |
+| TD23 | **Dual API paths: background.js vs direct fetch** | **Medium** | Content script inline analysis goes through `background.js` (with caching, retry, timeout). Popup/SidePanel call `bayan-api.js` directly via `fetch()` (no caching, no retry, no timeout). Ghost-text autocomplete in content script also calls `fetch()` directly, bypassing background. |
+| TD24 | **No timeouts on popup/sidepanel API calls** | **Medium** | `bayan-api.js` functions accept an optional `AbortSignal` but no caller passes one. If the API hangs, the loading overlay blocks indefinitely. |
+| TD25 | **CSS variable duplication** | **Low** | Popup uses `--bayan-*` variables, sidepanel uses `--sp-*` variables, both defining the same color values. |
+---
+## 10. Recommended Roadmap
+### Phase 1: Security Hardening (Critical — Before Any Growth)
+**Timeline: 1-2 days** | **Priority: CRITICAL**
+| # | Task | Effort |
+|---|------|--------|
+| 1 | **Restrict CORS** — Change `origins: "*"` to allowlist `["https://bayan10-bayan-api.hf.space", "chrome-extension://<ext-id>"]` | 30 min |
+| 2 | **Add rate limiting** — Flask-Limiter: 30 req/min/IP for `/api/analyze`, 10/min for others | 1 hour |
+| 3 | **Disable debug endpoint** — Guard `/api/debug-models` behind `app.debug` flag or remove | 15 min |
+| 4 | **Fix `trust_remote_code`** — Change to `False` at `model_loader.py:706` | 5 min |
+| 5 | **Add `MAX_TEXT_LENGTH` check to `/api/punctuation`** and `/api/analyze` | 15 min |
+| 6 | **Bound `/api/autocomplete` `n` parameter** — Cap at `n=10` | 5 min |
+| 7 | **Set `DEBUG_TRACE = False`** in production, or gate behind env var | 5 min |
+| 8 | **Stop leaking telemetry** — Remove `_tel_events` from API response (or gate behind debug flag) | 15 min |
+| 9 | **Escape HTML in meta tag injection** — Use `html.escape()` for Supabase URL/key injection | 10 min |
+| 10 | **Clear sync queue on logout** — Add `SyncQueue.clear()` to `signOut()` | 10 min |
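
Tasks 7 and 8 can share one environment-driven switch. This is a sketch under stated assumptions: the variable name `BAYAN_DEBUG_TRACE`, the helpers `env_flag` and `public_response`, and the response key `_tel_events` are all hypothetical (the audit names the `_tel_events` list but not the key it is returned under).

```python
import os


def env_flag(name, default=False):
    """Read a boolean flag from the environment ("1", "true", "yes", "on")."""
    raw = os.environ.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in {"1", "true", "yes", "on"}


def public_response(result, debug_enabled):
    """Copy `result`, dropping internal telemetry unless debugging is enabled."""
    if debug_enabled:
        return dict(result)
    # "_tel_events" is an assumed key name for the telemetry payload.
    return {k: v for k, v in result.items() if k != "_tel_events"}


# Off by default, so production stays quiet unless explicitly enabled.
DEBUG_TRACE = env_flag("BAYAN_DEBUG_TRACE")
```

Defaulting to `False` means a missing variable in production fails safe. The same flag could also guard the `/api/debug-models` route from task 3.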
+### Phase 2: Database & Migration Integrity (High)
+**Timeline: 1-2 days** | **Priority: HIGH**
+| # | Task | Effort |
+|---|------|--------|
+| 1 | **Create `002_documents.sql`** with proper schema + RLS policies | 2 hours |
+| 2 | **Create `003_summaries.sql`** and `004_settings.sql` with RLS | 1 hour |
+| 3 | **Add `user_id` filter to single-document operations** — defense-in-depth alongside RLS | 30 min |
+| 4 | **Add soft-delete to documents** — `deleted_at` column instead of hard delete | 1 hour |
+### Phase 3: Extension Auth Unification (High)
+**Timeline: 3-5 days** | **Priority: HIGH**
+| # | Task | Effort |
+|---|------|--------|
+| 1 | **Add Supabase client to extension** — UMD bundle in `shared/` | 1 day |
+| 2 | **Implement auth flow** — `chrome.identity.launchWebAuthFlow()` for Google OAuth | 1 day |
+| 3 | **Session persistence** — Store refresh token in `chrome.storage.local` | 4 hours |
+| 4 | **Enable cloud documents in extension** — Wire up existing SidePanel document UI | 1 day |
+| 5 | **Sync dismissed words** — Persist to `chrome.storage.local` and optionally to cloud | 2 hours |
+### Phase 4: Backend Refactoring (High)
+**Timeline: 5-7 days** | **Priority: HIGH**
+| # | Task | Effort |
+|---|------|--------|
+| 1 | **Decompose `analyze_text()`** into `spelling_stage()`, `grammar_stage()`, `punctuation_stage()` | 2 days |
+| 2 | **Cache morphological analysis** — Run `mle.disambiguate()` once, pass result to all 7 grammar rules | 4 hours |
+| 3 | **Move 12+ `import re` to module level** — Single `import re` at top of file | 30 min |
+| 4 | **Delete dead code** — `hf_inference.py` stubs, unused `model_loader.py` classes, `ENABLE_*` flags | 1 hour |
+| 5 | **Split `app.py`** into `routes/`, `services/`, `middleware/` | 2 days |
+| 6 | **Move root-level test files** into `tests/` | 30 min |
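
Task 1 amounts to turning one long function into a loop over per-stage functions. The following is a deliberately simplified sketch of that shape. The stage bodies are placeholders, `run_pipeline` is a hypothetical name, and the offset mapping and stage locking that the real pipeline performs are omitted.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# A stage takes the current text and returns (new_text, suggestions_found).
Stage = Callable[[str], Tuple[str, List[Dict]]]


def spelling_stage(text: str) -> Tuple[str, List[Dict]]:
    return text, []  # placeholder: would call the spelling service


def grammar_stage(text: str) -> Tuple[str, List[Dict]]:
    return text, []  # placeholder: would call the grammar service


def punctuation_stage(text: str) -> Tuple[str, List[Dict]]:
    return text, []  # placeholder: would call the punctuation service


DEFAULT_STAGES: Sequence[Stage] = (spelling_stage, grammar_stage, punctuation_stage)


def run_pipeline(text: str, stages: Sequence[Stage] = DEFAULT_STAGES) -> Dict:
    """Run each stage in order, feeding its output text to the next stage."""
    suggestions: List[Dict] = []
    for stage in stages:
        text, found = stage(text)
        suggestions.extend(found)
    return {"text": text, "suggestions": suggestions}
```

Passing `stages` as a parameter lets each stage be unit-tested alone, which is difficult while all three live inside a single 1,224-line function.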
+### Phase 5: Extension Code Quality (Medium)
+**Timeline: 3-4 days** | **Priority: MEDIUM**
+| # | Task | Effort |
+|---|------|--------|
+| 1 | **Extract shared logic** from `popup.js` and `sidepanel.js` into `shared/bayan-core.js` | 1.5 days |
+| 2 | **Unify API path** — Route popup/sidepanel API calls through background.js for consistent caching/retry/timeout | 1 day |
+| 3 | **Delete dead files** — `content.js`, `bayan-state.js` | 15 min |
+| 4 | **Add AbortController timeouts** to `bayan-api.js` functions (60s default) | 2 hours |
+| 5 | **Add English locale** — `_locales/en/messages.json` | 2 hours |
+### Phase 6: Frontend Fixes & Polish (Medium)
+**Timeline: 3-4 days** | **Priority: MEDIUM**
+| # | Task | Effort |
+|---|------|--------|
+| 1 | **Fix `_isApplyingSuggestion` race condition** — Increase guard timeout from 400ms to 600ms, or use a completion callback instead of timer | 30 min |
+| 2 | **Fix `restoreSelection` for range selections** — Add range to selection after creation | 30 min |
+| 3 | **Fix undo stack** — Strip suggestion overlay spans before saving innerHTML snapshot | 1 hour |
+| 4 | **Replace native `prompt()`/`confirm()` with custom dialogs** | 4 hours |
+| 5 | **Fix color picker reset** — Only remove color/highlight, not all formatting | 30 min |
+| 6 | **Switch Tailwind to production build** — Purge unused CSS, save ~3MB per page load | 2 hours |
+| 7 | **Lazy-load vendor libs** — mammoth, docx, html2canvas on first use | 2 hours |
+| 8 | **Delete dead `api.js`** and unused `createSegments()` first pass | 30 min |
+### Phase 7: Performance Optimization (Low)
+**Timeline: 2-3 days** | **Priority: LOW**
+| # | Task | Effort |
+|---|------|--------|
+| 1 | **Add website-side API caching** — localStorage TTL cache like extension background worker | 4 hours |
+| 2 | **Optimize `getEditorText()`** — Extract text without full DOM clone | 2 hours |
+| 3 | **Fix `overlaySuggestions` O(N×M)** — Build text node map once, update incrementally | 4 hours |
+| 4 | **Add CSS/JS minification** to Docker build | 2 hours |
+| 5 | **Add loading skeletons** for editor page cold start | 2 hours |
+| 6 | **Add `content_security_policy`** to extension manifest | 30 min |
+---
+## Summary Matrix
+| Category | Critical | High | Medium | Low | Total |
+|----------|---------|------|--------|-----|-------|
+| **Missing Features** | 3 (C1-C3) | 5 (H1-H5) | 6 (M1-M6) | 5 (L1-L5) | **19** |
+| **Bugs** | 0 | 2 (B1-B2) | 8 (B3-B8, B13, B17) | 8 (B9-B12, B14-B16, B18) | **18** |
+| **Security** | 2 (S1-S2) | 5 (S3-S7) | 4 (S8-S11) | 3 (S12-S14) | **14** |
+| **Performance** | 0 | 3 (P1-P3) | 8 (P4-P11) | 3 (P12-P14) | **14** |
+| **UX** | 0 | 0 | 5 (U1-U5) | 5 (U6-U10) | **10** |
+| **Tech Debt** | 0 | 3 (TD1-TD2, TD20) | 11 | 11 | **25** |
+| **TOTAL** | **5** | **18** | **42** | **35** | **100** |
+---
+## Final Verdict
+Bayan is a technically impressive Arabic NLP platform with a well-designed multi-stage correction pipeline (Spelling → Grammar → Punctuation), sophisticated offset mapping via PipelineContext/OffsetMapper/StageLocker, a mature contenteditable editor engine, and a Chrome extension that correctly follows Manifest V3 best practices.
+### What Bayan Does Well
+- **NLP Pipeline Architecture**: PipelineContext + PatchSet + StageLocker provide deterministic multi-stage coordination with overlap resolution and hierarchical locking. 20+ safety guards prevent hallucinations.
+- **Editor Engine**: Custom contenteditable with character-offset-based selection save/restore, reverse-order suggestion processing to avoid offset invalidation, and overlay-only rendering that never modifies user DOM.
+- **Extension Design**: Minimal permissions, proper HTML escaping throughout, thoughtful protected-site handling, LRU cache with collision-safe hashing, overlay-only rendering on 3rd-party sites.
+- **Auth Architecture**: Clean layered design (config → client → session → auth → UI) with PKCE flow, guest-to-Google upgrade path, `window.__bayanAuth` facade for decoupled downstream consumption, and graceful offline degradation.
+- **Sync System**: Offline-first with persistent localStorage queue, debounced flush, mutex-guarded sync, and automatic reconnection.
+- **Benchmark Coverage**: 320 tests across 8 datasets (spelling, grammar, punctuation, entities, religious, structured, hallucination, collision) at 94.37% pass rate.
+### What Must Be Fixed Before Growth
+1. **Security** (5 critical/high items): Wildcard CORS + zero rate limiting + zero API auth = anyone can abuse compute. Debug endpoint leaks internals. `trust_remote_code=True` and `weights_only=False` allow arbitrary code execution from model repos.
+2. **Extension Auth Gap**: Extension users cannot access cloud documents, settings, or history — breaks the SaaS value proposition entirely.
+3. **Database Integrity**: No migration files for 3 of 4 core tables. RLS policies documented but unverifiable from codebase.
+4. **Performance Bottleneck**: Grammar stage does 7 redundant morphological analysis passes. Spelling stage runs O(N) BERT forward passes for MLM scoring. Grammar depends on an external Gradio Space with unpredictable latency.
+5. **Code Architecture**: `analyze_text()` at 1,224 lines and `_is_small_spelling_change()` at 513 lines are unmaintainable. 60-70% popup/sidepanel duplication means every bug fix must be applied twice.
+### Bottom Line
+Bayan is **80% of the way to a production-grade SaaS product**. The NLP pipeline, editor engine, and extension architecture are solid foundations. The remaining 20% is:
+- **Week 1**: Security hardening (CORS, rate limiting, debug endpoint, model loading) + database migrations with RLS
+- **Week 2**: Extension authentication + cloud document access
+- **Week 3**: Backend decomposition + grammar performance optimization + extension code deduplication
+Total estimated effort: **3-4 focused weeks** to reach production readiness.

{reports → archive/benchmark_reports}/Phase10_Post_IVtoOOV_Audit.md RENAMED

File without changes

{reports → archive/benchmark_reports}/benchmark_audit.md RENAMED

File without changes

{reports → archive/benchmark_reports}/benchmark_samples.md RENAMED

File without changes

{reports → archive/benchmark_reports}/regression_benchmark_audit.md RENAMED

File without changes

debug_pc002.py → archive/dev_tests/debug_pc002.py RENAMED

File without changes

debug_pc023.py → archive/dev_tests/debug_pc023.py RENAMED

File without changes

debug_pipeline.py → archive/dev_tests/debug_pipeline.py RENAMED

File without changes

debug_punctuation.py → archive/dev_tests/debug_punctuation.py RENAMED

File without changes

extract_failures.py → archive/dev_tests/extract_failures.py RENAMED

File without changes

extract_grammar_fails.py → archive/dev_tests/extract_grammar_fails.py RENAMED

File without changes

extract_pc023.py → archive/dev_tests/extract_pc023.py RENAMED

File without changes

test_camel.py → archive/dev_tests/test_camel.py RENAMED

File without changes

test_colon.py → archive/dev_tests/test_colon.py RENAMED

File without changes

test_failures.py → archive/dev_tests/test_failures.py RENAMED

File without changes

test_grammar_fast.py → archive/dev_tests/test_grammar_fast.py RENAMED

File without changes

test_grammar_fixes.py → archive/dev_tests/test_grammar_fixes.py RENAMED

File without changes

test_grammar_logic.py → archive/dev_tests/test_grammar_logic.py RENAMED

File without changes

test_grammar_only.py → archive/dev_tests/test_grammar_only.py RENAMED

File without changes

test_grammar_rules.py → archive/dev_tests/test_grammar_rules.py RENAMED

File without changes

test_kana.py → archive/dev_tests/test_kana.py RENAMED

File without changes

test_local.py → archive/dev_tests/test_local.py RENAMED

File without changes

test_mapper.py → archive/dev_tests/test_mapper.py RENAMED

File without changes

test_mapper_isolated.py → archive/dev_tests/test_mapper_isolated.py RENAMED

File without changes

test_mlm.py → archive/dev_tests/test_mlm.py RENAMED

File without changes

test_models.py → archive/dev_tests/test_models.py RENAMED

File without changes

test_pc.py → archive/dev_tests/test_pc.py RENAMED

File without changes

test_pc001.py → archive/dev_tests/test_pc001.py RENAMED

File without changes

test_pc002.py → archive/dev_tests/test_pc002.py RENAMED

File without changes

test_pc002_api.py → archive/dev_tests/test_pc002_api.py RENAMED

File without changes

test_pc023.py → archive/dev_tests/test_pc023.py RENAMED

File without changes

test_pc027.py → archive/dev_tests/test_pc027.py RENAMED

File without changes

test_pc034.py → archive/dev_tests/test_pc034.py RENAMED

File without changes

test_pc044.py → archive/dev_tests/test_pc044.py RENAMED

File without changes

test_pos.py → archive/dev_tests/test_pos.py RENAMED

File without changes

test_punc.py → archive/dev_tests/test_punc.py RENAMED

File without changes

test_punc_rules.py → archive/dev_tests/test_punc_rules.py RENAMED

File without changes

test_punctuation.py → archive/dev_tests/test_punctuation.py RENAMED

File without changes

test_raw_punc.py → archive/dev_tests/test_raw_punc.py RENAMED

File without changes

test_sv.py → archive/dev_tests/test_sv.py RENAMED

File without changes

extension/IMPLEMENTATION_CHANGELOG.md → archive/phase_reports/extension_changelog.md RENAMED

File without changes