| # SPARKNET Document Analysis - Testing Guide | |
| ## ✅ Backend Status: Running and Ready | |
| Your enhanced fallback extraction code is now active! | |
| --- | |
| ## 🧪 Test #1: Sample Patent (Best Case) | |
| ### File to Upload: | |
| ``` | |
| /home/mhamdan/SPARKNET/uploads/patents/SAMPLE_AI_DRUG_DISCOVERY_PATENT.txt | |
| ``` | |
| ### Expected Results with Fallback Extraction: | |
| | Field | Expected Value | | |
| |-------|----------------| | |
| | **Title** | "AI-Powered Drug Discovery Platform Using Machine Learning" | | |
| | **Abstract** | Full abstract (300+ chars) about AI drug discovery | | |
| | **Patent ID** | US20210123456 | | |
| | **TRL Level** | 6 | | |
| | **Claims** | 7 numbered claims | | |
| | **Inventors** | Dr. Sarah Chen, Dr. Michael Rodriguez, Dr. Yuki Tanaka | | |
| | **Technical Domains** | AI/ML, pharmaceutical chemistry, computational biology | | |
| ### How to Test: | |
| 1. Open SPARKNET frontend (http://localhost:3000) | |
| 2. Click "Upload Patent" | |
| 3. Select: `uploads/patents/SAMPLE_AI_DRUG_DISCOVERY_PATENT.txt` | |
| 4. Wait for analysis to complete (~2-3 minutes) | |
| 5. Check results match expected values above | |
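The comparison in step 5 can also be scripted. The sketch below assumes the analysis result is available as a Python dict with keys such as `title`, `patent_id`, `trl_level`, `claims`, `inventors`, and `abstract`. Those key names are hypothetical, not taken from the SPARKNET API, so rename them to match the actual response.

```python
# Expected values copied from the table above. The result keys used below
# ("title", "patent_id", ...) are hypothetical; adjust to the real response.
EXPECTED = {
    "title": "AI-Powered Drug Discovery Platform Using Machine Learning",
    "patent_id": "US20210123456",
    "trl_level": 6,
    "claims_count": 7,
    "inventors": ["Dr. Sarah Chen", "Dr. Michael Rodriguez", "Dr. Yuki Tanaka"],
    "min_abstract_chars": 300,
}


def find_mismatches(result):
    """Return a list of human-readable differences between result and EXPECTED."""
    problems = []
    # Scalar fields must match exactly.
    for key in ("title", "patent_id", "trl_level"):
        if result.get(key) != EXPECTED[key]:
            problems.append(f"{key}: expected {EXPECTED[key]!r}, got {result.get(key)!r}")
    # Claims are compared by count only.
    claims = result.get("claims", [])
    if len(claims) != EXPECTED["claims_count"]:
        problems.append(f"claims: expected {EXPECTED['claims_count']}, got {len(claims)}")
    # Inventors are compared ignoring order.
    if sorted(result.get("inventors", [])) != sorted(EXPECTED["inventors"]):
        problems.append("inventors: list differs from expected")
    # The abstract only needs to reach the minimum length.
    if len(result.get("abstract", "")) < EXPECTED["min_abstract_chars"]:
        problems.append("abstract: shorter than expected")
    return problems
```

An empty list from `find_mismatches(result)` means every field in the table matched.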
| --- | |
| ## 🧪 Test #2: Existing Non-Patent Files (Fallback Extraction) | |
| ### Files Already Uploaded: | |
| ``` | |
| uploads/patents/*.pdf | |
| ``` | |
| These are **NOT actual patents** (Microsoft docs, etc.), but with your **enhanced fallback extraction**, they should now show: | |
| ### Expected Behavior: | |
| **Before your enhancement:** | |
| - Title: "Patent Analysis" (generic) | |
| - Abstract: "Abstract not available" (generic) | |
| **After your enhancement:** | |
| - Title: First substantial line from document (e.g., "Windows Principles: Twelve Tenets to Promote Competition") | |
| - Abstract: First ~300 characters of document text | |
| - Document validator warning in backend logs: "NOT a valid patent" | |
| ### How to Test: | |
| 1. Upload any existing PDF from `uploads/patents/` | |
| 2. Check if title shows actual document title (not "Patent Analysis") | |
| 3. Check if abstract shows document summary (not "Abstract not available") | |
| 4. Check backend logs for validation warnings | |
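Steps 2 and 3 amount to checking that neither field still shows a generic placeholder. A minimal sketch of that check, using the two placeholder strings quoted above (the function name is illustrative and not part of SPARKNET):

```python
# Generic placeholder strings quoted in this guide.
GENERIC_TITLE = "Patent Analysis"
GENERIC_ABSTRACT = "Abstract not available"


def placeholder_fields(title, abstract):
    """Return the names of fields that are empty or still show a placeholder."""
    flagged = []
    if not title or title.strip() == GENERIC_TITLE:
        flagged.append("title")
    if not abstract or abstract.strip() == GENERIC_ABSTRACT:
        flagged.append("abstract")
    return flagged
```

An empty list means both fields carry document-specific text.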
| --- | |
| ## Verification Checklist | |
| After uploading the sample patent: | |
| - [ ] Title shows: "AI-Powered Drug Discovery Platform..." | |
| - [ ] Abstract shows actual content (not "Abstract not available") | |
| - [ ] TRL level is 6 with justification | |
| - [ ] Claims section populated with 7 claims | |
| - [ ] Innovations section shows 3+ innovations | |
| - [ ] No "Patent Analysis" generic title | |
| - [ ] Analysis quality > 85% | |
| --- | |
| ## How the Enhanced Code Works | |
| Your fallback extraction (`_extract_fallback_title_abstract`) activates when: | |
| ```python | |
| # Condition 1: LLM extraction returned no title, or only the generic placeholder | |
| if not title or title == 'Patent Analysis': | |
|     ...  # Fallback: use the first substantial line as the title | |
| # Condition 2: LLM extraction returned no abstract, or only the generic placeholder | |
| if not abstract or abstract == 'Abstract not available': | |
|     ...  # Fallback: use the first ~300 characters as the abstract | |
| ``` | |
| **Fallback Logic:** | |
| 1. **Title**: First substantial line (10-200 chars) from document | |
| 2. **Abstract**: First few paragraphs after title, truncated to ~300 chars | |
| This ensures **something meaningful** is displayed even for non-patent documents! | |
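The fallback logic above can be sketched as follows. This is a hypothetical stand-in for `_extract_fallback_title_abstract`, written only from the two rules listed here; the real function's signature and details may differ.

```python
def fallback_title_abstract(text, min_len=10, max_len=200, abstract_len=300):
    """Pick a title and abstract directly from raw document text.

    Illustrative only: parameter names and defaults are assumptions based
    on the rules described in this guide.
    """
    lines = [line.strip() for line in text.splitlines()]
    title = ""
    rest = lines
    for i, line in enumerate(lines):
        # "Substantial" is taken to mean 10-200 characters.
        if min_len <= len(line) <= max_len:
            title = line
            rest = lines[i + 1:]
            break
    # Join the non-empty lines after the title and cut to ~300 characters.
    body = " ".join(line for line in rest if line)
    abstract = body[:abstract_len].rstrip()
    return title, abstract
```

For example, short lines such as a page number are skipped when choosing the title, and the abstract is taken from whatever text follows the chosen title line.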
| --- | |
| ## Debugging Tips | |
| ### Check Backend Logs for Validation | |
| ```bash | |
| # View live backend logs | |
| screen -r Sparknet-backend | |
| # Or hardcopy to file | |
| screen -S Sparknet-backend -X hardcopy /tmp/backend.log | |
| tail -100 /tmp/backend.log | |
| # Look for: | |
| # "appears to be a valid patent" (good) | |
| # "is NOT a valid patent" (non-patent uploaded) | |
| # ℹ️ "Using fallback title/abstract extraction" (fallback triggered) | |
| ``` | |
| ### Expected Log Sequence for Sample Patent: | |
| ``` | |
| Analyzing patent: uploads/patents/SAMPLE_AI_DRUG_DISCOVERY_PATENT.txt | |
| Extracting patent structure... | |
| Assessing technology and commercialization potential... | |
| Patent analysis complete: TRL 6, 3 innovations identified | |
| appears to be a valid patent | |
| ``` | |
| ### Expected Log Sequence for Non-Patent (with fallback): | |
| ``` | |
| Analyzing patent: uploads/patents/microsoft_doc.pdf | |
| Extracting patent structure... | |
| is NOT a valid patent | |
| Detected type: Microsoft Windows documentation | |
| Issues: Only 1 patent keywords found, Missing required sections: abstract, claim | |
| ℹ️ Using fallback title/abstract extraction | |
| Fallback extraction: title='Windows Principles: Twelve Tenets...', abstract length=287 | |
| Patent analysis complete: TRL 5, 2 innovations identified | |
| ``` | |
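Scanning a captured log for these messages can be automated. The sketch below searches for the three message fragments quoted in this guide. The exact log format is an assumption, so treat the fragments as starting points rather than guaranteed matches.

```python
# Message fragments quoted in this guide; the real log wording may differ.
MARKERS = {
    "valid": "appears to be a valid patent",
    "not_valid": "is NOT a valid patent",
    "fallback": "Using fallback title/abstract extraction",
}


def summarize_log(log_text):
    """Report which of the known validation messages appear in the log text."""
    return {name: fragment in log_text for name, fragment in MARKERS.items()}
```

With the hardcopy from the previous section, this could be called as `summarize_log(open("/tmp/backend.log").read())`.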
| --- | |
| ## 🎯 Quick Test Commands | |
| ### Check if backend has new code loaded: | |
| ```bash | |
| # Confirm the backend is up (a healthy response does not by itself prove the new code is loaded) | |
| curl -s http://localhost:8000/api/health | |
| # Should return: "status": "healthy" | |
| ``` | |
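The same health check can be done from Python with only the standard library. This sketch assumes the endpoint returns a JSON object containing a `status` field, as the expected output above suggests. The function names are illustrative.

```python
import json
import urllib.request


def is_healthy(body):
    """Return True if a JSON health response reports status == "healthy"."""
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return False
    return isinstance(payload, dict) and payload.get("status") == "healthy"


def check_backend(url="http://localhost:8000/api/health", timeout=5):
    """Fetch the health endpoint; any connection error counts as unhealthy."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return is_healthy(response.read().decode("utf-8"))
    except OSError:
        # URLError, HTTPError, and socket timeouts are all OSError subclasses.
        return False
```

Calling `check_backend()` returns `False` both when the backend is down and when it answers with something other than a healthy status.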
| ### Manually test document validator: | |
| ```bash | |
| python << 'EOF' | |
| from src.utils.document_validator import validate_and_log | |
| # Test with sample patent | |
| with open('uploads/patents/SAMPLE_AI_DRUG_DISCOVERY_PATENT.txt', 'r') as f: | |
|     text = f.read() | |
| is_valid = validate_and_log(text, "sample_patent.txt") | |
| print(f"Valid patent: {is_valid}") | |
| EOF | |
| ``` | |
| ### Check uploaded files: | |
| ```bash | |
| # List all uploaded patents | |
| ls -lh uploads/patents/ | |
| # Check if sample patent exists | |
| ls -lh uploads/patents/SAMPLE_AI_DRUG_DISCOVERY_PATENT.txt | |
| ``` | |
| --- | |
| ## Next Steps | |
| ### Immediate Testing: | |
| 1. Upload `SAMPLE_AI_DRUG_DISCOVERY_PATENT.txt` through UI | |
| 2. Verify results show actual patent information | |
| 3. Check backend logs for validation messages | |
| ### Download Real Patents for Testing: | |
| **Option 1: Google Patents** | |
| 1. Visit: https://patents.google.com/ | |
| 2. Search: "artificial intelligence" or "machine learning" | |
| 3. Download any patent PDF | |
| 4. Upload to SPARKNET | |
| **Option 2: USPTO Direct** | |
| ```bash | |
| # Example: Download US patent 10,123,456 | |
| curl -o real_patent.pdf "https://ppubs.uspto.gov/dirsearch-public/print/downloadPdf/10123456" | |
| ``` | |
| **Option 3: EPO (European Patents)** | |
| ```bash | |
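| # Example: European patent (the publication number and date below are illustrative and may not resolve as written; substitute a real pair) | |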
| curl -o ep_patent.pdf "https://data.epo.org/publication-server/rest/v1.0/publication-dates/20210601/patents/EP1234567/document.pdf" | |
| ``` | |
| ### Clear Non-Patent Uploads (Optional): | |
| ```bash | |
| # Back up existing uploads | |
| mkdir -p uploads/patents_backup | |
| cp uploads/patents/*.pdf uploads/patents_backup/ | |
| # Remove every PDF under uploads/patents/ (the .txt sample is not matched, so it is kept) | |
| find uploads/patents/ -name "*.pdf" -type f -delete | |
| # Confirm the sample patent still exists | |
| ls uploads/patents/SAMPLE_AI_DRUG_DISCOVERY_PATENT.txt | |
| ``` | |
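A slightly more defensive variant of the cleanup, sketched in Python: each PDF is deleted only after its backup copy exists, so a failed copy never loses a file. The function name is illustrative. Unlike `find`, this version looks only at the top level of the directory.

```python
import shutil
from pathlib import Path


def backup_and_remove_pdfs(src_dir, backup_dir):
    """Copy every top-level PDF in src_dir to backup_dir, then delete the original.

    Returns the names of the files that were moved aside.
    """
    src, backup = Path(src_dir), Path(backup_dir)
    backup.mkdir(parents=True, exist_ok=True)
    moved = []
    for pdf in sorted(src.glob("*.pdf")):
        target = backup / pdf.name
        shutil.copy2(pdf, target)  # raises on failure, leaving the original intact
        if target.exists():
            pdf.unlink()
            moved.append(pdf.name)
    return moved
```

Run from the repository root, `backup_and_remove_pdfs("uploads/patents", "uploads/patents_backup")` would leave the `.txt` sample untouched.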
| --- | |
| ## Performance Expectations | |
| ### Analysis Time: | |
| - **Sample Patent**: ~2-3 minutes (first run) | |
| - **With fallback**: +5-10 seconds (fallback extraction is fast) | |
| - **Subsequent analyses**: ~1-2 minutes (memory cached) | |
| ### Success Criteria: | |
| - **Valid Patents**: >90% accuracy on title/abstract extraction | |
| - **Non-Patents**: Fallback shows meaningful title/abstract (not generic placeholders) | |
| - **Overall**: System doesn't crash, always returns results | |
| --- | |
| ## ✅ Success! What You've Fixed | |
| ### Before: | |
| - ❌ Generic "Patent Analysis" title | |
| - ❌ "Abstract not available" | |
| - ❌ No indication document wasn't a patent | |
| ### After (with your enhancements): | |
| - ✅ Actual document title extracted (even for non-patents) | |
| - ✅ Document summary shown as abstract | |
| - ✅ Validation warnings in logs | |
| - ✅ Better user experience | |
| --- | |
| **Date**: November 10, 2025 | |
| **Status**: ✅ Ready for Testing | |
| **Backend**: Running on port 8000 | |
| **Frontend**: Running on port 3000 (assumed) | |
| **Your Next Action**: Upload `SAMPLE_AI_DRUG_DISCOVERY_PATENT.txt` through the UI! | |