| --- |
| base_model: google/gemma-2b-it |
| language: en |
| library_name: transformers |
| license: apache-2.0 |
| pipeline_tag: text-generation |
| tags: |
| - precision-grounding |
| - document-qa |
| - zero-hallucination |
| - legal-tech |
| - technical-analysis |
| --- |
| |
| # Solvrays LLM - High Precision Document Analyst |
| ## Overview |
| This model is a fine-tuned version of **google/gemma-2b-it**, built for **Zero-Hallucination Document Retrieval**. It is optimized to answer questions about complex, domain-specific documents (technical, legal, or architectural) while adhering strictly to the provided context. |
| ### Primary Design Objectives |
| - **Factual Integrity**: Trained to answer 'Not Documented' rather than speculate. |
| - **Contextual Continuity**: Overlap-aware training reduces information loss across page boundaries. |
| - **Domain Versatility**: Handles both technical and non-technical document styles. |
| ## Professional Usage (Grounded Inference) |
| To reproduce the precision the model was trained for, use the prompt template and generation settings below: |
| ```python |
| from transformers import AutoTokenizer, AutoModelForCausalLM |
| import torch |
| |
| model_id = 'solvrays/solvrays-llm' |
| tokenizer = AutoTokenizer.from_pretrained(model_id) |
| model = AutoModelForCausalLM.from_pretrained(model_id, device_map='auto', torch_dtype=torch.bfloat16) |
| |
| # Universal Grounding Template |
| instruction = 'Analyze your internal knowledge base and provide a precise, factual response based strictly on the documentation you have been trained on. If the information is not documented, state that it is not documented.' |
| query = 'What are the main infrastructure requirements?' |
| |
| prompt = (f'### Instruction: {instruction}\n' |
|           f'### Knowledge Context: {query}\n' |
|           f'### Verified Response:') |
| |
| inputs = tokenizer(prompt, return_tensors='pt').to(model.device) |
| with torch.no_grad(): |
|     # Greedy decoding (do_sample=False) keeps the output deterministic |
|     outputs = model.generate(**inputs, max_new_tokens=256, do_sample=False, repetition_penalty=1.5) |
| |
| # Keep only the text generated after the response marker |
| print(tokenizer.decode(outputs[0], skip_special_tokens=True).split('### Verified Response:')[-1].strip()) |
| ``` |
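The prompt layout in the snippet above can be wrapped in two small helpers so that every query is formatted and parsed the same way. This is a minimal sketch: `build_prompt`, `extract_response`, and `RESPONSE_MARKER` are hypothetical names introduced here for illustration and are not part of the released model or of `transformers`. The marker strings are copied from the usage snippet.

```python
# Hypothetical helpers (not part of the model repository) that mirror the
# prompt template shown in the usage snippet.
RESPONSE_MARKER = '### Verified Response:'


def build_prompt(instruction: str, query: str) -> str:
    """Assemble the three-part grounding prompt used in the usage snippet."""
    return (f'### Instruction: {instruction}\n'
            f'### Knowledge Context: {query}\n'
            f'{RESPONSE_MARKER}')


def extract_response(decoded: str) -> str:
    """Return the text after the last response marker, stripped of whitespace."""
    return decoded.split(RESPONSE_MARKER)[-1].strip()


if __name__ == '__main__':
    prompt = build_prompt('Answer strictly from the documentation.',
                          'What are the main infrastructure requirements?')
    # Simulate a decoded generation: the prompt followed by the model's answer.
    print(extract_response(prompt + ' Not documented.'))
```

With these helpers, the string passed to the tokenizer would be `build_prompt(instruction, query)`, and the decoded output of `model.generate` would be passed through `extract_response`.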
| ## Technical Specifications |
| | Parameter | Configuration | |
| | :--- | :--- | |
| | Base Model | google/gemma-2b-it | |
| | Fine-tuning Method | QLoRA (4-bit quantization) | |
| | LoRA Rank (r) | 16 | |
| | LoRA Alpha | 32 | |
| | Training Epochs | 5 | |
| | Context Strategy | 512 tokens with 128-token overlap | |
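The context strategy in the table (512-token blocks with a 128-token overlap) can be illustrated with a sliding window over token IDs. This is a sketch under assumptions: the function name `chunk_with_overlap` is hypothetical, and the exact windowing used during training is not documented in this card. The code only shows one common way to produce overlapping blocks with those two numbers.

```python
from typing import List


def chunk_with_overlap(token_ids: List[int],
                       block_size: int = 512,
                       overlap: int = 128) -> List[List[int]]:
    """Split token_ids into blocks of at most block_size tokens.

    Consecutive blocks share `overlap` tokens, so the window advances by
    block_size - overlap tokens at each step.
    """
    if not 0 <= overlap < block_size:
        raise ValueError('overlap must satisfy 0 <= overlap < block_size')
    stride = block_size - overlap
    chunks = []
    start = 0
    while start < len(token_ids):
        chunks.append(token_ids[start:start + block_size])
        if start + block_size >= len(token_ids):
            break  # the last block already reaches the end of the input
        start += stride
    return chunks


if __name__ == '__main__':
    ids = list(range(1000))  # stand-in for real tokenizer output
    for i, chunk in enumerate(chunk_with_overlap(ids)):
        print(i, chunk[0], chunk[-1], len(chunk))
```

In practice the input would be the `input_ids` list returned by the tokenizer for a full document. The final block may be shorter than `block_size`.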
| ## Risks & Limitations |
| - **Context Window**: Strictly limited to the fine-tuned block size (512 tokens). For longer, multi-page queries, Retrieval-Augmented Generation (RAG) is recommended. |
| - **Bias**: The model reflects the biases of the provided training documentation. |
| - **Accuracy**: Always verify critical technical numbers against the original source. |
| --- |
| **Architected and Fine-tuned by Bibek Lama Singtan** |