--- base_model: google/gemma-2b-it language: en library_name: transformers license: apache-2.0 pipeline_tag: text-generation tags: - precision-grounding - document-qa - zero-hallucination - legal-tech - technical-analysis --- # 📂 Solvrays Llm - High Precision Document Analyst \n## 🌟 Overview This model is a specialized fine-tuning of **google/gemma-2b-it**, engineered for **Zero-Hallucination Document Retrieval**. It has been optimized to handle complex, domain-specific documents (Technical, Legal, or Architectural) with strict adherence to provided context. \n### 🛠 Primary Design Objectives - **Factual Integrity**: Programmed to prioritize 'Not Documented' over speculating. - **Contextual Continuity**: Overlap-aware training prevents information loss across page boundaries. - **Domain Versatility**: Seamlessly switches between technical and non-technical document styles. \n## 💻 Professional Usage (Grounded Inference) To achieve the trained precision level, utilize the following code implementation: \n```python from transformers import AutoTokenizer, AutoModelForCausalLM import torch model_id = 'solvrays/solvrays-llm' tokenizer = AutoTokenizer.from_pretrained(model_id) model = AutoModelForCausalLM.from_pretrained(model_id, device_map='auto', torch_dtype=torch.bfloat16) # Universal Grounding Template instruction = 'Analyze your internal knowledge base and provide a precise, factual response based strictly on the documentation you have been trained on. If the information is not documented, state that it is not documented.' query = 'What are the main infrastructure requirements?' prompt = (f'### Instruction: {instruction}\n' f'### Knowledge Context: {query}\n' f'### Verified Response:') inputs = tokenizer(prompt, return_tensors='pt').to(model.device) with torch.no_grad(): outputs = model.generate(**inputs, max_new_tokens=256, do_sample=False, repetition_penalty=1.5) print(tokenizer.decode(outputs[0], skip_special_tokens=True).split('### Verified Response:')[-1].strip()) ``` \n## 📊 Technical Specifications | Parameter | Configuration | | :--- | :--- | | Base Model | google/gemma-2b-it | | Fine-tuning Method | QLoRA (4-bit quantization) | | LoRA Rank (r) | 16 | | LoRA Alpha | 32 | | Training Epochs | 5 | | Context Strategy | 512 tokens with 128-token overlap | \n## ⚠️ Risks & Limitations - **Context Window**: Strictly limited to the fine-tuned block size (512 tokens). For longer multi-page queries, RAG (Retrieval Augmented Generation) is recommended. - **Bias**: The model reflects the biases of the provided training documentation. - **Accuracy**: Always verify critical technical numbers against the original source. \n--- **Architected and Fine-tuned by Bibek Lama Singtan**