solvrays
/

solvrays-llm

@@ -1,5 +1,5 @@
 ---
-base_model: google/gemma-2b
 language: en
 library_name: transformers
 license: apache-2.0
@@ -13,60 +13,48 @@ tags:
 ---
 # 📂 Solvrays Llm - High Precision Document Analyst
-## 🌟 Overview
-This model is a specialized fine-tuning of **google/gemma-2b**, engineered for **Zero-Hallucination Document Retrieval**. It has been optimized to handle complex, domain-specific documents (Technical, Legal, or Architectural) with strict adherence to provided context.
-### 🛠 Primary Design Objectives
 - **Factual Integrity**: Programmed to prioritize 'Not Documented' over speculating.
 - **Contextual Continuity**: Overlap-aware training prevents information loss across page boundaries.
 - **Domain Versatility**: Seamlessly switches between technical and non-technical document styles.
-## 💻 Professional Usage (Grounded Inference)
 To achieve the trained precision level, utilize the following code implementation:
-```python
 from transformers import AutoTokenizer, AutoModelForCausalLM
 import torch
 model_id = 'solvrays/solvrays-llm'
 tokenizer = AutoTokenizer.from_pretrained(model_id)
-model = AutoModelForCausalLM.from_pretrained(model_id, device_map='auto', torch_dtype=torch.float16)
 # Universal Grounding Template
-instruction = 'Analyze the following document and provide a precise, factual response based strictly on the content provided. If the information is not present, you must state that it is not documented.'
 query = 'What are the main infrastructure requirements?'
 prompt = (f'### Instruction: {instruction}\n'
           f'### Knowledge Context: {query}\n'
           f'### Verified Response:')
-prompt = (f'### Instruction: {instruction}
-'
-### Knowledge Context: Extract the overview and key details from this document.
-          f'### Verified Response:')
 inputs = tokenizer(prompt, return_tensors='pt').to(model.device)
 with torch.no_grad():
     outputs = model.generate(**inputs, max_new_tokens=256, do_sample=False, repetition_penalty=1.5)
 print(tokenizer.decode(outputs[0], skip_special_tokens=True).split('### Verified Response:')[-1].strip())
 ```
-## 📊 Technical Specifications
 | Parameter | Configuration |
 | :--- | :--- |
-| Base Model | google/gemma-2b |
 | Fine-tuning Method | QLoRA (4-bit quantization) |
 | LoRA Rank (r) | 16 |
 | LoRA Alpha | 32 |
 | Training Epochs | 5 |
 | Context Strategy | 512 tokens with 128-token overlap |
-## ⚠️ Risks & Limitations
 - **Context Window**: Strictly limited to the fine-tuned block size (512 tokens). For longer multi-page queries, RAG (Retrieval Augmented Generation) is recommended.
 - **Bias**: The model reflects the biases of the provided training documentation.
 - **Accuracy**: Always verify critical technical numbers against the original source.
----
 **Architected and Fine-tuned by Bibek Lama Singtan**

 ---
+base_model: google/gemma-2b-it
 language: en
 library_name: transformers
 license: apache-2.0
 ---
 # 📂 Solvrays Llm - High Precision Document Analyst
+\n## 🌟 Overview
+This model is a specialized fine-tuning of **google/gemma-2b-it**, engineered for **Zero-Hallucination Document Retrieval**. It has been optimized to handle complex, domain-specific documents (Technical, Legal, or Architectural) with strict adherence to provided context.
+\n### 🛠 Primary Design Objectives
 - **Factual Integrity**: Programmed to prioritize 'Not Documented' over speculating.
 - **Contextual Continuity**: Overlap-aware training prevents information loss across page boundaries.
 - **Domain Versatility**: Seamlessly switches between technical and non-technical document styles.
+\n## 💻 Professional Usage (Grounded Inference)
 To achieve the trained precision level, utilize the following code implementation:
+\n```python
 from transformers import AutoTokenizer, AutoModelForCausalLM
 import torch
 model_id = 'solvrays/solvrays-llm'
 tokenizer = AutoTokenizer.from_pretrained(model_id)
+model = AutoModelForCausalLM.from_pretrained(model_id, device_map='auto', torch_dtype=torch.bfloat16)
 # Universal Grounding Template
+instruction = 'Analyze your internal knowledge base and provide a precise, factual response based strictly on the documentation you have been trained on. If the information is not documented, state that it is not documented.'
 query = 'What are the main infrastructure requirements?'
 prompt = (f'### Instruction: {instruction}\n'
           f'### Knowledge Context: {query}\n'
           f'### Verified Response:')
 inputs = tokenizer(prompt, return_tensors='pt').to(model.device)
 with torch.no_grad():
     outputs = model.generate(**inputs, max_new_tokens=256, do_sample=False, repetition_penalty=1.5)
 print(tokenizer.decode(outputs[0], skip_special_tokens=True).split('### Verified Response:')[-1].strip())
 ```
+\n## 📊 Technical Specifications
 | Parameter | Configuration |
 | :--- | :--- |
+| Base Model | google/gemma-2b-it |
 | Fine-tuning Method | QLoRA (4-bit quantization) |
 | LoRA Rank (r) | 16 |
 | LoRA Alpha | 32 |
 | Training Epochs | 5 |
 | Context Strategy | 512 tokens with 128-token overlap |
+\n## ⚠️ Risks & Limitations
 - **Context Window**: Strictly limited to the fine-tuned block size (512 tokens). For longer multi-page queries, RAG (Retrieval Augmented Generation) is recommended.
 - **Bias**: The model reflects the biases of the provided training documentation.
 - **Accuracy**: Always verify critical technical numbers against the original source.
+\n---
 **Architected and Fine-tuned by Bibek Lama Singtan**