	---
	base_model: google/gemma-2b-it
	language: en
	library_name: transformers
	license: apache-2.0
	pipeline_tag: text-generation
	tags:
	- precision-grounding
	- document-qa
	- zero-hallucination
	- legal-tech
	- technical-analysis
	---

	# 📂 Solvrays LLM - High-Precision Document Analyst

	## 🌟 Overview
	This model is a fine-tune of google/gemma-2b-it built for document question answering with a zero-hallucination goal. It is optimized to handle complex, domain-specific documents (technical, legal, or architectural) with strict adherence to the provided context.

	### 🛠 Primary Design Objectives
	- Factual Integrity: Trained to answer 'Not Documented' rather than speculate.
	- Contextual Continuity: Overlap-aware training limits information loss across page boundaries.
	- Domain Versatility: Handles both technical and non-technical document styles.
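
The overlap-aware chunking mentioned above can be illustrated with a short sketch. It is not the author's training code: the function name `chunk_with_overlap` and the stand-in token list are hypothetical. Only the 512-token block size and 128-token overlap come from the specifications table in this card.

```python
from typing import List


def chunk_with_overlap(
    token_ids: List[int], block_size: int = 512, overlap: int = 128
) -> List[List[int]]:
    """Split token IDs into blocks where consecutive blocks share `overlap` tokens."""
    if not 0 <= overlap < block_size:
        raise ValueError("overlap must be non-negative and smaller than block_size")
    stride = block_size - overlap  # 384 new tokens per block with the defaults
    chunks = []
    start = 0
    while start < len(token_ids):
        chunks.append(token_ids[start:start + block_size])
        if start + block_size >= len(token_ids):
            break  # this block already reaches the end of the document
        start += stride
    return chunks


# Stand-in for tokenizer output (assumption): 1000 sequential "token IDs".
tokens = list(range(1000))
chunks = chunk_with_overlap(tokens)
print(len(chunks), [len(c) for c in chunks])
```

Because each block repeats the last 128 tokens of the previous one, a sentence that straddles a block boundary still appears whole in at least one block, provided it is shorter than the overlap.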

	## 💻 Professional Usage (Grounded Inference)
	To reproduce the behavior the model was trained for, use the prompt template in the following code:

	```python
	from transformers import AutoTokenizer, AutoModelForCausalLM
	import torch

	model_id = 'solvrays/solvrays-llm'
	tokenizer = AutoTokenizer.from_pretrained(model_id)
	model = AutoModelForCausalLM.from_pretrained(model_id, device_map='auto', torch_dtype=torch.bfloat16)

	# Universal Grounding Template
	instruction = 'Analyze your internal knowledge base and provide a precise, factual response based strictly on the documentation you have been trained on. If the information is not documented, state that it is not documented.'
	query = 'What are the main infrastructure requirements?'

	prompt = (f'### Instruction: {instruction}\n'
	f'### Knowledge Context: {query}\n'
	f'### Verified Response:')

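	# Tokenize the prompt and move the tensors to the model's device
	inputs = tokenizer(prompt, return_tensors='pt').to(model.device)
	with torch.no_grad():
	    # Greedy decoding (do_sample=False) keeps the output deterministic
	    outputs = model.generate(**inputs, max_new_tokens=256, do_sample=False, repetition_penalty=1.5)

	# Print only the text generated after the response marker
	print(tokenizer.decode(outputs[0], skip_special_tokens=True).split('### Verified Response:')[-1].strip())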
	```
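
For repeated queries, the template and the response extraction can be wrapped in two small helpers. This is a minimal sketch: `build_prompt`, `extract_response`, and `RESPONSE_MARKER` are hypothetical names introduced here. The three section labels are copied from the snippet above.

```python
RESPONSE_MARKER = "### Verified Response:"


def build_prompt(instruction: str, query: str) -> str:
    """Assemble the three-part grounding template used in the usage example."""
    return (
        f"### Instruction: {instruction}\n"
        f"### Knowledge Context: {query}\n"
        f"{RESPONSE_MARKER}"
    )


def extract_response(decoded: str) -> str:
    """Return only the text that follows the last response marker."""
    return decoded.split(RESPONSE_MARKER)[-1].strip()


# Simulated decoded output (assumption): the prompt followed by a generated answer.
prompt = build_prompt("Answer strictly from the documentation.", "What is X?")
decoded = prompt + " Not documented.\n"
print(extract_response(decoded))
```

Keeping the marker in a single constant avoids a mismatch between the prompt and the string used to split the decoded output.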

	## 📊 Technical Specifications
	| Parameter | Configuration |
	| :--- | :--- |
	| Base Model | google/gemma-2b-it |
	| Fine-tuning Method | QLoRA (4-bit quantization) |
	| LoRA Rank (r) | 16 |
	| LoRA Alpha | 32 |
	| Training Epochs | 5 |
	| Context Strategy | 512 tokens with 128-token overlap |
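
To give a feel for what a rank of 16 and an alpha of 32 imply, the sketch below works through the standard LoRA arithmetic. The card does not say which layers were adapted, so the 2048 x 2048 projection is purely an illustrative assumption. The scaling rule `alpha / r` is the conventional LoRA formulation and may differ if a variant was used.

```python
def lora_scaling(alpha: int, rank: int) -> float:
    """Conventional LoRA scaling applied to the low-rank update: alpha / r."""
    return alpha / rank


def lora_trainable_params(d_in: int, d_out: int, rank: int) -> int:
    """Parameters in the two low-rank factors A (rank x d_in) and B (d_out x rank)."""
    return rank * (d_in + d_out)


r, alpha = 16, 32            # values from the specifications table
d_in = d_out = 2048          # assumed size of one adapted projection (illustrative only)

scaling = lora_scaling(alpha, r)
adapter_params = lora_trainable_params(d_in, d_out, r)
full_params = d_in * d_out
fraction = adapter_params / full_params

print(f"scaling={scaling}, adapter={adapter_params}, full={full_params}, fraction={fraction:.4f}")
```

Under these assumptions the adapter trains about 1.6% of the weights of the projection it wraps, which is why the base model can stay frozen in 4-bit form during fine-tuning.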

	## ⚠️ Risks & Limitations
	- Context Window: Limited to the fine-tuning block size (512 tokens). For longer, multi-page queries, Retrieval-Augmented Generation (RAG) is recommended.
	- Bias: The model reflects the biases of the provided training documentation.
	- Accuracy: Always verify critical technical figures against the original source.
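
The verification step in the last point can be partly automated. The sketch below flags numbers in a response that never appear in the source text. The helper `unverified_numbers` and the sample strings are hypothetical, and a plain string match like this is only a first filter, not a substitute for reading the source.

```python
import re
from typing import List

# Matches integers and simple decimals such as "64" or "3.5".
NUMBER_PATTERN = re.compile(r"\d+(?:\.\d+)?")


def unverified_numbers(response: str, source_text: str) -> List[str]:
    """Return numbers found in the response that do not occur in the source text."""
    source_numbers = set(NUMBER_PATTERN.findall(source_text))
    return [n for n in NUMBER_PATTERN.findall(response) if n not in source_numbers]


# Hypothetical source passage and model response, for illustration only.
source = "The cluster requires 3 nodes with 64 GB of RAM each."
response = "You need 3 nodes with 128 GB of RAM."
flagged = unverified_numbers(response, source)
print(flagged)
```

Any flagged value is a candidate for manual review. An empty list does not prove the response is correct, since a number can appear in the source in a different context.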

	---
	Architected and Fine-tuned by Bibek Lama Singtan