singtan commited on
Commit
8bdabf8
Β·
verified Β·
1 Parent(s): 5a13b18

Upload README.md with huggingface_hub

Browse files
Files changed (1) hide show
  1. README.md +56 -24
README.md CHANGED
@@ -5,36 +5,68 @@ library_name: transformers
5
  license: apache-2.0
6
  pipeline_tag: text-generation
7
  tags:
8
- - fine-tuned
9
- - pdf-grounded
10
  - zero-hallucination
11
- - precise-retrieval
 
12
  ---
13
 
14
- # πŸ“‚ Solvrays Llm (Ground-Truth Precise)
15
 
16
- ## 🌟 Overview
17
- This is a specialized fine-tuned version of **Gemma 2B**, optimized for **High-Precision Retrieval**. It uses deterministic grounding templates to minimize hallucinations.
18
 
19
- ## πŸ’» Quick Start (Inference)
20
- ```python
21
- from transformers import AutoTokenizer, AutoModelForCausalLM
22
- import torch
23
 
24
- model_id = "solvrays/solvrays-llm"
25
- tokenizer = AutoTokenizer.from_pretrained(model_id)
26
- model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto", torch_dtype=torch.float16)
27
 
28
- instruction = "Analyze the following document and provide a precise, factual response based strictly on the content provided."
29
- prompt = f"### Instruction: {instruction}
30
- ### Source: Document.pdf
31
- ### Content: Query
32
- ### Verified Response:"
33
 
34
- inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
35
- outputs = model.generate(**inputs, max_new_tokens=256, do_sample=False)
36
- print(tokenizer.decode(outputs[0], skip_special_tokens=True))
37
- ```
38
 
39
- ---
40
- **Fine-tuned by Bibek Lama Singtan**
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
5
  license: apache-2.0
6
  pipeline_tag: text-generation
7
  tags:
8
+ - precision-grounding
9
+ - document-qa
10
  - zero-hallucination
11
+ - legal-tech
12
+ - technical-analysis
13
  ---
14
 
15
+ # πŸ“‚ Solvrays Llm - High Precision Document Analyst
16
 
17
+ ## 🌟 Overview
18
+ This model is a specialized fine-tuning of **google/gemma-2b**, engineered for **Zero-Hallucination Document Retrieval**. It has been optimized to handle complex, domain-specific documents (Technical, Legal, or Architectural) with strict adherence to provided context.
19
 
20
+ ### πŸ›  Primary Design Objectives
21
+ - **Factual Integrity**: Programmed to prioritize 'Not Documented' over speculating.
22
+ - **Contextual Continuity**: Overlap-aware training prevents information loss across page boundaries.
23
+ - **Domain Versatility**: Seamlessly switches between technical and non-technical document styles.
24
 
25
+ ## πŸ’» Professional Usage (Grounded Inference)
26
+ To achieve the trained precision level, utilize the following code implementation:
 
27
 
28
+ ```python
29
+ from transformers import AutoTokenizer, AutoModelForCausalLM
30
+ import torch
 
 
31
 
32
+ model_id = 'solvrays/solvrays-llm'
33
+ tokenizer = AutoTokenizer.from_pretrained(model_id)
34
+ model = AutoModelForCausalLM.from_pretrained(model_id, device_map='auto', torch_dtype=torch.float16)
 
35
 
36
+ # Universal Grounding Template
37
+ instruction = 'Analyze the document and provide a precise response based strictly on the content provided.'
38
+ source_doc = 'Architecture_Spec.pdf'
39
+ query = 'What are the main infrastructure requirements?'
40
+
41
+ prompt = (f'### Instruction: {instruction}
42
+ '
43
+ f'### Source: {source_doc}
44
+ '
45
+ f'### Content: {query}
46
+ '
47
+ f'### Verified Response:')
48
+
49
+ inputs = tokenizer(prompt, return_tensors='pt').to(model.device)
50
+ with torch.no_grad():
51
+ outputs = model.generate(**inputs, max_new_tokens=256, do_sample=False, repetition_penalty=1.5)
52
+
53
+ print(tokenizer.decode(outputs[0], skip_special_tokens=True).split('### Verified Response:')[-1].strip())
54
+ ```
55
+
56
+ ## πŸ“Š Technical Specifications
57
+ | Parameter | Configuration |
58
+ | :--- | :--- |
59
+ | Base Model | google/gemma-2b |
60
+ | Fine-tuning Method | QLoRA (4-bit quantization) |
61
+ | LoRA Rank (r) | 16 |
62
+ | LoRA Alpha | 32 |
63
+ | Training Epochs | 5 |
64
+ | Context Strategy | 512 tokens with 128-token overlap |
65
+
66
+ ## ⚠️ Risks & Limitations
67
+ - **Context Window**: Strictly limited to the fine-tuned block size (512 tokens). For longer multi-page queries, RAG (Retrieval Augmented Generation) is recommended.
68
+ - **Bias**: The model reflects the biases of the provided training documentation.
69
+ - **Accuracy**: Always verify critical technical numbers against the original source.
70
+
71
+ ---
72
+ **Architected and Fine-tuned by Bibek Lama Singtan**