teszenofficial committed
Commit 72003ee · verified · 1 Parent(s): f078257

Add README

Files changed (1)
  1. README.md +49 -0
README.md ADDED
@@ -0,0 +1,49 @@
---
language: es
license: apache-2.0
tags:
- text-generation
- transformer
- pytorch
---

# MTP Mini - 20x Improved Model

A Spanish text-generation transformer with a modernized architecture, trained on a T4 GPU.

## Architecture
- **Parameters**: ~310.7M (310,708,225)
- **Vocabulary**: 8,000 tokens
- **Layers**: 24
- **Model dimension**: 1024
- **Context length**: 2048 tokens
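
For a rough sense of scale, the parameter count above can be turned into weight storage. This is a back-of-the-envelope sketch, not a measurement: it counts only the weights (no activations, optimizer state, or attention cache) and assumes every parameter is stored at a single precision. The helper name `weight_bytes` is illustrative and not part of this repository.

```python
PARAMS = 310_708_225  # parameter count stated in the Architecture list

# Bytes needed to store one parameter at each precision
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2}


def weight_bytes(n_params: int, dtype: str) -> int:
    """Return the storage needed for n_params weights at the given precision."""
    return n_params * BYTES_PER_PARAM[dtype]


for dtype in ("fp32", "fp16"):
    size = weight_bytes(PARAMS, dtype)
    print(f"{dtype}: {size:,} bytes ({size / 2**30:.2f} GiB)")
```

Under these assumptions the weights occupy about 1.16 GiB in FP32 and about 0.58 GiB in FP16. Training needs considerably more than that, since gradients, optimizer state, and activations also live in GPU memory.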

## Improvements
- ✅ RoPE, RMSNorm, SwiGLU
- ✅ Flash Attention
- ✅ Gradient checkpointing
- ✅ FP16 mixed precision
- ✅ Anti-hallucination
- ✅ Confidence scoring
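
The first bullet names building blocks that are common in recent transformer designs. As a framework-free illustration of two of them, here is a minimal sketch of RMSNorm and a SwiGLU feed-forward step on plain Python lists. It is not this model's code: the function names, the `eps` value, and the tiny weight matrices are assumptions chosen for the example, and the actual implementation presumably operates on PyTorch tensors.

```python
import math


def rms_norm(x, weight, eps=1e-6):
    """Divide x by its root mean square, then scale by a per-feature weight."""
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [w * v / rms for v, w in zip(x, weight)]


def silu(v):
    """SiLU (swish) activation: v * sigmoid(v)."""
    return v / (1.0 + math.exp(-v))


def matvec(matrix, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, x)) for row in matrix]


def swiglu(x, w_gate, w_up, w_down):
    """SwiGLU feed-forward: w_down @ (silu(w_gate @ x) * (w_up @ x))."""
    gate = [silu(g) for g in matvec(w_gate, x)]
    up = matvec(w_up, x)
    hidden = [g * u for g, u in zip(gate, up)]
    return matvec(w_down, hidden)


# Toy example: 2 input features, 3 hidden units, 2 output features
x = [3.0, 4.0]
normed = rms_norm(x, weight=[1.0, 1.0])
w_gate = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
w_up = [[0.5, 0.5], [1.0, -1.0], [0.0, 1.0]]
w_down = [[1.0, 0.0, 0.0], [0.0, 1.0, 1.0]]
out = swiglu(normed, w_gate, w_up, w_down)
print(normed, out)
```

With unit weights, the output of `rms_norm` has a mean square close to 1 regardless of the input scale, which is the property the normalization is meant to provide.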

## Usage
```python
import pickle

import torch

from model import MTPMiniModel
from tokenizer import MTPTokenizer

# Load the checkpoint. pickle can execute arbitrary code, so only open files you trust.
with open('mtp_mini.pkl', 'rb') as f:
    data = pickle.load(f)

tokenizer = MTPTokenizer('mtp_tokenizer.model')
model = MTPMiniModel(**data['config']['model'])
model.load_state_dict(data['model_state_dict'])
model.eval()

prompt = "¿Qué es la IA?"
# Assuming encode() returns a flat list of token ids, wrapping it in [...]
# already adds the batch dimension: shape (1, seq_len). No unsqueeze needed.
ids = torch.tensor([tokenizer.encode(prompt)])
with torch.no_grad():  # inference only, no gradients required
    output = model.generate(ids, max_new_tokens=150)
print(tokenizer.decode(output[0].tolist()))
```

Trained on Google Colab with a T4 GPU.