aditya-6122 committed · verified
Commit e44926a · 1 Parent(s): 827abb3

Upload README.md with huggingface_hub
---
language: en
license: mit
tags:
- language-model
- pytorch
- rnn
- text-generation
datasets:
- aditya-6122/tinystories-custom-dataset-17783-v1-test
pipeline_tag: text-generation
---

# vanilla-rnn-gru-like

## Model Details

### Model Description
This is a custom language model trained on a dataset of short stories, designed for text generation tasks.

### Model Sources
- **Repository**: [GitHub Repository](https://github.com/your-repo) <!-- Replace with actual repo if available -->
- **Paper**: N/A

## Uses

### Direct Use
This model can be used for generating short stories and for text completion tasks.

### Downstream Use
Fine-tune the model on specific domains for specialized text generation.

### Out-of-Scope Use
Not intended for production use without further validation.

## Training Details

### Training Data
The model was trained on the [aditya-6122/tinystories-custom-dataset-17783-v1-test](https://huggingface.co/datasets/aditya-6122/tinystories-custom-dataset-17783-v1-test) dataset.

### Training Procedure
- **Training Regime**: Standard language-model training with cross-entropy loss
- **Epochs**: 1
- **Batch Size**: 2
- **Learning Rate**: 0.001
- **Optimizer**: Adam (assumed)
- **Hardware**: Apple Silicon MPS (if available) or CPU

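The training script itself is not included in this repo. A minimal sketch of one optimization step under the hyperparameters above (cross-entropy loss, Adam at lr 0.001, MPS-or-CPU device selection) might look like the following; the tiny model class and random batch are stand-ins for illustration only:

```python
import torch
import torch.nn as nn

# Device selection as described above: Apple Silicon MPS if available, else CPU.
device = torch.device("mps" if torch.backends.mps.is_available() else "cpu")

# Tiny stand-in LM (the real model uses embedding=512, hidden=1024, vocab=17783).
class TinyLM(nn.Module):
    def __init__(self, vocab_size=50, emb=16, hid=32):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, emb)
        self.gru = nn.GRU(emb, hid, batch_first=True)
        self.fc = nn.Linear(hid, vocab_size)

    def forward(self, ids):
        out, _ = self.gru(self.embedding(ids))
        return self.fc(out)

model = TinyLM().to(device)
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)  # lr from the card
loss_fn = nn.CrossEntropyLoss()

# One step of next-token prediction: targets are the inputs shifted left by one.
batch = torch.randint(0, 50, (2, 16), device=device)        # batch size 2, as above
inputs, targets = batch[:, :-1], batch[:, 1:]
logits = model(inputs)                                       # (2, 15, vocab)
loss = loss_fn(logits.reshape(-1, 50), targets.reshape(-1))
optimizer.zero_grad()
loss.backward()
optimizer.step()
```

The full run would loop this over the dataset for one epoch, as listed in the hyperparameters.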
### Tokenizer
The model uses the [aditya-6122/tinystories-tokenizer-vb-17783-char_bpe-v1-test](https://huggingface.co/aditya-6122/tinystories-tokenizer-vb-17783-char_bpe-v1-test) tokenizer.

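For intuition about what a char-level BPE tokenizer does, here is a toy example built from an in-memory vocabulary and merge list with the `tokenizers` library (the vocab and merges are invented for illustration, and passing merges as tuples assumes a reasonably recent `tokenizers` release; the real tokenizer ships as `tokenizer.json`):

```python
from tokenizers import Tokenizer, models

# Toy char-BPE: single characters plus a few "learned" merges (made up here).
vocab = {"[UNK]": 0, "o": 1, "n": 2, "c": 3, "e": 4, "on": 5, "ce": 6, "once": 7}
merges = [("o", "n"), ("c", "e"), ("on", "ce")]
tok = Tokenizer(models.BPE(vocab, merges, unk_token="[UNK]"))

# Input is split into characters, then merges are applied in rank order:
# o n c e -> on ce -> once
print(tok.encode("once").tokens)  # ['once']
print(tok.encode("on").tokens)    # ['on']
```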
### Model Architecture
- **Architecture Type**: RNN-based language model with GRU cells
- **Embedding Dimension**: 512
- **Hidden Dimension**: 1024
- **Vocabulary Size**: 17783
- **Architecture Diagram**: See `model_arch.jpg` for a visual representation

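The model class is not shipped in this repository. A minimal sketch consistent with the dimensions above (the exact layer layout, class name, and constructor argument names are assumptions) is:

```python
import torch
import torch.nn as nn

class LanguageModel(nn.Module):
    """Sketch of an embedding -> GRU -> linear LM; names are assumed."""
    def __init__(self, vocab_size, embedding_dimension=512, hidden_dimension=1024):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embedding_dimension)
        self.gru = nn.GRU(embedding_dimension, hidden_dimension, batch_first=True)
        self.fc = nn.Linear(hidden_dimension, vocab_size)

    def forward(self, token_ids, hidden=None):
        # token_ids: (batch, seq_len) -> logits: (batch, seq_len, vocab_size)
        embedded = self.embedding(token_ids)
        output, hidden = self.gru(embedded, hidden)
        return self.fc(output), hidden

# Quick shape check with a toy vocabulary
model = LanguageModel(vocab_size=100, embedding_dimension=32, hidden_dimension=64)
logits, hidden = model(torch.randint(0, 100, (2, 8)))
print(logits.shape)  # torch.Size([2, 8, 100])
```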
## Files
- `model.bin`: The trained model weights in PyTorch format.
- `tokenizer.json`: The tokenizer configuration.
- `model_arch.jpg`: Architecture diagram showing the GRU model structure.

## How to Use

Since this is a custom model, you'll need to load it using the provided code:

```python
import torch
from your_language_model import LanguageModel  # Replace with actual import
from tokenizers import Tokenizer

# Load tokenizer
tokenizer = Tokenizer.from_file("tokenizer.json")

# Load model (map_location keeps this working on CPU-only machines)
vocab_size = tokenizer.get_vocab_size()
model = LanguageModel(vocab_size=vocab_size, embedding_dimension=512, hidden_dimension=1024)
model.load_state_dict(torch.load("model.bin", map_location="cpu"))
model.eval()

# Generate text
input_text = "Once upon a time"
# Tokenize and generate (implement your generation logic)
```

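The snippet above leaves generation unimplemented. One way to fill it in is an autoregressive sampling loop that carries the GRU hidden state forward between steps, sketched here with a tiny stand-in model so it runs end-to-end; with the real checkpoint you would pass the loaded `LanguageModel` and the token ids produced by the tokenizer (the two-value `forward` return is an assumption about the model's interface):

```python
import torch
import torch.nn as nn

# Tiny stand-in with the assumed interface of the real model: forward returns
# (logits, hidden) so the sampling loop below runs end-to-end.
class TinyLM(nn.Module):
    def __init__(self, vocab_size=50, emb=16, hid=32):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, emb)
        self.gru = nn.GRU(emb, hid, batch_first=True)
        self.fc = nn.Linear(hid, vocab_size)

    def forward(self, ids, hidden=None):
        out, hidden = self.gru(self.embedding(ids), hidden)
        return self.fc(out), hidden

@torch.no_grad()
def generate(model, prompt_ids, max_new_tokens=20, temperature=1.0):
    """Sample one token at a time, reusing the GRU hidden state."""
    model.eval()
    generated = list(prompt_ids)
    logits, hidden = model(torch.tensor([prompt_ids]))  # prime on the prompt
    for _ in range(max_new_tokens):
        probs = torch.softmax(logits[:, -1, :] / temperature, dim=-1)
        next_id = torch.multinomial(probs, num_samples=1)  # shape (1, 1)
        generated.append(next_id.item())
        logits, hidden = model(next_id, hidden)            # single-step forward
    return generated

torch.manual_seed(0)
out = generate(TinyLM(), prompt_ids=[1, 2, 3], max_new_tokens=5)
print(len(out))  # 8
```

Decoding the returned ids back to text would use `tokenizer.decode(out)`; lowering `temperature` makes sampling greedier.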
## Limitations
- This is a basic RNN model and may not perform as well as transformer-based models.
- Trained on limited data; it may exhibit biases from the training dataset.
- Not optimized for production deployment.

## Ethical Considerations
Users should be aware of potential biases in generated text and use the model responsibly.

## Citation
If you use this model, please cite:
```bibtex
@misc{vanilla-rnn-gru-like,
  title={vanilla-rnn-gru-like},
  author={Your Name},
  year={2024},
  publisher={Hugging Face},
  url={https://huggingface.co/aditya-6122/vanilla-rnn-gru-like}
}
```