aditya-6122 committed on
Commit 4e1f26e · verified · 1 Parent(s): 169f278

Upload README.md with huggingface_hub

Files changed (1)
  1. README.md +6 -6
README.md CHANGED
@@ -7,7 +7,7 @@ tags:
  - rnn
  - text-generation
  datasets:
- - aditya-6122/tinystories-custom-dataset-17783-v1-test
+ - aditya-6122/tinystories-custom-dataset-18542-v2-test
  pipeline_tag: text-generation
  ---
 
@@ -36,24 +36,24 @@ Not intended for production use without further validation.
  ## Training Details
 
  ### Training Data
- The model was trained on the [aditya-6122/tinystories-custom-dataset-17783-v1-test](https://huggingface.co/datasets/aditya-6122/tinystories-custom-dataset-17783-v1-test) dataset.
+ The model was trained on the [aditya-6122/tinystories-custom-dataset-18542-v2-test](https://huggingface.co/datasets/aditya-6122/tinystories-custom-dataset-18542-v2-test) dataset.
 
  ### Training Procedure
  - **Training Regime**: Standard language model training with cross-entropy loss
- - **Epochs**: 1
+ - **Epochs**: 5
- - **Batch Size**: 2
+ - **Batch Size**: 128
  - **Learning Rate**: 0.001
  - **Optimizer**: Adam (assumed)
  - **Hardware**: Apple Silicon MPS (if available) or CPU
 
  ### Tokenizer
- The model uses the [aditya-6122/tinystories-tokenizer-vb-17783-char_bpe-v1-test](https://huggingface.co/aditya-6122/tinystories-tokenizer-vb-17783-char_bpe-v1-test) tokenizer.
+ The model uses the [aditya-6122/tinystories-tokenizer-vb-18542-byte_level_bpe-v3-test](https://huggingface.co/aditya-6122/tinystories-tokenizer-vb-18542-byte_level_bpe-v3-test) tokenizer.
 
  ### Model Architecture
  - **Architecture Type**: RNN-based language model with GRU cells
  - **Embedding Dimension**: 512
  - **Hidden Dimension**: 1024
- - **Vocabulary Size**: 17783
+ - **Vocabulary Size**: 18542
  - **Architecture Diagram**: See `model_arch.jpg` for visual representation
 
  ## Files
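
The vocabulary change from 17783 to 18542 also changes the size of the model, since both the embedding table and the output projection scale with the vocabulary. The sketch below estimates that effect from the dimensions listed in the README. It rests on assumptions the README does not confirm: a single GRU layer, the parameter layout used by PyTorch's `nn.GRU` (two bias vectors per gate), and an untied linear output layer with bias. The function name `gru_lm_param_count` is hypothetical.

```python
def gru_lm_param_count(vocab_size: int, embed_dim: int, hidden_dim: int) -> dict:
    """Estimate trainable parameters of a hypothetical single-layer GRU language model.

    Assumed layout (not confirmed by the README):
    embedding -> one GRU layer -> linear projection to the vocabulary.
    """
    # Embedding table: one embed_dim vector per vocabulary entry.
    embedding = vocab_size * embed_dim

    # GRU: three gates (reset, update, new), each with input weights,
    # recurrent weights, and two bias vectors, as in PyTorch's nn.GRU.
    gru = 3 * (hidden_dim * embed_dim + hidden_dim * hidden_dim + 2 * hidden_dim)

    # Output projection: hidden state to vocabulary logits, with bias.
    output = hidden_dim * vocab_size + vocab_size

    return {
        "embedding": embedding,
        "gru": gru,
        "output": output,
        "total": embedding + gru + output,
    }


if __name__ == "__main__":
    # Dimensions taken from the README before and after this commit.
    old = gru_lm_param_count(vocab_size=17783, embed_dim=512, hidden_dim=1024)
    new = gru_lm_param_count(vocab_size=18542, embed_dim=512, hidden_dim=1024)
    print(f"old total: {old['total']:,}")
    print(f"new total: {new['total']:,}")
    print(f"difference: {new['total'] - old['total']:,}")
```

Under these assumptions the GRU layer itself is unaffected by the commit; only the embedding and output layers grow with the larger vocabulary.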