---
language: en
license: mit
tags:
- language-model
- pytorch
- rnn
- text-generation
datasets:
- aditya-6122/tinystories-custom-dataset-18542-v2-test
pipeline_tag: text-generation
---

# vanilla-rnn-gru-like
|
|
## Model Details
|
|
### Model Description
A custom GRU-based language model trained on a dataset of short stories, intended for text-generation tasks.
|
|
### Model Sources
- **Repository**: [GitHub Repository](https://github.com/your-repo) <!-- replace with the actual repository if available -->
- **Paper**: N/A
|
|
## Uses
|
|
### Direct Use
This model can be used for generating short stories and for text-completion tasks.
|
|
### Downstream Use
Fine-tune the model on specific domains for specialized text generation.
|
|
### Out-of-Scope Use
Not intended for production use without further validation.
|
|
## Training Details
|
|
### Training Data
The model was trained on the [aditya-6122/tinystories-custom-dataset-18542-v2-test](https://huggingface.co/datasets/aditya-6122/tinystories-custom-dataset-18542-v2-test) dataset.
|
|
### Training Procedure
- **Training Regime**: Standard next-token language-model training with cross-entropy loss
- **Epochs**: 5
- **Batch Size**: 128
- **Learning Rate**: 0.001
- **Optimizer**: Adam (assumed)
- **Hardware**: Apple Silicon MPS (if available) or CPU
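The training script itself is not included in this repository. The hyperparameters above can be sketched as a minimal next-token training loop; the `TinyLM` stand-in model and random dummy batch below are illustrative only, not the actual model or data.

```python
import torch
import torch.nn as nn

# Illustrative stand-in model so the sketch is runnable; the real model is the
# GRU language model described in this card.
class TinyLM(nn.Module):
    def __init__(self, vocab_size, emb=32, hid=64):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb)
        self.gru = nn.GRU(emb, hid, batch_first=True)
        self.head = nn.Linear(hid, vocab_size)

    def forward(self, x):
        out, _ = self.gru(self.emb(x))
        return self.head(out)

# Prefer Apple Silicon MPS when available, otherwise fall back to CPU.
use_mps = getattr(torch.backends, "mps", None) is not None and torch.backends.mps.is_available()
device = torch.device("mps" if use_mps else "cpu")

vocab_size = 100  # the real vocabulary has 18542 tokens
model = TinyLM(vocab_size).to(device)
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)  # Adam, lr 0.001 as listed above
criterion = nn.CrossEntropyLoss()

# Dummy batch standing in for tokenized stories: (batch=128, seq_len=16).
batch = torch.randint(0, vocab_size, (128, 16), device=device)
inputs, targets = batch[:, :-1], batch[:, 1:]  # shift by one for next-token prediction

for epoch in range(5):  # 5 epochs as listed above
    optimizer.zero_grad()
    logits = model(inputs)  # (batch, seq_len - 1, vocab_size)
    loss = criterion(logits.reshape(-1, vocab_size), targets.reshape(-1))
    loss.backward()
    optimizer.step()
```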
|
|
### Tokenizer
The model uses the [aditya-6122/tinystories-tokenizer-vb-18542-byte_level_bpe-v3-test](https://huggingface.co/aditya-6122/tinystories-tokenizer-vb-18542-byte_level_bpe-v3-test) tokenizer.
|
|
### Model Architecture
- **Architecture Type**: RNN-based language model with GRU cells
- **Embedding Dimension**: 512
- **Hidden Dimension**: 1024
- **Vocabulary Size**: 18542
- **Architecture Diagram**: See `model_arch.jpg` for a visual representation
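A minimal PyTorch sketch consistent with the dimensions above is shown below. The exact layer layout (e.g. a single GRU layer) and the `(logits, hidden)` return signature are assumptions; the real class definition is not part of this card.

```python
import torch
import torch.nn as nn

class LanguageModel(nn.Module):
    """Sketch of a GRU language model with the card's stated dimensions."""

    def __init__(self, vocab_size=18542, embedding_dimension=512, hidden_dimension=1024):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embedding_dimension)
        self.gru = nn.GRU(embedding_dimension, hidden_dimension, batch_first=True)
        self.head = nn.Linear(hidden_dimension, vocab_size)

    def forward(self, token_ids, hidden=None):
        # token_ids: (batch, seq_len) -> logits: (batch, seq_len, vocab_size)
        embedded = self.embedding(token_ids)
        output, hidden = self.gru(embedded, hidden)
        return self.head(output), hidden

model = LanguageModel()
logits, hidden = model(torch.randint(0, 18542, (2, 8)))
print(logits.shape)  # torch.Size([2, 8, 18542])
```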
|
|
## Files
- `model.bin`: The trained model weights in PyTorch format.
- `tokenizer.json`: The tokenizer configuration.
- `model_arch.jpg`: Architecture diagram showing the GRU model structure.
|
|
## How to Use
|
|
Since this is a custom model rather than a standard `transformers` architecture, you'll need to load it with code like the following:
|
|
```python
import torch
from your_language_model import LanguageModel  # replace with the actual import
from tokenizers import Tokenizer

# Load tokenizer
tokenizer = Tokenizer.from_file("tokenizer.json")

# Load model (map to CPU so this also works without a GPU/MPS device)
vocab_size = tokenizer.get_vocab_size()
model = LanguageModel(vocab_size=vocab_size, embedding_dimension=512, hidden_dimension=1024)
model.load_state_dict(torch.load("model.bin", map_location="cpu"))
model.eval()

# Generate text
input_text = "Once upon a time"
input_ids = tokenizer.encode(input_text).ids  # token ids for the prompt
# Feed input_ids to the model and sample tokens (implement your generation logic)
```
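The generation logic itself is left to the user. One possible sketch is greedy decoding: repeatedly feed the sequence back through the model and append the argmax token. The `(logits, hidden)` return signature and the `StubModel` below are assumptions for illustration, not the actual model interface.

```python
import torch

def generate(model, token_ids, max_new_tokens=20, eos_id=None):
    """Greedy decoding sketch: append the argmax token until the budget or EOS."""
    ids = torch.tensor([token_ids], dtype=torch.long)
    model.eval()
    with torch.no_grad():
        for _ in range(max_new_tokens):
            logits, _ = model(ids)  # assumes the model returns (logits, hidden)
            next_id = logits[0, -1].argmax().item()
            if eos_id is not None and next_id == eos_id:
                break
            ids = torch.cat([ids, torch.tensor([[next_id]])], dim=1)
    return ids[0].tolist()

# Tiny stand-in with the assumed interface, so the sketch is runnable as-is.
class StubModel(torch.nn.Module):
    def __init__(self, vocab=10):
        super().__init__()
        self.emb = torch.nn.Embedding(vocab, 8)
        self.gru = torch.nn.GRU(8, 8, batch_first=True)
        self.head = torch.nn.Linear(8, vocab)

    def forward(self, x, h=None):
        out, h = self.gru(self.emb(x), h)
        return self.head(out), h

print(generate(StubModel(), [1, 2, 3], max_new_tokens=5))
```

For more varied stories, greedy decoding is usually replaced by temperature sampling or top-k sampling over the softmaxed logits.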
|
|
## Limitations
- This is a basic RNN model and may not perform as well as transformer-based models.
- Trained on limited data, it may exhibit biases from the training dataset.
- Not optimized for production deployment.
|
|
## Ethical Considerations
Users should be aware of potential biases in generated text and use the model responsibly.
|
|
## Citation
If you use this model, please cite:
```bibtex
@misc{vanilla-rnn-gru-like,
  title={Tiny-Stories-GRU-LanguageModel-ByteLevelEncoding},
  author={Aditya Wath},
  year={2024},
  publisher={Hugging Face},
  url={https://huggingface.co/aditya-6122/Tiny-Stories-GRU-LanguageModel-ByteLevelEncoding}
}
```