---
language: en
license: mit
tags:
- language-model
- pytorch
- rnn
- text-generation
datasets:
- aditya-6122/tinystories-custom-dataset-18542-v2-test
pipeline_tag: text-generation
---

# vanilla-rnn-gru-like

## Model Details

### Model Description

This is a custom GRU-based language model trained on a dataset of short stories, designed for text generation tasks.

### Model Sources

- **Repository**: [GitHub Repository](https://github.com/your-repo) (replace with the actual repository if available)
- **Paper**: N/A

## Uses

### Direct Use

This model can be used for generating short stories and for text-completion tasks.

### Downstream Use

Fine-tune the model on specific domains for specialized text generation.

### Out-of-Scope Use

Not intended for production use without further validation.

## Training Details

### Training Data

The model was trained on the [aditya-6122/tinystories-custom-dataset-18542-v2-test](https://huggingface.co/datasets/aditya-6122/tinystories-custom-dataset-18542-v2-test) dataset.

### Training Procedure

- **Training Regime**: Standard language-model training with cross-entropy loss
- **Epochs**: 5
- **Batch Size**: 128
- **Learning Rate**: 0.001
- **Optimizer**: Adam (assumed)
- **Hardware**: Apple Silicon MPS (if available) or CPU

### Tokenizer

The model uses the [aditya-6122/tinystories-tokenizer-vb-18542-byte_level_bpe-v3-test](https://huggingface.co/aditya-6122/tinystories-tokenizer-vb-18542-byte_level_bpe-v3-test) tokenizer.

### Model Architecture

- **Architecture Type**: RNN-based language model with GRU cells
- **Embedding Dimension**: 512
- **Hidden Dimension**: 1024
- **Vocabulary Size**: 18542
- **Architecture Diagram**: See `model_arch.jpg` for a visual representation

## Files

- `model.bin`: The trained model weights in PyTorch format.
- `tokenizer.json`: The tokenizer configuration.
- `model_arch.jpg`: Architecture diagram showing the GRU model structure.
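The architecture described above (embedding, GRU, output projection) can be sketched in PyTorch as follows. This is a minimal sketch, not the actual implementation: the class name `LanguageModel` and its constructor arguments mirror the usage example in this card, but the real model code in the repository may differ.

```python
import torch
import torch.nn as nn

class LanguageModel(nn.Module):
    """Minimal GRU language model matching the card's dimensions
    (embedding 512, hidden 1024, vocab 18542)."""

    def __init__(self, vocab_size, embedding_dimension=512, hidden_dimension=1024):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embedding_dimension)
        self.gru = nn.GRU(embedding_dimension, hidden_dimension, batch_first=True)
        self.fc = nn.Linear(hidden_dimension, vocab_size)

    def forward(self, input_ids, hidden=None):
        x = self.embedding(input_ids)      # (batch, seq, embedding_dimension)
        out, hidden = self.gru(x, hidden)  # out: (batch, seq, hidden_dimension)
        logits = self.fc(out)              # (batch, seq, vocab_size)
        return logits, hidden
```

Returning the hidden state alongside the logits lets callers carry state across generation steps instead of re-encoding the whole prefix each time.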
## How to Use

Since this is a custom model, you'll need to load it with the model code from the repository:

```python
import torch
from your_language_model import LanguageModel  # replace with the actual import
from tokenizers import Tokenizer

# Load the tokenizer
tokenizer = Tokenizer.from_file("tokenizer.json")

# Load the model
vocab_size = tokenizer.get_vocab_size()
model = LanguageModel(vocab_size=vocab_size, embedding_dimension=512, hidden_dimension=1024)
model.load_state_dict(torch.load("model.bin", map_location="cpu"))
model.eval()

# Generate text greedily. This is a sketch: it assumes the model's forward
# pass returns (logits, hidden_state); adjust to your actual signature.
input_text = "Once upon a time"
input_ids = torch.tensor([tokenizer.encode(input_text).ids])
with torch.no_grad():
    for _ in range(100):
        logits, _ = model(input_ids)  # logits: (batch, seq, vocab)
        next_id = logits[:, -1, :].argmax(dim=-1, keepdim=True)
        input_ids = torch.cat([input_ids, next_id], dim=1)
print(tokenizer.decode(input_ids[0].tolist()))
```

## Limitations

- This is a basic RNN model and may not perform as well as transformer-based models.
- It was trained on limited data and may exhibit biases from the training dataset.
- It is not optimized for production deployment.

## Ethical Considerations

Users should be aware of potential biases in generated text and use the model responsibly.

## Citation

If you use this model, please cite:

```
@misc{vanilla-rnn-gru-like,
  title={Tiny-Stories-GRU-LanguageModel-ByteLevelEncoding},
  author={Aditya Wath},
  year={2024},
  publisher={Hugging Face},
  url={https://huggingface.co/aditya-6122/Tiny-Stories-GRU-LanguageModel-ByteLevelEncoding}
}
```