---
language: en
license: mit
tags:
- language-model
- pytorch
- rnn
- text-generation
datasets:
- aditya-6122/tinystories-custom-dataset-18542-v2-test
pipeline_tag: text-generation
---

# vanilla-rnn-gru-like
## Model Details

### Model Description
This is a custom language model trained on a dataset of short stories, designed for text generation tasks.
### Model Sources

- **Repository:** GitHub Repository *(replace with actual repo if available)*
- **Paper:** N/A
## Uses

### Direct Use

This model can be used for generating short stories and for text-completion tasks.

### Downstream Use

Fine-tune the model on specific domains for specialized text generation.

### Out-of-Scope Use

Not intended for production use without further validation.
## Training Details

### Training Data

The model was trained on the `aditya-6122/tinystories-custom-dataset-18542-v2-test` dataset.

### Training Procedure

- **Training regime:** standard language-model training with cross-entropy loss
- **Epochs:** 5
- **Batch size:** 128
- **Learning rate:** 0.001
- **Optimizer:** Adam (assumed)
- **Hardware:** Apple Silicon MPS (if available) or CPU
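A minimal sketch of that training regime — next-token prediction with cross-entropy loss and Adam — is shown below. The model class, dimensions, and data here are stand-ins for illustration (the actual training script is not published); real training would iterate over batches from the TinyStories dataset:

```python
import torch
import torch.nn as nn

# Stand-in model with the described embedding -> GRU -> vocab-projection shape
# (the real class and its exact interface are assumptions).
class TinyGRULM(nn.Module):
    def __init__(self, vocab_size, embed_dim, hidden_dim):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        self.gru = nn.GRU(embed_dim, hidden_dim, batch_first=True)
        self.fc = nn.Linear(hidden_dim, vocab_size)

    def forward(self, ids):
        out, _ = self.gru(self.embedding(ids))
        return self.fc(out)

vocab_size = 100  # toy size; the actual vocabulary is 18542
model = TinyGRULM(vocab_size, embed_dim=16, hidden_dim=32)
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
criterion = nn.CrossEntropyLoss()

# Toy batch of token ids; inputs/targets are shifted by one position
# so the model learns to predict the next token.
batch = torch.randint(0, vocab_size, (8, 21))
inputs, targets = batch[:, :-1], batch[:, 1:]

for epoch in range(5):
    optimizer.zero_grad()
    logits = model(inputs)  # (batch, seq, vocab)
    loss = criterion(logits.reshape(-1, vocab_size), targets.reshape(-1))
    loss.backward()
    optimizer.step()
```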
### Tokenizer

The model uses the `aditya-6122/tinystories-tokenizer-vb-18542-byte_level_bpe-v3-test` tokenizer.
## Model Architecture

- **Architecture type:** RNN-based language model with GRU cells
- **Embedding dimension:** 512
- **Hidden dimension:** 1024
- **Vocabulary size:** 18542
- **Architecture diagram:** see `model_arch.jpg` for a visual representation
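The architecture listed above can be sketched as a PyTorch module. This is an assumed reconstruction from the stated dimensions (the class and argument names are hypothetical, not the published implementation):

```python
import torch
import torch.nn as nn

class GRULanguageModel(nn.Module):
    """Hypothetical sketch: token embedding -> GRU -> linear projection to vocab."""

    def __init__(self, vocab_size, embedding_dimension=512, hidden_dimension=1024):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embedding_dimension)
        self.gru = nn.GRU(embedding_dimension, hidden_dimension, batch_first=True)
        self.fc = nn.Linear(hidden_dimension, vocab_size)

    def forward(self, input_ids, hidden=None):
        x = self.embedding(input_ids)         # (batch, seq, embed)
        output, hidden = self.gru(x, hidden)  # (batch, seq, hidden)
        logits = self.fc(output)              # (batch, seq, vocab)
        return logits, hidden
```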
## Files

- `model.bin`: the trained model weights in PyTorch format.
- `tokenizer.json`: the tokenizer configuration.
- `model_arch.jpg`: architecture diagram showing the GRU model structure.
## How to Use

Since this is a custom model, you'll need to load it using the provided code:

```python
import torch
from your_language_model import LanguageModel  # Replace with actual import
from tokenizers import Tokenizer

# Load tokenizer
tokenizer = Tokenizer.from_file("tokenizer.json")

# Load model
vocab_size = tokenizer.get_vocab_size()
model = LanguageModel(vocab_size=vocab_size, embedding_dimension=512, hidden_dimension=1024)
model.load_state_dict(torch.load("model.bin", map_location="cpu"))
model.eval()

# Tokenize the prompt
input_text = "Once upon a time"
input_ids = tokenizer.encode(input_text).ids

# Generate (implement your generation logic)
```
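One possible generation loop is greedy decoding: feed the prompt through the GRU, then repeatedly sample the highest-probability token and feed it back with the carried hidden state. The sketch below assumes the model returns `(logits, hidden)` per call, which is not confirmed by the published code; a small stand-in model is used so the example is self-contained, and the real `LanguageModel` would be swapped in:

```python
import torch
import torch.nn as nn

# Stand-in model so the sketch runs end to end; the actual class may differ.
class TinyGRULM(nn.Module):
    def __init__(self, vocab_size, embed_dim=16, hidden_dim=32):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        self.gru = nn.GRU(embed_dim, hidden_dim, batch_first=True)
        self.fc = nn.Linear(hidden_dim, vocab_size)

    def forward(self, ids, hidden=None):
        out, hidden = self.gru(self.embedding(ids), hidden)
        return self.fc(out), hidden

@torch.no_grad()
def generate(model, prompt_ids, max_new_tokens=20):
    """Greedy decoding: always pick the argmax token, reusing the GRU hidden state."""
    ids = torch.tensor([prompt_ids])
    logits, hidden = model(ids)  # consume the whole prompt once
    generated = list(prompt_ids)
    next_id = int(logits[0, -1].argmax())
    for _ in range(max_new_tokens):
        generated.append(next_id)
        # Feed only the newest token; the hidden state carries the context.
        logits, hidden = model(torch.tensor([[next_id]]), hidden)
        next_id = int(logits[0, -1].argmax())
    return generated

model = TinyGRULM(vocab_size=100).eval()
out = generate(model, [1, 2, 3], max_new_tokens=10)
```

With the real model, `prompt_ids` would come from `tokenizer.encode(input_text).ids` and the result would be decoded with `tokenizer.decode(out)`.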
## Limitations

- This is a basic RNN model and may not perform as well as transformer-based models.
- It was trained on limited data and may exhibit biases from the training dataset.
- It is not optimized for production deployment.

## Ethical Considerations

Users should be aware of potential biases in generated text and use the model responsibly.
## Citation

If you use this model, please cite:

```bibtex
@misc{vanilla-rnn-gru-like,
  title={Tiny-Stories-GRU-LanguageModel-ByteLevelEncoding},
  author={Aditya Wath},
  year={2024},
  publisher={Hugging Face},
  url={https://huggingface.co/aditya-6122/Tiny-Stories-GRU-LanguageModel-ByteLevelEncoding}
}
```