vanilla-rnn-gru-like

Model Details

Model Description

This is a custom GRU-based recurrent language model trained on a dataset of short stories, designed for text generation tasks.

Uses

Direct Use

This model can be used for generating short stories and text completion tasks.

Downstream Use

Fine-tune the model on specific domains for specialized text generation.

Out-of-Scope Use

Not intended for production use without further validation.

Training Details

Training Data

The model was trained on the aditya-6122/tinystories-custom-dataset-18542-v2-test dataset.

Training Procedure

  • Training Regime: Standard language model training with cross-entropy loss
  • Epochs: 5
  • Batch Size: 128
  • Learning Rate: 0.001
  • Optimizer: Adam (assumed)
  • Hardware: Apple Silicon MPS (if available) or CPU
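The regime above can be sketched as a single training step. Hyperparameters (lr 0.001, batch size 128, cross-entropy loss, MPS/CPU device selection) come from the list; the model and data here are stand-ins, and Adam is the assumed optimizer:

```python
import torch
import torch.nn as nn

# Device selection: Apple Silicon MPS if available, otherwise CPU
device = torch.device("mps" if torch.backends.mps.is_available() else "cpu")

vocab_size = 18542
# Stand-in model for illustration; a real run would use the GRU model described below
model = nn.Sequential(nn.Embedding(vocab_size, 512), nn.Linear(512, vocab_size)).to(device)
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)  # Adam is assumed
criterion = nn.CrossEntropyLoss()

# One step on a random batch (batch size 128; sequence length 32 is an arbitrary example)
inputs = torch.randint(0, vocab_size, (128, 32), device=device)
targets = torch.randint(0, vocab_size, (128, 32), device=device)

optimizer.zero_grad()
logits = model(inputs)                                          # (batch, seq, vocab)
loss = criterion(logits.view(-1, vocab_size), targets.view(-1))  # flatten for CE loss
loss.backward()
optimizer.step()
print(float(loss))  # near ln(18542) ≈ 9.8 at random initialization
```

A full run would repeat this over the dataset for 5 epochs.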

Tokenizer

The model uses the aditya-6122/tinystories-tokenizer-vb-18542-byte_level_bpe-v3-test tokenizer.

Model Architecture

  • Architecture Type: RNN-based language model with GRU cells
  • Embedding Dimension: 512
  • Hidden Dimension: 1024
  • Vocabulary Size: 18542
  • Architecture Diagram: See model_arch.jpg for visual representation
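A minimal PyTorch sketch consistent with the dimensions above. The class name, layer layout, and single-layer GRU are assumptions; see model_arch.jpg for the actual structure:

```python
import torch
import torch.nn as nn

class LanguageModel(nn.Module):
    """GRU language model matching the listed dimensions (layout assumed)."""

    def __init__(self, vocab_size=18542, embedding_dimension=512, hidden_dimension=1024):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embedding_dimension)
        self.gru = nn.GRU(embedding_dimension, hidden_dimension, batch_first=True)
        self.fc = nn.Linear(hidden_dimension, vocab_size)

    def forward(self, input_ids, hidden=None):
        # (batch, seq) -> (batch, seq, 512) -> (batch, seq, 1024) -> (batch, seq, vocab)
        embedded = self.embedding(input_ids)
        output, hidden = self.gru(embedded, hidden)
        return self.fc(output), hidden

model = LanguageModel()
logits, _ = model(torch.randint(0, 18542, (2, 16)))
print(logits.shape)  # torch.Size([2, 16, 18542])
```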

Files

  • model.bin: The trained model weights in PyTorch format.
  • tokenizer.json: The tokenizer configuration.
  • model_arch.jpg: Architecture diagram showing the GRU model structure.

How to Use

Since this is a custom model, you'll need to load it using the provided code:

import torch
from your_language_model import LanguageModel  # Replace with the actual import for the model class
from tokenizers import Tokenizer

# Load tokenizer
tokenizer = Tokenizer.from_file("tokenizer.json")

# Load model (map_location lets the weights load even without a GPU/MPS device)
vocab_size = tokenizer.get_vocab_size()
model = LanguageModel(vocab_size=vocab_size, embedding_dimension=512, hidden_dimension=1024)
model.load_state_dict(torch.load("model.bin", map_location="cpu"))
model.eval()

# Tokenize the prompt
input_text = "Once upon a time"
input_ids = tokenizer.encode(input_text).ids
# Generate autoregressively from input_ids (implement your generation logic)
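One possible autoregressive sampling loop. This is a self-contained sketch: TinyGRULM is an untrained stand-in with the assumed model interface, and the random prompt ids stand in for tokenizer.encode(...).ids:

```python
import torch
import torch.nn as nn

class TinyGRULM(nn.Module):
    # Untrained stand-in mirroring the assumed GRU model interface
    def __init__(self, vocab_size, embed_dim=512, hidden_dim=1024):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        self.gru = nn.GRU(embed_dim, hidden_dim, batch_first=True)
        self.fc = nn.Linear(hidden_dim, vocab_size)

    def forward(self, ids, hidden=None):
        out, hidden = self.gru(self.embedding(ids), hidden)
        return self.fc(out), hidden

@torch.no_grad()
def generate(model, input_ids, max_new_tokens=20, temperature=1.0):
    ids = input_ids
    hidden = None
    for _ in range(max_new_tokens):
        # After the first step, only the newest token is fed; the GRU state carries context
        step_input = ids if hidden is None else ids[:, -1:]
        logits, hidden = model(step_input, hidden)
        probs = torch.softmax(logits[:, -1, :] / temperature, dim=-1)
        next_id = torch.multinomial(probs, num_samples=1)
        ids = torch.cat([ids, next_id], dim=1)
    return ids

model = TinyGRULM(vocab_size=18542).eval()
prompt = torch.randint(0, 18542, (1, 4))  # stand-in for encoded prompt ids
out = generate(model, prompt, max_new_tokens=8)
print(out.shape)  # torch.Size([1, 12])
```

With the real trained weights, decode the result via tokenizer.decode(out[0].tolist()).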

Limitations

  • This is a basic RNN model and may not perform as well as transformer-based models.
  • Trained on limited data; it may exhibit biases present in the training dataset.
  • Not optimized for production deployment.

Ethical Considerations

Users should be aware of potential biases in generated text and use the model responsibly.

Citation

If you use this model, please cite:

@misc{vanilla-rnn-gru-like,
  title={Tiny-Stories-GRU-LanguageModel-ByteLevelEncoding},
  author={Aditya Wath},
  year={2024},
  publisher={Hugging Face},
  url={https://huggingface.co/aditya-6122/Tiny-Stories-GRU-LanguageModel-ByteLevelEncoding}
}