vanilla-rnn-gru-like

Model Details

Model Description

This is a custom GRU-based recurrent language model trained on a dataset of short stories, designed for text generation tasks.

Uses

Direct Use

This model can be used for generating short stories and text completion tasks.

Downstream Use

Fine-tune the model on specific domains for specialized text generation.

Out-of-Scope Use

Not intended for production use without further validation.

Training Details

Training Data

The model was trained on the aditya-6122/tinystories-custom-dataset-18542-v2-test dataset.

Training Procedure

  • Training Regime: Standard language model training with cross-entropy loss
  • Epochs: 5
  • Batch Size: 128
  • Learning Rate: 0.001
  • Optimizer: Adam (assumed)
  • Hardware: Apple Silicon MPS (if available) or CPU
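The regime above can be sketched as a single training step. Hyperparameters (lr 0.001, batch size 128, cross-entropy loss, MPS/CPU device selection) come from the list; the model and data here are stand-ins, and Adam is the assumed optimizer:

```python
import torch
import torch.nn as nn

# Device selection: Apple Silicon MPS if available, otherwise CPU
device = torch.device("mps" if torch.backends.mps.is_available() else "cpu")

vocab_size = 18542
# Stand-in model for illustration; a real run would use the GRU model described below
model = nn.Sequential(nn.Embedding(vocab_size, 512), nn.Linear(512, vocab_size)).to(device)
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)  # Adam is assumed
criterion = nn.CrossEntropyLoss()

# One step on a random batch (batch size 128; sequence length 32 is an arbitrary example)
inputs = torch.randint(0, vocab_size, (128, 32), device=device)
targets = torch.randint(0, vocab_size, (128, 32), device=device)

optimizer.zero_grad()
logits = model(inputs)                                          # (batch, seq, vocab)
loss = criterion(logits.view(-1, vocab_size), targets.view(-1))  # flatten for CE loss
loss.backward()
optimizer.step()
print(float(loss))  # near ln(18542) ≈ 9.8 at random initialization
```

A full run would repeat this over the dataset for 5 epochs.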

Tokenizer

The model uses the aditya-6122/tinystories-tokenizer-vb-18542-byte_level_bpe-v3-test tokenizer.

Model Architecture

  • Architecture Type: RNN-based language model with GRU cells
  • Embedding Dimension: 512
  • Hidden Dimension: 1024
  • Vocabulary Size: 18542
  • Architecture Diagram: See model_arch.jpg for visual representation
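A minimal PyTorch sketch consistent with the dimensions above. The class name, layer layout, and single-layer GRU are assumptions; see model_arch.jpg for the actual structure:

```python
import torch
import torch.nn as nn

class LanguageModel(nn.Module):
    """GRU language model matching the listed dimensions (layout assumed)."""

    def __init__(self, vocab_size=18542, embedding_dimension=512, hidden_dimension=1024):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embedding_dimension)
        self.gru = nn.GRU(embedding_dimension, hidden_dimension, batch_first=True)
        self.fc = nn.Linear(hidden_dimension, vocab_size)

    def forward(self, input_ids, hidden=None):
        # (batch, seq) -> (batch, seq, 512) -> (batch, seq, 1024) -> (batch, seq, vocab)
        embedded = self.embedding(input_ids)
        output, hidden = self.gru(embedded, hidden)
        return self.fc(output), hidden

model = LanguageModel()
logits, _ = model(torch.randint(0, 18542, (2, 16)))
print(logits.shape)  # torch.Size([2, 16, 18542])
```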

Files

  • model.bin: The trained model weights in PyTorch format.
  • tokenizer.json: The tokenizer configuration.
  • model_arch.jpg: Architecture diagram showing the GRU model structure.

How to Use

Since this is a custom model, you'll need to load it using the provided code:

import torch
from your_language_model import LanguageModel  # Replace with the actual import for the model class
from tokenizers import Tokenizer

# Load tokenizer
tokenizer = Tokenizer.from_file("tokenizer.json")

# Load model (map_location lets the weights load even without a GPU/MPS device)
vocab_size = tokenizer.get_vocab_size()
model = LanguageModel(vocab_size=vocab_size, embedding_dimension=512, hidden_dimension=1024)
model.load_state_dict(torch.load("model.bin", map_location="cpu"))
model.eval()

# Tokenize the prompt
input_text = "Once upon a time"
input_ids = tokenizer.encode(input_text).ids
# Generate autoregressively from input_ids (implement your generation logic)
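One possible autoregressive sampling loop. This is a self-contained sketch: TinyGRULM is an untrained stand-in with the assumed model interface, and the random prompt ids stand in for tokenizer.encode(...).ids:

```python
import torch
import torch.nn as nn

class TinyGRULM(nn.Module):
    # Untrained stand-in mirroring the assumed GRU model interface
    def __init__(self, vocab_size, embed_dim=512, hidden_dim=1024):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        self.gru = nn.GRU(embed_dim, hidden_dim, batch_first=True)
        self.fc = nn.Linear(hidden_dim, vocab_size)

    def forward(self, ids, hidden=None):
        out, hidden = self.gru(self.embedding(ids), hidden)
        return self.fc(out), hidden

@torch.no_grad()
def generate(model, input_ids, max_new_tokens=20, temperature=1.0):
    ids = input_ids
    hidden = None
    for _ in range(max_new_tokens):
        # After the first step, only the newest token is fed; the GRU state carries context
        step_input = ids if hidden is None else ids[:, -1:]
        logits, hidden = model(step_input, hidden)
        probs = torch.softmax(logits[:, -1, :] / temperature, dim=-1)
        next_id = torch.multinomial(probs, num_samples=1)
        ids = torch.cat([ids, next_id], dim=1)
    return ids

model = TinyGRULM(vocab_size=18542).eval()
prompt = torch.randint(0, 18542, (1, 4))  # stand-in for encoded prompt ids
out = generate(model, prompt, max_new_tokens=8)
print(out.shape)  # torch.Size([1, 12])
```

With the real trained weights, decode the result via tokenizer.decode(out[0].tolist()).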

Limitations

  • This is a basic RNN model and may not perform as well as transformer-based models.
  • Trained on limited data; it may exhibit biases present in the training dataset.
  • Not optimized for production deployment.

Ethical Considerations

Users should be aware of potential biases in generated text and use the model responsibly.

Citation

If you use this model, please cite:

@misc{vanilla-rnn-gru-like,
  title={Tiny-Stories-GRU-LanguageModel-ByteLevelEncoding},
  author={Aditya Wath},
  year={2024},
  publisher={Hugging Face},
  url={https://huggingface.co/aditya-6122/Tiny-Stories-GRU-LanguageModel-ByteLevelEncoding}
}