---
language: en
license: mit
tags:
- language-model
- pytorch
- rnn
- text-generation
datasets:
- aditya-6122/tinystories-custom-dataset-18542-v2-test
pipeline_tag: text-generation
---

# vanilla-rnn-gru-like
|
|
## Model Details
|
|
### Model Description
A custom GRU-based language model trained on a dataset of short stories, intended for text-generation tasks.
|
|
### Model Sources
- **Repository**: [GitHub Repository](https://github.com/your-repo) <!-- replace with the actual repository if available -->
- **Paper**: N/A
|
|
## Uses
|
|
### Direct Use
This model can be used for generating short stories and for text-completion tasks.
|
|
### Downstream Use
Fine-tune the model on specific domains for specialized text generation.
|
|
### Out-of-Scope Use
Not intended for production use without further validation.
|
|
## Training Details
|
|
### Training Data
The model was trained on the [aditya-6122/tinystories-custom-dataset-18542-v2-test](https://huggingface.co/datasets/aditya-6122/tinystories-custom-dataset-18542-v2-test) dataset.
|
|
### Training Procedure
- **Training Regime**: Standard next-token language-model training with cross-entropy loss
- **Epochs**: 5
- **Batch Size**: 128
- **Learning Rate**: 0.001
- **Optimizer**: Adam (assumed)
- **Hardware**: Apple Silicon MPS (if available) or CPU
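The training script itself is not included in this repository. The hyperparameters above can be sketched as a minimal next-token training loop; the `TinyLM` stand-in model and random dummy batch below are illustrative only, not the actual model or data.

```python
import torch
import torch.nn as nn

# Illustrative stand-in model so the sketch is runnable; the real model is the
# GRU language model described in this card.
class TinyLM(nn.Module):
    def __init__(self, vocab_size, emb=32, hid=64):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb)
        self.gru = nn.GRU(emb, hid, batch_first=True)
        self.head = nn.Linear(hid, vocab_size)

    def forward(self, x):
        out, _ = self.gru(self.emb(x))
        return self.head(out)

# Prefer Apple Silicon MPS when available, otherwise fall back to CPU.
use_mps = getattr(torch.backends, "mps", None) is not None and torch.backends.mps.is_available()
device = torch.device("mps" if use_mps else "cpu")

vocab_size = 100  # the real vocabulary has 18542 tokens
model = TinyLM(vocab_size).to(device)
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)  # Adam, lr 0.001 as listed above
criterion = nn.CrossEntropyLoss()

# Dummy batch standing in for tokenized stories: (batch=128, seq_len=16).
batch = torch.randint(0, vocab_size, (128, 16), device=device)
inputs, targets = batch[:, :-1], batch[:, 1:]  # shift by one for next-token prediction

for epoch in range(5):  # 5 epochs as listed above
    optimizer.zero_grad()
    logits = model(inputs)  # (batch, seq_len - 1, vocab_size)
    loss = criterion(logits.reshape(-1, vocab_size), targets.reshape(-1))
    loss.backward()
    optimizer.step()
```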
|
|
### Tokenizer
The model uses the [aditya-6122/tinystories-tokenizer-vb-18542-byte_level_bpe-v3-test](https://huggingface.co/aditya-6122/tinystories-tokenizer-vb-18542-byte_level_bpe-v3-test) tokenizer.
|
|
### Model Architecture
- **Architecture Type**: RNN-based language model with GRU cells
- **Embedding Dimension**: 512
- **Hidden Dimension**: 1024
- **Vocabulary Size**: 18542
- **Architecture Diagram**: See `model_arch.jpg` for a visual representation
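A minimal PyTorch sketch consistent with the dimensions above is shown below. The exact layer layout (e.g. a single GRU layer) and the `(logits, hidden)` return signature are assumptions; the real class definition is not part of this card.

```python
import torch
import torch.nn as nn

class LanguageModel(nn.Module):
    """Sketch of a GRU language model with the card's stated dimensions."""

    def __init__(self, vocab_size=18542, embedding_dimension=512, hidden_dimension=1024):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embedding_dimension)
        self.gru = nn.GRU(embedding_dimension, hidden_dimension, batch_first=True)
        self.head = nn.Linear(hidden_dimension, vocab_size)

    def forward(self, token_ids, hidden=None):
        # token_ids: (batch, seq_len) -> logits: (batch, seq_len, vocab_size)
        embedded = self.embedding(token_ids)
        output, hidden = self.gru(embedded, hidden)
        return self.head(output), hidden

model = LanguageModel()
logits, hidden = model(torch.randint(0, 18542, (2, 8)))
print(logits.shape)  # torch.Size([2, 8, 18542])
```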
|
|
## Files
- `model.bin`: The trained model weights in PyTorch format.
- `tokenizer.json`: The tokenizer configuration.
- `model_arch.jpg`: Architecture diagram showing the GRU model structure.
|
|
## How to Use
|
|
Since this is a custom model rather than a standard `transformers` architecture, you'll need to load it with code like the following:
|
|
```python
import torch
from your_language_model import LanguageModel  # replace with the actual import
from tokenizers import Tokenizer

# Load tokenizer
tokenizer = Tokenizer.from_file("tokenizer.json")

# Load model (map to CPU so this also works without a GPU/MPS device)
vocab_size = tokenizer.get_vocab_size()
model = LanguageModel(vocab_size=vocab_size, embedding_dimension=512, hidden_dimension=1024)
model.load_state_dict(torch.load("model.bin", map_location="cpu"))
model.eval()

# Generate text
input_text = "Once upon a time"
input_ids = tokenizer.encode(input_text).ids  # token ids for the prompt
# Feed input_ids to the model and sample tokens (implement your generation logic)
```
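The generation logic itself is left to the user. One possible sketch is greedy decoding: repeatedly feed the sequence back through the model and append the argmax token. The `(logits, hidden)` return signature and the `StubModel` below are assumptions for illustration, not the actual model interface.

```python
import torch

def generate(model, token_ids, max_new_tokens=20, eos_id=None):
    """Greedy decoding sketch: append the argmax token until the budget or EOS."""
    ids = torch.tensor([token_ids], dtype=torch.long)
    model.eval()
    with torch.no_grad():
        for _ in range(max_new_tokens):
            logits, _ = model(ids)  # assumes the model returns (logits, hidden)
            next_id = logits[0, -1].argmax().item()
            if eos_id is not None and next_id == eos_id:
                break
            ids = torch.cat([ids, torch.tensor([[next_id]])], dim=1)
    return ids[0].tolist()

# Tiny stand-in with the assumed interface, so the sketch is runnable as-is.
class StubModel(torch.nn.Module):
    def __init__(self, vocab=10):
        super().__init__()
        self.emb = torch.nn.Embedding(vocab, 8)
        self.gru = torch.nn.GRU(8, 8, batch_first=True)
        self.head = torch.nn.Linear(8, vocab)

    def forward(self, x, h=None):
        out, h = self.gru(self.emb(x), h)
        return self.head(out), h

print(generate(StubModel(), [1, 2, 3], max_new_tokens=5))
```

For more varied stories, greedy decoding is usually replaced by temperature sampling or top-k sampling over the softmaxed logits.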
|
|
## Limitations
- This is a basic RNN model and may not perform as well as transformer-based models.
- Trained on limited data, it may exhibit biases from the training dataset.
- Not optimized for production deployment.
|
|
## Ethical Considerations
Users should be aware of potential biases in generated text and use the model responsibly.
|
|
## Citation
If you use this model, please cite:
```bibtex
@misc{vanilla-rnn-gru-like,
  title={Tiny-Stories-GRU-LanguageModel-ByteLevelEncoding},
  author={Aditya Wath},
  year={2024},
  publisher={Hugging Face},
  url={https://huggingface.co/aditya-6122/Tiny-Stories-GRU-LanguageModel-ByteLevelEncoding}
}
```