RSLM Tokenizer 65K
CPU-safe Byte-Level BPE tokenizer for RSLM.
Training data
Dataset: turkish-nlp-suite/BellaTurca
Subsets:
- AkademikDerlem
- OzenliDerlem
- temiz-OSCAR
- temiz-mC4
Column: text
Target estimated tokens: 700,000,000 total, approximately 175,000,000 per subset.
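
The per-subset figure is the total target divided across the four subsets. Below is a minimal sketch of that arithmetic, assuming an exactly even split (the card only says "approximately"). The names `SUBSETS`, `TOTAL_TARGET_TOKENS`, and `per_subset_budget` are illustrative and are not taken from the actual training code.

```python
# Subset names as listed on this card; the exactly even split is an assumption.
SUBSETS = ["AkademikDerlem", "OzenliDerlem", "temiz-OSCAR", "temiz-mC4"]
TOTAL_TARGET_TOKENS = 700_000_000


def per_subset_budget(total, subsets):
    """Split an estimated token budget evenly across the given subsets."""
    share = total // len(subsets)
    return {name: share for name in subsets}


budget = per_subset_budget(TOTAL_TARGET_TOKENS, SUBSETS)
for name, tokens in budget.items():
    print(f"{name}: {tokens:,} tokens")
```

Each subset receives the same share here; a real sampling script might stop slightly above or below this figure, since the counts are estimates.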
Vocab
- Requested vocab size: 65,536
- Actual vocab size: 65,536
- BPE min frequency: 3
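
One consequence of this size, not stated on the original card: 65,536 equals 2^16, so every token id (0 to 65,535) fits in an unsigned 16-bit integer. The sketch below illustrates this with Python's standard `array` module. The helper `fits_uint16` and the sample ids are hypothetical and only serve the illustration.

```python
from array import array

VOCAB_SIZE = 65_536  # actual vocab size reported on this card
MAX_TOKEN_ID = VOCAB_SIZE - 1  # ids are assumed to run from 0 to VOCAB_SIZE - 1


def fits_uint16(token_id):
    """Return True if the id can be stored in an unsigned 16-bit integer."""
    return 0 <= token_id < 2 ** 16


# 'H' is the typecode for unsigned short (2 bytes on common platforms).
sample_ids = array("H", [0, 1, MAX_TOKEN_ID])
bytes_per_id = sample_ids.itemsize

print(VOCAB_SIZE == 2 ** 16, fits_uint16(MAX_TOKEN_ID), bytes_per_id)
```

Storing pre-tokenized corpora as 16-bit ids rather than 32-bit ids roughly halves their size on disk, which is a common reason to pick a vocabulary of exactly this size.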
Special tokens
- <|pad|>
- <|bos|>
- <|eos|>
- <|unk|>
- <|system|>
- <|user|>
- <|assistant|>
- <|answer|>
- <|end|>
- <think>
- </think>
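
Special tokens are normally kept atomic instead of being split by BPE merges. The following is a rough, standard-library sketch of that idea, not the tokenizer's own implementation. The function `split_on_special_tokens` and the sample string are hypothetical, and the sample layout is not claimed to be the model's chat format.

```python
import re

# Special tokens as listed on this card.
SPECIAL_TOKENS = [
    "<|pad|>", "<|bos|>", "<|eos|>", "<|unk|>",
    "<|system|>", "<|user|>", "<|assistant|>", "<|answer|>", "<|end|>",
    "<think>", "</think>",
]

# Longest tokens first so that no token is shadowed by a shorter alternative.
_SPECIAL_RE = re.compile(
    "(" + "|".join(
        re.escape(tok) for tok in sorted(SPECIAL_TOKENS, key=len, reverse=True)
    ) + ")"
)


def split_on_special_tokens(text):
    """Return (piece, is_special) pairs, keeping special tokens whole."""
    pieces = [p for p in _SPECIAL_RE.split(text) if p]
    return [(p, p in SPECIAL_TOKENS) for p in pieces]


# Hypothetical input, used only to exercise the splitter.
sample = "<|user|>Merhaba<|end|><think>x</think>"
for piece, is_special in split_on_special_tokens(sample):
    print(repr(piece), is_special)
```

Only the pieces flagged as non-special would then be passed to the byte-level BPE step; the flagged pieces map directly to their reserved ids.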