---
license: llama2
language:
- en
tags:
- tokenizer
- llama2
- infigram
---
# Llama-2 Tokenizer (Mirror for KnowRL Project)

This is a mirror of the tokenizer files from `meta-llama/Llama-2-7b-hf`, provided as a **public, ungated** alternative for users who cannot access the original gated repo.
## Why this mirror exists

The KnowRL project's QuCo reward function uses an Infini-gram index built with the Llama-2 tokenizer. To query the index, the exact same tokenizer is required. Since `meta-llama/Llama-2-7b-hf` is gated, users without approved access cannot run QuCo.

This repo contains **only the tokenizer files** (no model weights):

- `tokenizer.json`: fast tokenizer
- `tokenizer.model`: SentencePiece model
- `tokenizer_config.json`: tokenizer configuration
- `special_tokens_map.json`: special token mappings
## Usage

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("UIC-R2-lab/llama2-tokenizer")
```
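
Because the index is queried with text encoded by this tokenizer, a typical next step is to turn a string into token IDs. The sketch below is illustrative only: the helper names `load_tokenizer` and `encode_for_index` are hypothetical, and dropping the BOS/EOS tokens via `add_special_tokens=False` is an assumption about how n-gram queries are usually formed, not a description of QuCo's actual code.

```python
def load_tokenizer(repo_id="UIC-R2-lab/llama2-tokenizer"):
    """Load the mirrored tokenizer from the Hugging Face Hub (needs network access)."""
    # Imported lazily so the encoding helper below has no hard dependency.
    from transformers import AutoTokenizer

    return AutoTokenizer.from_pretrained(repo_id)


def encode_for_index(tokenizer, text):
    """Return the token IDs for `text` without special tokens.

    Assumption: the query should contain only the tokens of the text itself,
    so BOS/EOS are not added.
    """
    return tokenizer.encode(text, add_special_tokens=False)
```

For example, `encode_for_index(load_tokenizer(), "natural language processing")` returns a plain list of integer token IDs; whether that list can be passed as-is to the index depends on the query interface you are using.
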
## License

These files are distributed under the original [Llama 2 Community License Agreement](https://github.com/facebookresearch/llama/blob/main/LICENSE) from Meta.