---
license: llama2
language:
- en
tags:
- tokenizer
- llama2
- infigram
---

# Llama-2 Tokenizer (Mirror for KnowRL Project)

This is a mirror of the tokenizer files from `meta-llama/Llama-2-7b-hf`. It is provided as a **public, ungated** alternative for users who do not have access to the original gated repository.

## Why this mirror exists

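The KnowRL project's QuCo reward function uses an Infini-gram index built with the Llama-2 tokenizer. The index stores token IDs rather than raw text, so queries must be tokenized with exactly the same tokenizer; a different tokenizer would produce IDs that do not match the indexed sequences. Because `meta-llama/Llama-2-7b-hf` is gated, users without approved access cannot load the tokenizer from it and therefore cannot run QuCo.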

This repo contains **only the tokenizer files** (no model weights):
- `tokenizer.json` — fast tokenizer
- `tokenizer.model` — SentencePiece model
- `tokenizer_config.json` — tokenizer configuration
- `special_tokens_map.json` — special token mappings

## Usage

```python
from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("UIC-R2-lab/llama2-tokenizer")
```
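By default the Llama-2 tokenizer prepends the BOS token (ID 1) when encoding text. The sketch below assumes that the QuCo index stores plain token IDs without a BOS token, so special tokens are disabled when building a query; verify this against your KnowRL setup. `infinigram_query_ids` and `SupportsEncode` are hypothetical helper names, not part of KnowRL or Infini-gram.

```python
from typing import List, Protocol


class SupportsEncode(Protocol):
    """Anything with a Hugging Face-style `encode` method (hypothetical interface name)."""

    def encode(self, text: str, add_special_tokens: bool = True) -> List[int]: ...


def infinigram_query_ids(tokenizer: SupportsEncode, text: str) -> List[int]:
    """Turn `text` into the token-ID sequence used as an Infini-gram query.

    Assumption: the index stores raw Llama-2 token IDs with no BOS token,
    so special tokens are not added here.
    """
    ids = tokenizer.encode(text, add_special_tokens=False)
    if not ids:
        # An empty query is not a meaningful n-gram to look up.
        raise ValueError("query text produced no tokens")
    return ids
```

In practice you would pass the tokenizer loaded in the snippet above, e.g. `infinigram_query_ids(tokenizer, "natural language processing")`, and hand the resulting IDs to whichever Infini-gram client QuCo uses (for example, the `input_ids` argument of `InfiniGramEngine.count` in the `infini_gram` Python package, assuming that is the client in your setup).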

## License

This mirror follows the original [Llama 2 Community License Agreement](https://github.com/facebookresearch/llama/blob/main/LICENSE) from Meta.