	---
	license: llama2
	language:
	- en
	tags:
	- tokenizer
	- llama2
	- infigram
	---

	# Llama-2 Tokenizer (Mirror for KnowRL Project)

	This is a mirror of the tokenizer files from `meta-llama/Llama-2-7b-hf`, provided as a public, ungated alternative for users who cannot access the original gated repo.

	## Why this mirror exists

	The KnowRL project's QuCo reward function uses an Infini-gram index built with the Llama-2 tokenizer, and querying that index requires exactly the same tokenizer. Because `meta-llama/Llama-2-7b-hf` is gated, users without approved access cannot run QuCo.

	This repo contains only the tokenizer files (no model weights):
	- `tokenizer.json` — fast tokenizer
	- `tokenizer.model` — SentencePiece model
	- `tokenizer_config.json` — tokenizer configuration
	- `special_tokens_map.json` — special token mappings

	## Usage

	```python
	from transformers import AutoTokenizer
	tokenizer = AutoTokenizer.from_pretrained("UIC-R2-lab/llama2-tokenizer")
	```
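
Since the tokenizer is needed to turn query text into token IDs for the index, the following is a minimal sketch of that step. The helper names `load_mirror_tokenizer` and `encode_for_index` are hypothetical and not part of KnowRL or QuCo. Disabling BOS/EOS tokens is an assumption about what an n-gram index lookup expects; check the QuCo code for the settings it actually uses.

```python
from typing import List


def load_mirror_tokenizer(repo_id: str = "UIC-R2-lab/llama2-tokenizer"):
    """Load the mirrored tokenizer with automatic BOS/EOS insertion disabled."""
    # Imported lazily so encode_for_index can be used with any tokenizer object.
    from transformers import AutoTokenizer

    # Assumption: index queries should not carry BOS/EOS tokens.
    return AutoTokenizer.from_pretrained(
        repo_id, add_bos_token=False, add_eos_token=False
    )


def encode_for_index(tokenizer, text: str) -> List[int]:
    """Return the token IDs for `text` with no special tokens added."""
    return tokenizer.encode(text, add_special_tokens=False)
```

Calling `encode_for_index(load_mirror_tokenizer(), "natural language processing")` should return a plain list of integer IDs that can be passed to an index query.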

	## License

	Follows the original [Llama 2 Community License Agreement](https://github.com/facebookresearch/llama/blob/main/LICENSE) from Meta.