	---
	license: mit
	---
	# ZipPlus Model Card

	A pre-trained 4-layer GRU model (ByteGRU) for neural file compression with [Zip+](https://github.com/CompactAIOfficial/ZipPlus). Each compressed file contains its own adapted model, so no external model is needed to decompress.

	## What is this

	Zip+ compresses any file into a PNG image using a neural network (GRU + range coding). Each compressed file embeds its own adapted model:

	```
	file.txt → [ByteGRU + Range Coding] → file.txt.zpng.png → [embedded model] → file.txt
	```
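
To make the "model + range coding" idea concrete, here is a minimal sketch of why a good byte predictor leads to a small file. It does not use the ByteGRU or Constriction: a simple adaptive byte-frequency model stands in for the network (an assumption for illustration only), and the function reports the ideal code length, the sum of `-log2 p(byte)`, which a range coder approaches closely in practice.

```python
import math


def ideal_compressed_bits(data: bytes) -> float:
    """Ideal code length of `data` under an adaptive order-0 byte model.

    The frequency counts are a stand-in for the GRU's predictions; an
    entropy coder spends about -log2(p) bits on a symbol of probability p.
    """
    counts = [1] * 256  # Laplace smoothing: every byte value starts at count 1
    total = 256
    bits = 0.0
    for byte in data:
        bits += -math.log2(counts[byte] / total)
        # Update only after "coding" the byte, so a decoder that has seen
        # the same prefix can rebuild the exact same probabilities.
        counts[byte] += 1
        total += 1
    return bits


sample = b"abababababababab" * 64  # 1024 highly predictable bytes
bits = ideal_compressed_bits(sample)
ratio = bits / (8 * len(sample))
print(f"{len(sample)} bytes -> about {bits / 8:.0f} bytes (ratio {ratio:.2f})")
```

A stronger predictor, such as a recurrent network that conditions on the preceding bytes, assigns higher probabilities to the bytes that actually occur and so lowers the total.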

	Every PNG is self-contained: you can decompress it even if you lose the original model file.

	## Model Details

	- Architecture: 4-layer GRU over byte embeddings
	- Embedding dim: 64 → Hidden dim: 512
	- Trained on: FineWeb-Edu (the 10BT sample, about 10 billion tokens of educational web text), then adapted per file at compression time
	- Entropy coding: Range coding via Constriction
	- Output format: PNG where payload + model live in RGB pixel bytes
	- Magic header: `ZPNG` (first 4 bytes)
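
As an illustration of how arbitrary bytes can live in RGB pixels behind a `ZPNG` magic header, here is a minimal pure-Python sketch. Only the magic value comes from this card. The 4-byte little-endian length field, the zero padding, the square layout, and the function names `pack_to_rgb` and `unpack_from_rgb` are hypothetical, and the actual Zip+ container may be laid out differently.

```python
import math
import struct

MAGIC = b"ZPNG"  # magic header named in the model card


def pack_to_rgb(payload: bytes) -> tuple[int, list[tuple[int, ...]]]:
    """Lay out magic + length + payload as a square grid of RGB pixels."""
    # Hypothetical header: magic, then the payload length as a uint32.
    blob = MAGIC + struct.pack("<I", len(payload)) + payload
    n_pixels = math.ceil(len(blob) / 3)       # 3 bytes per RGB pixel
    side = math.ceil(math.sqrt(n_pixels))     # smallest square that fits
    blob += b"\x00" * (side * side * 3 - len(blob))  # zero-pad the tail
    pixels = [tuple(blob[i:i + 3]) for i in range(0, len(blob), 3)]
    return side, pixels


def unpack_from_rgb(pixels: list[tuple[int, ...]]) -> bytes:
    """Inverse of pack_to_rgb: check the magic, then read `length` bytes."""
    blob = bytes(channel for pixel in pixels for channel in pixel)
    if blob[:4] != MAGIC:
        raise ValueError("missing ZPNG magic header")
    (length,) = struct.unpack("<I", blob[4:8])
    return blob[8:8 + length]


side, pixels = pack_to_rgb(b"hello world")
assert unpack_from_rgb(pixels) == b"hello world"
```

Because PNG is a lossless image format, pixel bytes written this way survive a save and load unchanged. With Pillow, the flattened bytes could be turned into an image via `Image.frombytes("RGB", (side, side), data)`.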

	## Requirements

	- Python 3.10+
	- PyTorch (CUDA recommended)
	- Constriction (`pip install constriction`)
	- Pillow
	- numpy
	- huggingface_hub

	```bash
	pip install torch constriction pillow numpy huggingface_hub
	```

	## Quick Start

	### Compress a file (with auto-adaptation)

	```bash
	python inference.py compress myfile.txt -o myfile.zpng.png
	```

	- Automatically adapts the model to your file (50 steps)
	- Embeds the adapted model in the PNG for self-contained decoding

	### Decompress

	```bash
	python inference.py decompress myfile.zpng.png -o restored.txt
	```

	Loads the model embedded in the PNG, so no external files are needed.
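
Since the round trip is meant to be lossless, you can confirm it yourself by comparing SHA-256 digests of the original and restored files. The sketch below uses only the standard library. The helper names `sha256_of` and `same_content` are made up for this example and are not part of `inference.py`.

```python
import hashlib


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Hex SHA-256 digest of a file, read in chunks to keep memory flat."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_content(path_a: str, path_b: str) -> bool:
    """True if both files have identical SHA-256 digests."""
    return sha256_of(path_a) == sha256_of(path_b)
```

For the commands above, `same_content("myfile.txt", "restored.txt")` should return `True` after a successful compress and decompress.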

	### Training (optional)

	```bash
	python train.py --grid 128 --steps 10000
	```

	Auto-downloads FineWeb-Edu if no corpus is specified.

	## Performance

	- Text files: ~5-20% of original size
	- Works best on files > 10KB
	- Smaller files: the embedded model (~21MB) can outweigh the compression gains
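
A quick back-of-the-envelope calculation shows where the embedded model stops dominating. It assumes the 5-20% figure describes the compressed payload alone and that 1 MB means 10^6 bytes. The card does not state either point, so treat the numbers as rough guidance. The output is smaller than the input once `ratio * size + overhead < size`, that is, once `size > overhead / (1 - ratio)`.

```python
def break_even_size(overhead_bytes: float, ratio: float) -> float:
    """Input size above which payload + fixed overhead beats the original.

    ratio is compressed payload size / original size, excluding overhead.
    """
    if not 0 <= ratio < 1:
        raise ValueError("ratio must be in [0, 1)")
    return overhead_bytes / (1 - ratio)


OVERHEAD_BYTES = 21e6  # ~21MB embedded model, figure taken from this card

for ratio in (0.05, 0.20):
    size_mb = break_even_size(OVERHEAD_BYTES, ratio) / 1e6
    print(f"ratio {ratio:.0%}: break-even at about {size_mb:.1f} MB")
```

Under these assumptions the break-even input size is roughly 22 MB at a 5% ratio and roughly 26 MB at a 20% ratio.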

	## Warnings

	- Embedding the model adds ~21MB to the output, so it only pays off for large files
	- GPU recommended for training and compression
	- Lossless: round trips are verified via SHA-256 checksums
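
The size of the embedded model can be sanity-checked from the architecture numbers under Model Details. This sketch counts parameters for a 4-layer GRU with embedding dim 64 and hidden dim 512, using the per-layer parameter layout of PyTorch's `nn.GRU`. The 256-entry byte vocabulary, the single linear output layer, and float32 storage are assumptions. The real ByteGRU and its serialized form may differ.

```python
def gru_layer_params(input_dim: int, hidden_dim: int) -> int:
    """Parameters in one GRU layer, following PyTorch's nn.GRU layout:
    weight_ih (3H x I), weight_hh (3H x H), bias_ih (3H), bias_hh (3H)."""
    return 3 * hidden_dim * (input_dim + hidden_dim + 2)


def byte_gru_params(embed_dim: int = 64, hidden_dim: int = 512,
                    num_layers: int = 4, vocab: int = 256) -> int:
    """Total parameters: byte embedding + stacked GRU + assumed linear head."""
    embedding = vocab * embed_dim
    gru = gru_layer_params(embed_dim, hidden_dim)
    gru += (num_layers - 1) * gru_layer_params(hidden_dim, hidden_dim)
    head = hidden_dim * vocab + vocab  # assumed: one linear layer to 256 logits
    return embedding + gru + head


params = byte_gru_params()
size_mb = params * 4 / 1e6  # assumed: 4 bytes per float32 parameter
print(f"{params:,} parameters, about {size_mb:.1f} MB in float32")
```

This gives about 5.8 million parameters, or about 23 MB in float32 (about 22 MiB), which is in the same ballpark as the ~21MB overhead quoted above.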

	## License

	MIT. I'm not liable if this eats your thesis/pixels/anything.