---
license: mit
---
# ZipPlus Model Card
**A pre-trained 4-layer GRU model for neural file compression. Each compressed file contains its own adapted model, so no external model is needed to decompress.**
This is a pre-trained ByteGRU model for [Zip+](https://github.com/CompactAIOfficial/ZipPlus).
## What is this
Zip+ compresses any file into a PNG image using a neural network (GRU + range coding). Each compressed file embeds its own adapted model:
```
file.txt → [ByteGRU + Range Coding] → file.txt.zpng.png → [embedded model] → file.txt
```
**Every PNG is self-contained**: you can decompress even if you lose the original model file!
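To give a feel for why a good predictive model leads to a small file: a range coder spends close to `-log2(p)` bits on a byte that the model predicted with probability `p`. The sketch below is not the ByteGRU. It swaps in a toy adaptive order-0 frequency model (an assumption made purely for illustration) and sums the ideal code length over a byte string.

```python
import math
from collections import Counter


def ideal_code_length_bits(data: bytes) -> float:
    """Bits an ideal entropy coder would need under a toy adaptive order-0 model."""
    counts = Counter()  # how often each byte value has been seen so far
    total_bits = 0.0
    for i, b in enumerate(data):
        # Laplace-smoothed probability of this byte given the i bytes before it.
        p = (counts[b] + 1) / (i + 256)
        total_bits += -math.log2(p)  # cost of coding a symbol with probability p
        counts[b] += 1
    return total_bits


sample = b"abracadabra " * 200
bits = ideal_code_length_bits(sample)
print(f"{len(sample)} bytes -> about {bits / 8:.0f} bytes under this toy model")
```

A stronger model (such as a GRU conditioned on the preceding bytes) assigns higher probabilities to the bytes that actually occur, which shrinks this sum.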
## Model Details
- **Architecture**: 4-layer GRU over byte embeddings
- **Embedding dim**: 64 → **Hidden dim**: 512
- **Trained on**: FineWeb-Edu (the 10BT sample, roughly 10 billion tokens of educational web text), plus adaptive per-file training
- **Entropy coding**: Range coding via Constriction
- **Output format**: PNG where payload + model live in RGB pixel bytes
- **Magic header**: `ZPNG` (first 4 bytes)
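The idea of storing bytes in RGB pixels can be sketched in a few lines. Only the `ZPNG` magic comes from this card. The 8-byte length field, the zero padding, the square image shape, and the function names `pack_to_rgb` / `unpack_from_rgb` are hypothetical and are not the actual Zip+ layout.

```python
import math
import struct

MAGIC = b"ZPNG"  # from the model card; the rest of this layout is hypothetical


def pack_to_rgb(payload: bytes) -> tuple[int, list[tuple[int, ...]]]:
    """Return (side, pixels) for a square RGB image that holds the payload."""
    # Assumed container: magic + 8-byte little-endian length + payload.
    blob = MAGIC + struct.pack("<Q", len(payload)) + payload
    n_pixels = math.ceil(len(blob) / 3)        # three bytes per RGB pixel
    side = math.isqrt(n_pixels - 1) + 1        # smallest side with side*side >= n_pixels
    blob += bytes(side * side * 3 - len(blob))  # zero-pad to fill the square
    pixels = [tuple(blob[i:i + 3]) for i in range(0, len(blob), 3)]
    return side, pixels


def unpack_from_rgb(pixels: list[tuple[int, ...]]) -> bytes:
    """Inverse of pack_to_rgb: check the magic, read the length, drop the padding."""
    blob = bytes(channel for px in pixels for channel in px)
    if blob[:4] != MAGIC:
        raise ValueError("not a ZPNG container")
    (length,) = struct.unpack("<Q", blob[4:12])
    return blob[12:12 + length]


side, pixels = pack_to_rgb(b"hello, pixels")
print(side, len(pixels), unpack_from_rgb(pixels))
```

Because PNG is itself lossless, pixel bytes written this way survive a save/load cycle unchanged, which is what makes an image usable as a byte container.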
## Requirements
- Python 3.10+
- PyTorch (CUDA recommended)
- Constriction (`pip install constriction`)
- Pillow
- numpy
- huggingface_hub
```bash
pip install torch constriction pillow numpy huggingface_hub
```
## Quick Start
### Compress a file (with auto-adaptation)
```bash
python inference.py compress myfile.txt -o myfile.zpng.png
```
- Automatically adapts the model to your file (50 steps)
- Embeds the adapted model in the PNG for self-contained decoding
### Decompress
```bash
python inference.py decompress myfile.zpng.png -o restored.txt
```
Loads the model embedded in the PNG, so no external files are needed!
### Training (optional)
```bash
python train.py --grid 128 --steps 10000
```
Auto-downloads FineWeb-Edu if no corpus is specified.
## Performance
- Text files: ~5-20% of original size
- Works best on files > 10KB
- Smaller files: embedding overhead (~21MB) may exceed compression gains
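A rough back-of-the-envelope check of the overhead figure, under stated assumptions: a 256-entry byte embedding, the standard GRU parametrization (three gates, input and recurrent weights, two bias vectors per layer), a linear output head to 256 logits, and float32 storage. None of these details are confirmed by this card beyond the layer count and dimensions, so treat the numbers as an estimate.

```python
def gru_param_count(embed_dim=64, hidden=512, layers=4, vocab=256):
    """Parameter count for embedding + stacked GRU + linear head (assumed layout)."""
    total = vocab * embed_dim  # byte embedding table
    in_dim = embed_dim
    for _ in range(layers):
        # 3 gates, each with input and recurrent weights, plus two bias vectors
        total += 3 * hidden * (in_dim + hidden) + 2 * 3 * hidden
        in_dim = hidden  # later layers take the previous layer's hidden state
    total += hidden * vocab + vocab  # assumed linear head producing 256 logits
    return total


def break_even_bytes(overhead_bytes, ratio):
    """Input size at which ratio * size + overhead equals size."""
    return overhead_bytes / (1.0 - ratio)


params = gru_param_count()
print(f"{params:,} parameters, about {params * 4 / 1e6:.1f} MB in float32")
for ratio in (0.05, 0.20):
    size = break_even_bytes(21e6, ratio)
    print(f"ratio {ratio:.0%}: break-even near {size / 1e6:.1f} MB of input")
```

Under these assumptions the model comes to about 5.8 million parameters, or roughly 23 MB in float32, which is in the same range as the ~21MB quoted above. If the full overhead is counted, the output only becomes smaller than the input once the input is larger than roughly 22 to 26 MB.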
## Warnings
- **Embedding adds ~21MB** to the output, which is worth it for large files
- **GPU recommended** for training and compression
- **Lossless**: verified via SHA-256 checksums
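If you want to confirm a round trip yourself, a minimal sketch with Python's standard `hashlib` is shown here. The helper names `sha256_of` and `same_content` are illustrative and are not part of `inference.py`.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Hex SHA-256 digest of a file, read in chunks to keep memory use flat."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # iter() with a sentinel stops once read() returns an empty bytes object
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_content(path_a, path_b):
    """True if both files have identical SHA-256 digests."""
    return sha256_of(path_a) == sha256_of(path_b)
```

After compressing and decompressing, `same_content("myfile.txt", "restored.txt")` should return `True` for a lossless round trip.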
## License
MIT. I'm not liable if this eats your thesis/pixels/anything.