---
library_name: coreml
pipeline_tag: image-to-image
tags:
- super-resolution
- apple-silicon
- neural-engine
- ane
- coreml
- real-time
- video-upscaling
- macos
license: apache-2.0
datasets:
- eugenesiow/Div2k
metrics:
- psnr
- ssim
model-index:
- name: PiperSR-2x
  results:
  - task:
      type: image-super-resolution
      name: Image Super-Resolution
    dataset:
      type: Set5
      name: Set5
    metrics:
    - type: psnr
      value: 37.54
      name: PSNR
  - task:
      type: image-super-resolution
      name: Image Super-Resolution
    dataset:
      type: Set14
      name: Set14
    metrics:
    - type: psnr
      value: 33.21
      name: PSNR
  - task:
      type: image-super-resolution
      name: Image Super-Resolution
    dataset:
      type: BSD100
      name: BSD100
    metrics:
    - type: psnr
      value: 31.98
      name: PSNR
  - task:
      type: image-super-resolution
      name: Image Super-Resolution
    dataset:
      type: Urban100
      name: Urban100
    metrics:
    - type: psnr
      value: 31.38
      name: PSNR
---
# PiperSR-2x: ANE-Native Super Resolution for Apple Silicon
Real-time 2× AI upscaling on Apple's Neural Engine: 44.4 FPS at 720p on an M2 Max, a 928 KB model, and every op running natively on the ANE with zero CPU/GPU fallback.
This is not a converted PyTorch model but an architecture designed from ANE hardware measurements: every dimension, operation, and data type is dictated by Neural Engine characteristics.
## Key Results
| Model | Params | Set5 | Set14 | BSD100 | Urban100 |
|-------|--------|------|-------|--------|----------|
| Bicubic | — | 33.66 | 30.24 | 29.56 | 26.88 |
| FSRCNN | 13K | 37.05 | 32.66 | 31.53 | 29.88 |
| **PiperSR** | **453K** | **37.54** | **33.21** | **31.98** | **31.38** |
| SAFMN | 228K | 38.00 | ~33.7 | ~32.2 | — |
PiperSR beats FSRCNN across all four benchmarks and comes within 0.46 dB of SAFMN on Set5, below the perceptual threshold for most content.
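For reference, the PSNR figures above follow the standard definition; a minimal sketch, assuming images normalized to [0, 1] (note that published SR benchmarks typically evaluate on the luma channel with cropped borders, so exact protocols vary):

```python
import numpy as np

def psnr(ref: np.ndarray, test: np.ndarray, max_val: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB between two same-shaped images."""
    mse = np.mean((ref.astype(np.float64) - test.astype(np.float64)) ** 2)
    return 10.0 * np.log10(max_val ** 2 / mse)

# A uniform error of 0.1 on a [0, 1] image gives MSE = 0.01, i.e. 20 dB
ref = np.zeros((3, 8, 8))
print(round(psnr(ref, ref + 0.1), 2))  # 20.0
```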
## Performance
| Configuration | FPS | Hardware | Notes |
|--------------|-----|----------|-------|
| Full-frame 640×360 → 1280×720 | 44.4 | M2 Max | ANE predict 20.8 ms |
| 128×128 tiles (static weights) | 125.6 | M2 | Baked weights, 2.82× vs dynamic |
| 128×128 tiles (dynamic weights) | 44.5 | M2 | CoreML default |
Real-time 2× upscaling at 30+ FPS on any Mac with Apple Silicon. The ANE sits idle during video playback; PiperSR puts it to work.
## Architecture
453K-parameter network: 6 residual blocks at 64 channels with BatchNorm and SiLU activations, upscaling via PixelShuffle.
```
Input (128×128×3 FP16)
→ Head: Conv 3×3 (3 → 64)
→ Body: 6× ResBlock [Conv 3×3 → BatchNorm → SiLU → Conv 3×3 → BatchNorm → Residual Add]
→ Tail: Conv 3×3 (64 → 12) → PixelShuffle(2)
Output (256×256×3)
```
Compiles to 5 MIL ops: `conv`, `add`, `silu`, `pixel_shuffle`, `const`. All verified ANE-native.
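The 453K figure can be checked by hand from the layer list above. A quick stdlib sketch of the parameter arithmetic (BatchNorm counted as one learnable scale and shift per channel; running stats and the parameter-free SiLU/PixelShuffle excluded):

```python
def conv_params(c_in: int, c_out: int, k: int = 3) -> int:
    # k*k kernel weights per (in, out) channel pair, plus one bias per output channel
    return c_in * c_out * k * k + c_out

def bn_params(c: int) -> int:
    # learnable scale + shift per channel
    return 2 * c

head = conv_params(3, 64)                                   # 3 -> 64
body = 6 * (2 * conv_params(64, 64) + 2 * bn_params(64))    # 6 residual blocks
tail = conv_params(64, 12)                                  # 64 -> 12, then PixelShuffle(2) -> 3 channels
total = head + body + tail
print(total)  # 453388, i.e. ~453K
```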
### Why ANE-native matters
Off-the-shelf super resolution models (SPAN, Real-ESRGAN) were designed for CUDA GPUs and converted to CoreML after the fact. They waste the ANE:
- **Misaligned channels** (48 instead of 64) waste 25%+ of each ANE tile
- **Monolithic full-frame** tensors serialize the ANE's parallel compute lanes
- **Silent CPU fallback** from unsupported ops can inflate latency 5–10×
- **No batched tiles** means 60× dispatch overhead
PiperSR addresses every one of these by designing around ANE constraints.
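The channel-alignment point can be illustrated with a toy calculation, assuming the ANE processes channels in lanes of 64 (the lane width implied by the 25% figure above; the exact width is an assumption here):

```python
import math

LANE = 64  # assumed ANE channel-lane width

def utilization(channels: int, lane: int = LANE) -> float:
    """Fraction of allocated lane capacity actually used by `channels`."""
    return channels / (math.ceil(channels / lane) * lane)

print(utilization(48))  # 0.75 -> 25% of each tile wasted
print(utilization(64))  # 1.0  -> fully aligned, as in PiperSR's body
```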
## Model Variants
| File | Use Case | Input → Output |
|------|----------|----------------|
| `PiperSR_2x.mlpackage` | Static images (128px tiles) | 128×128 → 256×256 |
| `PiperSR_2x_video_720p.mlpackage` | Video (full-frame, BN-fused) | 640×360 → 1280×720 |
| `PiperSR_2x_256.mlpackage` | Static images (256px tiles) | 256×256 → 512×512 |
## Usage
### With ToolPiper (recommended)
PiperSR is integrated into [ToolPiper](https://modelpiper.com), a local macOS AI toolkit. Install ToolPiper, enable the MediaPiper browser extension, and every 720p video on the web is upscaled to 1440p in real time.
```bash
# Via MCP tool
mcp__toolpiper__image_upscale image=/path/to/image.png
# Via REST API
curl -X POST http://127.0.0.1:9998/v1/images/upscale \
-F "image=@input.png" \
-o upscaled.png
```
### With CoreML (Swift)
```swift
import CoreML
let config = MLModelConfiguration()
config.computeUnits = .cpuAndNeuralEngine // NOT .all: .all is 23.6% slower
let model = try PiperSR_2x(configuration: config)
let input = try PiperSR_2xInput(x: pixelBuffer)
let output = try model.prediction(input: input)
// output.var_185 contains the 2× upscaled image
```
> **Important:** Use `.cpuAndNeuralEngine`, not `.all`. CoreML's `.all` silently misroutes pure-ANE ops onto the GPU, causing a 23.6% slowdown for this model.
### With coremltools (Python)
```python
import coremltools as ct
from PIL import Image
import numpy as np
model = ct.models.MLModel("PiperSR_2x.mlpackage")
img = Image.open("input.png").convert("RGB").resize((128, 128))
arr = np.array(img).astype(np.float32) / 255.0
arr = np.transpose(arr, (2, 0, 1))[np.newaxis]  # NCHW, [0, 1]
result = model.predict({"x": arr})
# The output key depends on the conversion (the Swift bindings expose it as var_185);
# grab the single output tensor, shaped (1, 3, 256, 256)
out = next(iter(result.values()))[0]
out_img = Image.fromarray((np.clip(out.transpose(1, 2, 0), 0, 1) * 255).astype(np.uint8))
out_img.save("upscaled.png")
```
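For inputs larger than the model's fixed 128×128 input, frames are tiled and the results stitched back together. A minimal numpy sketch of non-overlapping 2× tiling, with a stand-in nearest-neighbour upscaler in place of the `model.predict` call above (a production pipeline would overlap tiles and blend seams to hide edge artifacts):

```python
import numpy as np

TILE, SCALE = 128, 2

def fake_upscale(tile: np.ndarray) -> np.ndarray:
    # Stand-in for the CoreML call: nearest-neighbour 2x on a (3, 128, 128) tile
    return tile.repeat(SCALE, axis=1).repeat(SCALE, axis=2)

def upscale_tiled(img: np.ndarray) -> np.ndarray:
    """img: (3, H, W) with H, W multiples of TILE -> (3, 2H, 2W)."""
    c, h, w = img.shape
    out = np.zeros((c, h * SCALE, w * SCALE), dtype=img.dtype)
    for y in range(0, h, TILE):
        for x in range(0, w, TILE):
            tile = img[:, y:y + TILE, x:x + TILE]
            out[:, y * SCALE:(y + TILE) * SCALE,
                   x * SCALE:(x + TILE) * SCALE] = fake_upscale(tile)
    return out

frame = np.random.rand(3, 256, 384).astype(np.float32)
print(upscale_tiled(frame).shape)  # (3, 512, 768)
```

Batching all tiles into one prediction call, rather than looping as here, is what eliminates the 60× dispatch overhead noted above.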
## Training
Trained on DIV2K (800 training images) with L1 loss and random augmentation (flips, rotations). Total training cost: ~$6 on RunPod A6000 instances. The full training journey, from 33.46 dB to 37.54 dB, is documented across 12 experiment findings.
## Technical Details
- **Compute units:** `.cpuAndNeuralEngine` (ANE primary, CPU for I/O only)
- **Precision:** Float16
- **Input format:** NCHW, normalized to [0, 1]
- **Output format:** NCHW, [0, 1]
- **Model size:** 928 KB (compiled .mlmodelc)
- **Parameters:** 453K
- **ANE ops used:** conv, batch_norm (fused at inference), silu, add, pixel_shuffle, const
- **CPU fallback ops:** None
## License
Apache 2.0
## Citation
```bibtex
@software{pipersr2025,
title={PiperSR: ANE-Native Super Resolution for Apple Silicon},
author={ModelPiper},
year={2025},
url={https://huggingface.co/ModelPiper/PiperSR-2x}
}
```