---
library_name: coreml
pipeline_tag: image-to-image
tags:
- super-resolution
- apple-silicon
- neural-engine
- ane
- coreml
- real-time
- video-upscaling
- macos
license: apache-2.0
datasets:
- eugenesiow/Div2k
metrics:
- psnr
- ssim
model-index:
- name: PiperSR-2x
  results:
  - task:
      type: image-super-resolution
      name: Image Super-Resolution
    dataset:
      type: Set5
      name: Set5
    metrics:
    - type: psnr
      value: 37.54
      name: PSNR
  - task:
      type: image-super-resolution
      name: Image Super-Resolution
    dataset:
      type: Set14
      name: Set14
    metrics:
    - type: psnr
      value: 33.21
      name: PSNR
  - task:
      type: image-super-resolution
      name: Image Super-Resolution
    dataset:
      type: BSD100
      name: BSD100
    metrics:
    - type: psnr
      value: 31.98
      name: PSNR
  - task:
      type: image-super-resolution
      name: Image Super-Resolution
    dataset:
      type: Urban100
      name: Urban100
    metrics:
    - type: psnr
      value: 31.38
      name: PSNR
---
# PiperSR-2x: ANE-Native Super Resolution for Apple Silicon
Real-time 2× AI upscaling on Apple's Neural Engine: 44.4 FPS at 720p on an M2 Max, a 928 KB model, and every op running natively on the ANE with zero CPU/GPU fallback.
This is not a converted PyTorch model but an architecture designed from ANE hardware measurements: every dimension, operation, and data type is dictated by Neural Engine characteristics.
## Key Results
| Model | Params | Set5 | Set14 | BSD100 | Urban100 |
|-------|--------|------|-------|--------|----------|
| Bicubic | — | 33.66 | 30.24 | 29.56 | 26.88 |
| FSRCNN | 13K | 37.05 | 32.66 | 31.53 | 29.88 |
| **PiperSR** | **453K** | **37.54** | **33.21** | **31.98** | **31.38** |
| SAFMN | 228K | 38.00 | ~33.7 | ~32.2 | — |
PiperSR beats FSRCNN across all four benchmarks and comes within 0.46 dB of SAFMN on Set5, below the perceptual threshold for most content.
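For reference, the PSNR figures above follow the standard definition; a minimal sketch, assuming images normalized to [0, 1] (note that published SR benchmarks typically evaluate on the luma channel with cropped borders, so exact protocols vary):

```python
import numpy as np

def psnr(ref: np.ndarray, test: np.ndarray, max_val: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB between two same-shaped images."""
    mse = np.mean((ref.astype(np.float64) - test.astype(np.float64)) ** 2)
    return 10.0 * np.log10(max_val ** 2 / mse)

# A uniform error of 0.1 on a [0, 1] image gives MSE = 0.01, i.e. 20 dB
ref = np.zeros((3, 8, 8))
print(round(psnr(ref, ref + 0.1), 2))  # 20.0
```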
## Performance
| Configuration | FPS | Hardware | Notes |
|--------------|-----|----------|-------|
| Full-frame 640×360 → 1280×720 | 44.4 | M2 Max | ANE predict 20.8 ms |
| 128×128 tiles (static weights) | 125.6 | M2 | Baked weights, 2.82× vs dynamic |
| 128×128 tiles (dynamic weights) | 44.5 | M2 | CoreML default |
Real-time 2× upscaling at 30+ FPS on any Mac with Apple Silicon. The ANE sits idle during video playback; PiperSR puts it to work.
## Architecture
453K-parameter network: 6 residual blocks at 64 channels with BatchNorm and SiLU activations, upscaling via PixelShuffle.
```
Input (128×128×3 FP16)
→ Head: Conv 3×3 (3 → 64)
→ Body: 6× ResBlock [Conv 3×3 → BatchNorm → SiLU → Conv 3×3 → BatchNorm → Residual Add]
→ Tail: Conv 3×3 (64 → 12) → PixelShuffle(2)
Output (256×256×3)
```
Compiles to 5 MIL ops: `conv`, `add`, `silu`, `pixel_shuffle`, `const`. All verified ANE-native.
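The 453K figure can be checked by hand from the layer list above. A quick stdlib sketch of the parameter arithmetic (BatchNorm counted as one learnable scale and shift per channel; running stats and the parameter-free SiLU/PixelShuffle excluded):

```python
def conv_params(c_in: int, c_out: int, k: int = 3) -> int:
    # k*k kernel weights per (in, out) channel pair, plus one bias per output channel
    return c_in * c_out * k * k + c_out

def bn_params(c: int) -> int:
    # learnable scale + shift per channel
    return 2 * c

head = conv_params(3, 64)                                   # 3 -> 64
body = 6 * (2 * conv_params(64, 64) + 2 * bn_params(64))    # 6 residual blocks
tail = conv_params(64, 12)                                  # 64 -> 12, then PixelShuffle(2) -> 3 channels
total = head + body + tail
print(total)  # 453388, i.e. ~453K
```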
### Why ANE-native matters
Off-the-shelf super resolution models (SPAN, Real-ESRGAN) were designed for CUDA GPUs and converted to CoreML after the fact. They waste the ANE:
- **Misaligned channels** (48 instead of 64) waste 25%+ of each ANE tile
- **Monolithic full-frame** tensors serialize the ANE's parallel compute lanes
- **Silent CPU fallback** from unsupported ops can inflate latency 5–10×
- **No batched tiles** means 60× dispatch overhead
PiperSR addresses every one of these by designing around ANE constraints.
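The channel-alignment point can be illustrated with a toy calculation, assuming the ANE processes channels in lanes of 64 (the lane width implied by the 25% figure above; the exact width is an assumption here):

```python
import math

LANE = 64  # assumed ANE channel-lane width

def utilization(channels: int, lane: int = LANE) -> float:
    """Fraction of allocated lane capacity actually used by `channels`."""
    return channels / (math.ceil(channels / lane) * lane)

print(utilization(48))  # 0.75 -> 25% of each tile wasted
print(utilization(64))  # 1.0  -> fully aligned, as in PiperSR's body
```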
## Model Variants
| File | Use Case | Input → Output |
|------|----------|----------------|
| `PiperSR_2x.mlpackage` | Static images (128px tiles) | 128×128 → 256×256 |
| `PiperSR_2x_video_720p.mlpackage` | Video (full-frame, BN-fused) | 640×360 → 1280×720 |
| `PiperSR_2x_256.mlpackage` | Static images (256px tiles) | 256×256 → 512×512 |
## Usage
### With ToolPiper (recommended)
PiperSR is integrated into [ToolPiper](https://modelpiper.com), a local macOS AI toolkit. Install ToolPiper, enable the MediaPiper browser extension, and every 720p video on the web is upscaled to 1440p in real time.
```bash
# Via MCP tool
mcp__toolpiper__image_upscale image=/path/to/image.png
# Via REST API
curl -X POST http://127.0.0.1:9998/v1/images/upscale \
-F "image=@input.png" \
-o upscaled.png
```
### With CoreML (Swift)
```swift
import CoreML
let config = MLModelConfiguration()
config.computeUnits = .cpuAndNeuralEngine // NOT .all: .all is 23.6% slower
let model = try PiperSR_2x(configuration: config)
let input = try PiperSR_2xInput(x: pixelBuffer)
let output = try model.prediction(input: input)
// output.var_185 contains the 2× upscaled image
```
> **Important:** Use `.cpuAndNeuralEngine`, not `.all`. CoreML's `.all` silently misroutes pure-ANE ops onto the GPU, causing a 23.6% slowdown for this model.
### With coremltools (Python)
```python
import coremltools as ct
from PIL import Image
import numpy as np
model = ct.models.MLModel("PiperSR_2x.mlpackage")
img = Image.open("input.png").convert("RGB").resize((128, 128))
arr = np.array(img).astype(np.float32) / 255.0
arr = np.transpose(arr, (2, 0, 1))[np.newaxis]  # NCHW, [0, 1]
result = model.predict({"x": arr})
# The output key depends on the conversion (the Swift bindings expose it as var_185);
# grab the single output tensor, shaped (1, 3, 256, 256)
out = next(iter(result.values()))[0]
out_img = Image.fromarray((np.clip(out.transpose(1, 2, 0), 0, 1) * 255).astype(np.uint8))
out_img.save("upscaled.png")
```
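For inputs larger than the model's fixed 128×128 input, frames are tiled and the results stitched back together. A minimal numpy sketch of non-overlapping 2× tiling, with a stand-in nearest-neighbour upscaler in place of the `model.predict` call above (a production pipeline would overlap tiles and blend seams to hide edge artifacts):

```python
import numpy as np

TILE, SCALE = 128, 2

def fake_upscale(tile: np.ndarray) -> np.ndarray:
    # Stand-in for the CoreML call: nearest-neighbour 2x on a (3, 128, 128) tile
    return tile.repeat(SCALE, axis=1).repeat(SCALE, axis=2)

def upscale_tiled(img: np.ndarray) -> np.ndarray:
    """img: (3, H, W) with H, W multiples of TILE -> (3, 2H, 2W)."""
    c, h, w = img.shape
    out = np.zeros((c, h * SCALE, w * SCALE), dtype=img.dtype)
    for y in range(0, h, TILE):
        for x in range(0, w, TILE):
            tile = img[:, y:y + TILE, x:x + TILE]
            out[:, y * SCALE:(y + TILE) * SCALE,
                   x * SCALE:(x + TILE) * SCALE] = fake_upscale(tile)
    return out

frame = np.random.rand(3, 256, 384).astype(np.float32)
print(upscale_tiled(frame).shape)  # (3, 512, 768)
```

Batching all tiles into one prediction call, rather than looping as here, is what eliminates the 60× dispatch overhead noted above.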
## Training
Trained on DIV2K (800 training images) with L1 loss and random augmentation (flips, rotations). Total training cost: ~$6 on RunPod A6000 instances. The full training journey, from 33.46 dB to 37.54 dB, is documented across 12 experiment findings.
## Technical Details
- **Compute units:** `.cpuAndNeuralEngine` (ANE primary, CPU for I/O only)
- **Precision:** Float16
- **Input format:** NCHW, normalized to [0, 1]
- **Output format:** NCHW, [0, 1]
- **Model size:** 928 KB (compiled .mlmodelc)
- **Parameters:** 453K
- **ANE ops used:** conv, batch_norm (fused at inference), silu, add, pixel_shuffle, const
- **CPU fallback ops:** None
## License
Apache 2.0
## Citation
```bibtex
@software{pipersr2025,
title={PiperSR: ANE-Native Super Resolution for Apple Silicon},
author={ModelPiper},
year={2025},
url={https://huggingface.co/ModelPiper/PiperSR-2x}
}
```