| license: other | |
| license_name: circlestone-labs-non-commercial-license | |
| base_model: | |
| - circlestone-labs/Anima | |
| pipeline_tag: text-to-image | |
| library_name: diffusers | |
| tags: | |
| - diffusers | |
| - safetensors | |
| - sdnq | |
| - anima | |
| - cosmos | |
| - text-to-image | |
| - uint4 | |
| # Anima Preview 3 SDNQ UINT4 Diffusers Checkpoint | |
| A static 4-bit (`uint4`) SDNQ quantization of the Anima Preview 3 diffusion transformer, packaged as a full Diffusers pipeline. This checkpoint has the smallest size on disk and the lowest VRAM footprint in this comparison; the companion checkpoints are listed in the benchmark table below. | |
| This repository is a separate full Diffusers checkpoint for `circlestone-labs/Anima` Preview 3. The pipeline code and non-transformer components are based on the public Diffusers conversion `CalamitousFelicitousness/Anima-Preview-3-sdnext-diffusers`. The `transformer/` component is the WaveCut SDNQ-quantized diffusion transformer converted from `WaveCut/Anima-Preview-3-SDNQ-uint4`. | |
| ## Components | |
| - `transformer/`: SDNQ `uint4` quantized `CosmosTransformer3DModel`. | |
| - `llm_adapter/`: Anima LLM adapter required by the native Anima architecture. | |
| - `text_encoder/`: Qwen3 0.6B text encoder from the Diffusers conversion. | |
| - `tokenizer/` and `t5_tokenizer/`: Qwen and T5 tokenizers used by the adapter pathway. | |
| - `vae/`: Qwen Image / Wan-style VAE used by Anima. | |
| - `scheduler/`: `FlowMatchEulerDiscreteScheduler` with shift 3.0. | |
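To give a feel for what the scheduler's shift of 3.0 does, the sketch below applies the static-shift remapping that Diffusers' flow-matching Euler scheduler uses for noise levels in [0, 1]. The helper name `shift_sigma` is illustrative and not part of any library, and the Anima pipeline may apply further logic on top of this.

```python
def shift_sigma(sigma: float, shift: float = 3.0) -> float:
    """Remap a noise level in [0, 1] with a static flow-matching shift."""
    return shift * sigma / (1.0 + (shift - 1.0) * sigma)


# The endpoints 0 and 1 stay fixed; intermediate levels move toward the noisy end.
for sigma in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(f"{sigma:.2f} -> {shift_sigma(sigma):.3f}")
```

With a shift above 1, more of the sampling steps are spent at high noise levels, which is a common choice for roughly 1-megapixel generation.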
| ## Usage | |
| Install current versions of Diffusers and Transformers along with the `sdnq` package, then load the pipeline: | |
| ```python | |
| import torch | |
| import sdnq  # imported for its side effect: registers SDNQ quantization support with Diffusers | |
| from diffusers import DiffusionPipeline | |
| # Load the full pipeline, including the custom pipeline.py shipped in this repo. | |
| pipe = DiffusionPipeline.from_pretrained( | |
|     "WaveCut/Anima-Preview-3-SDNQ-uint4-diffusers", | |
|     custom_pipeline="pipeline", | |
|     torch_dtype=torch.bfloat16, | |
|     trust_remote_code=True, | |
| ).to("cuda") | |
| prompt = "masterpiece, best quality, score_7, safe, 1girl, fern (sousou no frieren), purple hair, purple eyes, black robe, white dress, butterfly on hand, simple background, looking at viewer" | |
| negative_prompt = "worst quality, low quality, score_1, score_2, score_3, blurry, jpeg artifacts, artist name" | |
| # A fixed seed makes the result reproducible on the same hardware and software stack. | |
| image = pipe( | |
|     prompt=prompt, | |
|     negative_prompt=negative_prompt, | |
|     width=1024, | |
|     height=1024, | |
|     num_inference_steps=30, | |
|     guidance_scale=4.0, | |
|     generator=torch.Generator(device="cuda").manual_seed(424242), | |
| ).images[0] | |
| ``` | |
| Because the Anima pipeline is custom code, pass `custom_pipeline="pipeline"`; `trust_remote_code=True` allows Diffusers to load `pipeline.py` from this repo. | |
| ## Prompting | |
| Anima was trained on Danbooru-style tags, natural language captions, and mixtures of both. The upstream Anima Preview 3 card recommends generating at roughly 1 megapixel, for example `1024x1024`, `896x1152`, or `1152x896`, with about 30-50 steps and a CFG scale of 4-5. | |
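As a quick sanity check on the resolution guidance, the sketch below computes the pixel count of a candidate size and tests whether it sits near 1 megapixel. The helper names and the 10% tolerance are assumptions made for illustration; the upstream card does not define a hard limit.

```python
def megapixels(width: int, height: int) -> float:
    """Return the image area in millions of pixels."""
    return width * height / 1_000_000


def near_one_megapixel(width: int, height: int, tolerance: float = 0.1) -> bool:
    """True if the area is within `tolerance` MP of 1 MP (tolerance is an assumed value)."""
    return abs(megapixels(width, height) - 1.0) <= tolerance


for w, h in [(1024, 1024), (896, 1152), (1152, 896)]:
    print(f"{w}x{h}: {megapixels(w, h):.3f} MP, ok={near_one_megapixel(w, h)}")
```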
| Recommended positive prefix: | |
| ```text | |
| masterpiece, best quality, score_7, safe, | |
| ``` | |
| Recommended negative prompt: | |
| ```text | |
| worst quality, low quality, score_1, score_2, score_3, artist name | |
| ``` | |
| Use lowercase tags with spaces instead of underscores, except score tags such as `score_7`. For artist tags, prefix the artist with `@`. | |
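The tag conventions above can be sketched as a small prompt builder. Everything here is illustrative: the function names are hypothetical, appending artist tags at the end of the prompt is an arbitrary choice, and applying the underscore rule to artist names is an assumption.

```python
import re

SCORE_TAG = re.compile(r"^score_\d+$")
QUALITY_PREFIX = ["masterpiece", "best quality", "score_7", "safe"]


def normalize_tag(tag: str) -> str:
    """Lowercase a tag and swap underscores for spaces, leaving score tags intact."""
    tag = tag.strip().lower()
    if SCORE_TAG.match(tag):
        return tag
    return tag.replace("_", " ")


def build_prompt(tags, artists=()):
    """Join the recommended prefix, normalized tags, and @-prefixed artist tags."""
    parts = list(QUALITY_PREFIX)
    parts += [normalize_tag(t) for t in tags]
    # Placement of artist tags is an assumption; the card only specifies the @ prefix.
    parts += ["@" + normalize_tag(a).lstrip("@") for a in artists]
    return ", ".join(parts)


print(build_prompt(["1girl", "Purple_Hair", "looking_at_viewer"], artists=["Some_Artist"]))
```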
| ## 1024x1024 Comparison Grid | |
| Five prompt/seed pairs were generated with the original BF16 Diffusers checkpoint, this UINT4 checkpoint, and the companion INT8 checkpoint. The source JPEG is `3572x5576`; every generated cell is exactly `1024x1024` and pasted 1:1 with no resizing. | |
|  | |
| Prompt IDs and seeds are printed in the left column of the grid. Raw benchmark data is available in [`benchmarks/benchmark_results_1024.json`](benchmarks/benchmark_results_1024.json). | |
| ## Benchmark | |
| Measured on an RTX 5090 32GB with `torch 2.8.0+cu128`, `diffusers 0.38.0`, `transformers 5.8.1`, `sdnq 0.1.8`, `torch.bfloat16`, 24 steps, CFG 4.0, and 1024x1024 output. Network download is excluded. Each model was loaded in a separate process; one 1024x1024 warm-up image was discarded, then five prompt/seed pairs were measured. VRAM was sampled with `nvidia-smi` every 50 ms. | |
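One way to reproduce the VRAM sampling described above is to poll `nvidia-smi` from a background thread while generation runs. This is a minimal sketch, not the benchmark script used for this card: the class and function names are hypothetical, and each `nvidia-smi` call has its own latency, so the effective interval may be longer than the requested 50 ms.

```python
import subprocess
import threading


def read_used_mib(gpu_index: int = 0) -> int:
    """Query the GPU's currently used memory in MiB via nvidia-smi."""
    out = subprocess.check_output(
        [
            "nvidia-smi",
            f"--id={gpu_index}",
            "--query-gpu=memory.used",
            "--format=csv,noheader,nounits",
        ],
        text=True,
    )
    return int(out.strip().splitlines()[0])


class PeakVramSampler:
    """Context manager that records the highest reading seen while it is active."""

    def __init__(self, reader=read_used_mib, interval_s: float = 0.05):
        self.reader = reader
        self.interval_s = interval_s
        self.peak_mib = 0
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def _run(self):
        # Sample until asked to stop, keeping only the running maximum.
        while not self._stop.is_set():
            self.peak_mib = max(self.peak_mib, self.reader())
            self._stop.wait(self.interval_s)

    def __enter__(self):
        self._thread.start()
        return self

    def __exit__(self, *exc):
        self._stop.set()
        self._thread.join()
        return False
```

Wrapping the generation call in `with PeakVramSampler() as sampler:` and reading `sampler.peak_mib` afterwards gives a peak figure comparable in spirit to the last column of the table. Note that `nvidia-smi` reports memory used by every process on the GPU, not only the current one.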
| | Model | Repo | Size | Load time | Mean generation | Speed vs original | VRAM after load | Peak VRAM while generating | | |
| | --- | --- | ---: | ---: | ---: | ---: | ---: | ---: | | |
| | Original BF16 | `CalamitousFelicitousness/Anima-Preview-3-sdnext-diffusers` | 5.3 GiB | 10.04s | 6.37s/img | 1.00x | 6005 MiB | 10759 MiB | | |
| | SDNQ UINT4 | `WaveCut/Anima-Preview-3-SDNQ-uint4-diffusers` | 2.7 GiB (-49.1%) | 11.96s | 6.13s/img | 1.04x (+3.9%) | 3285 MiB (-45.3%) | 8157 MiB (-24.2%) | | |
| | SDNQ INT8 | `WaveCut/Anima-Preview-3-SDNQ-int8-diffusers` | 3.5 GiB (-34.1%) | 22.41s | 4.60s/img | 1.38x (+38.4%) | 4111 MiB (-31.5%) | 8961 MiB (-16.7%) | | |
| Quant-to-quant tradeoff in this run: UINT4 is 22.7% smaller than INT8 and uses 826 MiB less VRAM after load plus 804 MiB less peak generation VRAM. INT8 is 1.33x faster than UINT4 on this RTX 5090 setup. | |
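The relative figures in the table and the paragraph above can be recomputed from the reported timings and VRAM readings. The sketch below does that for the speed and VRAM columns; the dictionary layout and helper names are illustrative. Size percentages are left out because the table shows rounded GiB values, and the published percentages were presumably computed from exact byte counts.

```python
runs = {
    "Original BF16": {"sec_per_img": 6.37, "load_mib": 6005, "peak_mib": 10759},
    "SDNQ UINT4": {"sec_per_img": 6.13, "load_mib": 3285, "peak_mib": 8157},
    "SDNQ INT8": {"sec_per_img": 4.60, "load_mib": 4111, "peak_mib": 8961},
}


def pct_change(value: float, baseline: float) -> float:
    """Percentage change of value relative to baseline, rounded to one decimal."""
    return round(100.0 * (value - baseline) / baseline, 1)


def speedup(baseline_sec: float, sec: float) -> float:
    """How many times faster `sec` is than `baseline_sec`, rounded to two decimals."""
    return round(baseline_sec / sec, 2)


base = runs["Original BF16"]
for name, r in runs.items():
    print(
        name,
        f"speed {speedup(base['sec_per_img'], r['sec_per_img'])}x",
        f"load VRAM {pct_change(r['load_mib'], base['load_mib'])}%",
        f"peak VRAM {pct_change(r['peak_mib'], base['peak_mib'])}%",
    )

uint4, int8 = runs["SDNQ UINT4"], runs["SDNQ INT8"]
print("INT8 vs UINT4 speed:", speedup(uint4["sec_per_img"], int8["sec_per_img"]))
print("UINT4 saves after load (MiB):", int8["load_mib"] - uint4["load_mib"])
print("UINT4 saves at peak (MiB):", int8["peak_mib"] - uint4["peak_mib"])
```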
| ## Notes | |
| The original Anima split checkpoint is a ComfyUI-native model with a Qwen3 text encoder and a learned LLM adapter. Earlier transformer-only exports that load the checkpoint directly as `CosmosTransformer3DModel` ignore the `llm_adapter.*` weights; this repo keeps the adapter and full pipeline structure so generation follows the Anima architecture. | |
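To see whether a given checkpoint carries adapter weights that a transformer-only loader would drop, one can partition its state-dict keys by prefix. The sketch below assumes the adapter keys start with `llm_adapter.`, following the naming above; the function name and the example key names are hypothetical and do not come from the actual checkpoint.

```python
def split_adapter_keys(keys, prefix: str = "llm_adapter."):
    """Separate keys under the adapter prefix from all other keys."""
    adapter = [k for k in keys if k.startswith(prefix)]
    other = [k for k in keys if not k.startswith(prefix)]
    return adapter, other


# Hypothetical key names, for illustration only.
example_keys = [
    "llm_adapter.proj.weight",
    "llm_adapter.proj.bias",
    "blocks.0.attn.weight",
]

adapter_keys, transformer_keys = split_adapter_keys(example_keys)
print(f"{len(adapter_keys)} adapter keys, {len(transformer_keys)} other keys")
```

A non-empty adapter list for a checkpoint loaded directly as a bare transformer suggests those weights are being ignored.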
| The license follows the upstream Anima/CircleStone non-commercial license and the NVIDIA Cosmos derivative terms referenced by the upstream model card. | |