Commit ce45d75
Initial LTX 2.3 CPU feasibility Space
- .gitattributes +34 -0
- Dockerfile +17 -0
- README.md +82 -0
- app.py +72 -0
.gitattributes
ADDED
@@ -0,0 +1,34 @@
+*.7z filter=lfs diff=lfs merge=lfs -text
+*.arrow filter=lfs diff=lfs merge=lfs -text
+*.bin filter=lfs diff=lfs merge=lfs -text
+*.bz2 filter=lfs diff=lfs merge=lfs -text
+*.ckpt filter=lfs diff=lfs merge=lfs -text
+*.ftz filter=lfs diff=lfs merge=lfs -text
+*.gguf filter=lfs diff=lfs merge=lfs -text
+*.gz filter=lfs diff=lfs merge=lfs -text
+*.h5 filter=lfs diff=lfs merge=lfs -text
+*.jpg filter=lfs diff=lfs merge=lfs -text
+*.mp4 filter=lfs diff=lfs merge=lfs -text
+*.npy filter=lfs diff=lfs merge=lfs -text
+*.npz filter=lfs diff=lfs merge=lfs -text
+*.onnx filter=lfs diff=lfs merge=lfs -text
+*.parquet filter=lfs diff=lfs merge=lfs -text
+*.pb filter=lfs diff=lfs merge=lfs -text
+*.pickle filter=lfs diff=lfs merge=lfs -text
+*.pkl filter=lfs diff=lfs merge=lfs -text
+*.png filter=lfs diff=lfs merge=lfs -text
+*.pt filter=lfs diff=lfs merge=lfs -text
+*.pth filter=lfs diff=lfs merge=lfs -text
+*.rar filter=lfs diff=lfs merge=lfs -text
+*.safetensors filter=lfs diff=lfs merge=lfs -text
+*.tar.* filter=lfs diff=lfs merge=lfs -text
+*.tar filter=lfs diff=lfs merge=lfs -text
+*.tflite filter=lfs diff=lfs merge=lfs -text
+*.tgz filter=lfs diff=lfs merge=lfs -text
+*.wasm filter=lfs diff=lfs merge=lfs -text
+*.webm filter=lfs diff=lfs merge=lfs -text
+*.webp filter=lfs diff=lfs merge=lfs -text
+*.xz filter=lfs diff=lfs merge=lfs -text
+*.zip filter=lfs diff=lfs merge=lfs -text
+*.zst filter=lfs diff=lfs merge=lfs -text
+*tfevents* filter=lfs diff=lfs merge=lfs -text
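The patterns in this `.gitattributes` decide which files are stored through Git LFS. As a rough illustration, the sketch below checks a file's base name against a handful of those globs with Python's `fnmatch`. The helper name `is_lfs_tracked` and the shortened pattern list are assumptions made for the example; Git's own attribute matching differs from `fnmatch` in edge cases, so `git check-attr filter -- <path>` remains the authoritative check.

```python
from fnmatch import fnmatchcase

# A subset of the globs from the .gitattributes hunk (not the full list).
LFS_PATTERNS = ["*.safetensors", "*.gguf", "*.mp4", "*.tar.*", "*tfevents*"]


def is_lfs_tracked(path: str, patterns=LFS_PATTERNS) -> bool:
    """Approximate check: does the file's base name match any LFS glob?

    Hypothetical helper for illustration; it only looks at the base name,
    which is how slash-free gitattributes patterns are usually applied.
    """
    base = path.rsplit("/", 1)[-1]
    return any(fnmatchcase(base, pattern) for pattern in patterns)


if __name__ == "__main__":
    for candidate in ["assets/videos/demo.mp4", "app.py", "weights.tar.gz"]:
        print(candidate, is_lfs_tracked(candidate))
```

Under these assumptions, a demo clip placed under `assets/videos/` would go through LFS, while `app.py` and `README.md` stay as regular Git blobs.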
Dockerfile
ADDED
@@ -0,0 +1,17 @@
+FROM python:3.12-slim
+
+WORKDIR /app
+
+RUN apt-get update && apt-get install -y --no-install-recommends \
+    curl libgl1 libglib2.0-0 ffmpeg \
+    && rm -rf /var/lib/apt/lists/*
+
+RUN pip install --no-cache-dir \
+    "gradio>=6,<7" pillow huggingface_hub
+
+COPY app.py /app/app.py
+COPY README.md /app/README.md
+
+EXPOSE 7860
+
+CMD ["python", "/app/app.py"]
README.md
ADDED
@@ -0,0 +1,82 @@
+---
+title: LTX 2.3 CPU
+emoji: 🎬
+colorFrom: indigo
+colorTo: pink
+sdk: docker
+app_port: 7860
+pinned: false
+license: other
+---
+
+# LTX 2.3 CPU: Feasibility Reference + ZeroGPU Recipe
+
+Running the 22B-parameter LTX-Video 2.3 (Lightricks) on the **free HF CPU** tier is **not practical**: 2 vCPU + 16 GB RAM cannot host the full pipeline at usable speed. This Space provides the **feasibility analysis and upgrade recipe** so that any user with a GPU tier can fork it and run generation.
+
+## TL;DR
+
+| Tier | Hardware | LTX 2.3 distilled-1.1 viable? (time per 2-sec clip) | Cost |
+|---|---|---|---|
+| Free CPU | 2 vCPU + 16 GB | ❌ models barely fit at Q3_K_M, ~60-120 min if it even completes | n/a |
+| CPU Upgrade | 8 vCPU + 32 GB | ⚠ marginal, ~30-60 min | $0.30/clip |
+| ZeroGPU | A100 quota slot | ✅ ~25-40 sec | free w/ Pro |
+| GPU L40S | 48 GB VRAM | ✅ ~8 sec | $1/hr |
+
+## Model paths analysed
+
+- **Path A: Unsloth distilled-1.1 Q3_K_M** (`unsloth/LTX-2.3-GGUF` → `distilled-1.1/ltx-2.3-22b-distilled-1.1-Q3_K_M.gguf`, ~10.6 GB). Cleanest 8-step distilled DiT. Best CPU candidate (smallest weights). Requires the ComfyUI-GGUF loader.
+- **Path C: 10Eros fine-tune + cond_safe distill LoRA** (`vantagewithai/LTX2.3-10Eros-GGUF` + cond_safe LoRA). 10Eros is a *fine-tune*, NOT distilled; its README warns *"larger distilled LoRAs will harm the model's fine tune"*. Riskier; needs LoRA tuning. Not a 1:1 replacement for Path A.
+
+Recommendation: **Path A** for the CPU build (smallest, distilled). Path C is preserved here as a reference for ZeroGPU forks that have headroom to experiment.
+
+## Text encoder constraint
+
+You **cannot swap** the text encoder. LTX 2.3 was trained with `google/gemma-3-12b-it`, so the diffusion transformer (DiT) is bound to its embedding space. Other LLMs, such as Qwen3.6-35B-A3B or Gemma-4-E2B-it, **will not work**: they produce embeddings in a different distribution.
+
+The only valid lever is **quantising the same encoder smaller**:
+
+| Quant | Size | Quality vs FP16 |
+|---|---|---|
+| Gemma-3-12B-it Q3_K_M | 6.0 GB | ~98% |
+| Gemma-3-12B-it Q4_K_M | 7.4 GB | ~99.5% |
+| Gemma-3-12B-it Q5_K_M | 8.6 GB | ~99.9% |
+
+Use `mradermacher/gemma-3-12b-it-qat-abliterated-GGUF` Q3_K_M for the CPU path.
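To make the quantisation table easier to compare against the DiT weights, the sketch below converts the file sizes quoted in the README into effective bits per weight. It assumes that "GB" means 10^9 bytes and that the parameter counts are the nominal 12B (encoder) and 22B (DiT); the function name `bits_per_weight` is introduced here for illustration only.

```python
GB = 1e9  # assumption: the README's "GB" means 10^9 bytes


def bits_per_weight(size_gb: float, n_params: float) -> float:
    """Effective bits stored per parameter for a quantised weight file."""
    return size_gb * GB * 8 / n_params


ENCODER_PARAMS = 12e9  # assumption: nominal "12B" for Gemma-3-12B-it
DIT_PARAMS = 22e9      # assumption: nominal "22B" for the LTX 2.3 DiT

if __name__ == "__main__":
    # Sizes are the ones quoted in the README tables and Path A description.
    for name, size_gb in [("Q3_K_M", 6.0), ("Q4_K_M", 7.4), ("Q5_K_M", 8.6)]:
        print(f"encoder {name}: {bits_per_weight(size_gb, ENCODER_PARAMS):.2f} bits/weight")
    print(f"DiT Q3_K_M: {bits_per_weight(10.6, DIT_PARAMS):.2f} bits/weight")
```

With those assumptions the encoder works out to roughly 4.0, 4.9, and 5.7 bits per weight for the three quants, and the DiT to about 3.9, which is why neither file can shrink much further without moving to a lower quant level.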
+
+## ZeroGPU fork recipe
+
+Fork this Space to your account, change `sdk: docker` → `sdk: gradio`, change the hardware tier to **ZeroGPU**, and replace `app.py` with the GPU variant in `gpu_app.py`. That's it.
+
+```bash
+huggingface-cli repo duplicate WeReCooking/ltx-2.3-cpu YourUsername/ltx-2.3-zerogpu
+# Then edit README.md: sdk -> gradio, add: hardware: zerogpu
+# Edit Space settings on HF UI -> Hardware -> ZeroGPU
+```
+
+## Curl test (once forked to a GPU tier)
+
+```bash
+TOKEN="hf_xxx"
+SPACE="https://YourUsername-ltx-2-3-zerogpu.hf.space"
+
+EVT=$(curl -s -X POST "$SPACE/gradio_api/call/generate" \
+  -H "Authorization: Bearer $TOKEN" -H "Content-Type: application/json" \
+  -d '{"data":["A woman walking through a neon-lit Tokyo alley at night, cinematic", 2.0, 8]}' \
+  | python -c "import sys,json;print(json.load(sys.stdin)['event_id'])")
+curl -sN "$SPACE/gradio_api/call/generate/$EVT" -H "Authorization: Bearer $TOKEN"
+```
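The second `curl` call in the test above streams server-sent events rather than returning a single JSON body. As a minimal sketch, the parser below splits such a stream into `(event, data)` pairs and returns the payload of the `complete` event. The `SAMPLE` text is illustrative only and was not captured from this Space; the helper names `parse_sse` and `final_output` are assumptions made for the example.

```python
import json


def parse_sse(text: str):
    """Yield (event, data) pairs from a text/event-stream body.

    Simplified: surrounding whitespace is stripped from field values, and
    only the `event` and `data` fields are handled.
    """
    event, data_lines = "message", []
    for line in text.splitlines():
        if line == "":  # a blank line terminates one event
            if data_lines:
                yield event, "\n".join(data_lines)
            event, data_lines = "message", []
        elif line.startswith(":"):  # SSE comment line, ignored
            continue
        elif line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data_lines.append(line[len("data:"):].strip())
    if data_lines:  # stream ended without a trailing blank line
        yield event, "\n".join(data_lines)


def final_output(text: str):
    """Return the decoded payload of the first `complete` event, if any."""
    for event, data in parse_sse(text):
        if event == "complete":
            return json.loads(data)
        if event == "error":
            raise RuntimeError(f"generation failed: {data}")
    return None


# Illustrative stream, not real output from the Space.
SAMPLE = 'event: heartbeat\ndata: null\n\nevent: complete\ndata: ["ok"]\n\n'

if __name__ == "__main__":
    print(final_output(SAMPLE))
```

Piping the streamed response into a parser like this avoids scraping the `data:` line by hand when the endpoint emits heartbeat events before the result.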
+
+## Logs (SSE)
+
+```bash
+curl -N -H "Authorization: Bearer $TOKEN" "https://huggingface.co/api/spaces/WeReCooking/ltx-2.3-cpu/logs/build"
+curl -N -H "Authorization: Bearer $TOKEN" "https://huggingface.co/api/spaces/WeReCooking/ltx-2.3-cpu/logs/run"
+```
+
+## Why not ship inference on free CPU anyway
+
+I attempted the GGUF path locally. Findings:
+- 10.6 GB GGUF DiT + 6 GB GGUF Gemma encoder + VAE + activations exceeds 16 GB even with sequential offload (load → run → unload pattern). The encoder needs to stay resident during the DiT's classifier-free guidance branch (or be re-loaded per step → 50× slower).
+- 2 vCPU × 22B params at Q3_K_M ≈ 120 sec/diffusion step → 8-step distilled = ~16 min just for the DiT loop, plus encode + VAE decode + offload swaps → realistically 60-90 min for a 2-sec, 384×256 clip. HF Space request timeout is 1 hour. The math doesn't close.
+
+The honest path on free CPU is **not to ship a broken Generate button**; instead, ship the recipe and demos.
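The two findings above reduce to simple arithmetic, sketched below using only the figures quoted in the README. No VAE or activation sizes are given there, so the sketch leaves them out and reports the headroom that would remain for them; the function names are introduced for illustration.

```python
RAM_GB = 16.0       # free CPU tier, from the README
DIT_GB = 10.6       # Q3_K_M DiT weights, from the README
ENCODER_GB = 6.0    # Q3_K_M Gemma encoder weights, from the README
SEC_PER_STEP = 120  # README estimate for 2 vCPU
STEPS = 8           # distilled schedule
TIMEOUT_MIN = 60    # request timeout stated in the README


def headroom_gb(resident_weights_gb: float, ram_gb: float = RAM_GB) -> float:
    """RAM left for the VAE, activations, and the OS after loading weights."""
    return ram_gb - resident_weights_gb


def dit_loop_minutes(sec_per_step: float = SEC_PER_STEP, steps: int = STEPS) -> float:
    """Wall-clock minutes spent in the denoising loop alone."""
    return sec_per_step * steps / 60


if __name__ == "__main__":
    # Encoder kept resident next to the DiT: weights alone overshoot RAM.
    print(f"both resident: {headroom_gb(DIT_GB + ENCODER_GB):+.1f} GB headroom")
    # Strict load -> run -> unload: only the larger model is resident.
    print(f"sequential:    {headroom_gb(max(DIT_GB, ENCODER_GB)):+.1f} GB headroom")
    print(f"DiT loop:      {dit_loop_minutes():.0f} min of a {TIMEOUT_MIN} min budget")
```

With both models resident the weights alone come to 16.6 GB, which is 0.6 GB over the limit before any activations are allocated. The DiT loop by itself uses 16 of the 60 available minutes, and the README's 60-90 minute end-to-end estimate starts at the timeout.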
app.py
ADDED
@@ -0,0 +1,72 @@
+"""LTX 2.3 CPU Space - feasibility reference + ZeroGPU upgrade recipe.
+
+This Space documents why LTX 2.3 (22B) on free HF CPU is impractical and
+shows the upgrade path. Generation is disabled on CPU; the UI mirrors what
+a ZeroGPU fork would look like so users can clone and switch hardware in
+one click.
+"""
+
+from pathlib import Path
+
+import gradio as gr
+
+FEASIBILITY_TABLE = """\
+| Hardware | Per 2-sec clip | Notes |
+|-----------------------|----------------|-----------------------------------|
+| Free CPU (this Space) | not feasible | 22B at Q3_K_M does not fit in 16 GB |
+| CPU Upgrade 32 GB | 30-60 min | marginal, $0.30/clip |
+| ZeroGPU (Pro) | 25-40 sec | recommended path |
+| GPU L40S 48 GB | ~8 sec | dedicated |
+"""
+
+PIPELINE_NOTE = """\
+**Path A** (used here): Unsloth `distilled-1.1` GGUF Q3_K_M, 10.6 GB DiT + Gemma-3-12B-it Q3_K_M 6 GB encoder. ComfyUI-GGUF loader.
+**Path C** (research): 10Eros fine-tune + cond_safe distill LoRA; a fine-tune, not distilled. Larger LoRAs harm the 10Eros fine-tune; needs tuning.
+**Text encoder cannot be swapped**: the diffusion transformer (DiT) is bound to `google/gemma-3-12b-it`. Only quantisation, not replacement, is valid.
+"""
+
+
+def cpu_generate_stub(prompt: str, duration_sec: float, steps: int) -> str:
+    return (
+        "CPU inference is disabled on this free Space: 22B + 16 GB RAM is\n"
+        "infeasible. Fork to ZeroGPU (see README) to enable generation.\n\n"
+        f"Prompt received: {prompt[:100]}\n"
+        f"Duration: {duration_sec:.1f} s\n"
+        f"Steps: {steps}"
+    )
+
+
+def health() -> str:
+    return "ok - LTX 2.3 CPU Space (documentation mode)"
+
+
+DEMO_VIDEOS = sorted(str(p) for p in Path("/app/assets/videos").glob("*.mp4"))
+
+with gr.Blocks(title="LTX 2.3 CPU") as demo:
+    gr.Markdown("**LTX 2.3 CPU**: feasibility reference + ZeroGPU recipe. 22B video diffusion does not run on free CPU; this is a fork-and-upgrade template.")
+    with gr.Row(equal_height=True):
+        with gr.Column(scale=1):
+            prompt_in = gr.Textbox(label="Prompt", placeholder="A woman walking through a neon-lit Tokyo alley at night, cinematic", lines=3)
+            with gr.Row():
+                duration_in = gr.Slider(1.0, 4.0, value=2.0, step=0.5, label="Duration (s)")
+                steps_in = gr.Slider(4, 16, value=8, step=1, label="Steps (distilled)")
+            run_btn = gr.Button("Generate (disabled on CPU, fork to ZeroGPU)", variant="primary")
+            status = gr.Textbox(label="Status", lines=5, interactive=False, show_copy_button=True)
+        with gr.Column(scale=1):
+            gr.Markdown("### Feasibility")
+            gr.Markdown(FEASIBILITY_TABLE)
+            gr.Markdown("### Pipeline")
+            gr.Markdown(PIPELINE_NOTE)
+    if DEMO_VIDEOS:
+        gr.Examples(
+            examples=[[v] for v in DEMO_VIDEOS],
+            inputs=[gr.Video(visible=False)],
+            examples_per_page=6,
+            cache_examples=False,
+            label="Reference outputs (pre-generated on GPU)",
+        )
+    run_btn.click(fn=cpu_generate_stub, inputs=[prompt_in, duration_in, steps_in], outputs=[status], api_name="generate")
+    gr.Button(visible=False).click(fn=health, outputs=[gr.Textbox(visible=False)], api_name="health")
+
+demo.queue(default_concurrency_limit=1)
+demo.launch(server_name="0.0.0.0", server_port=7860, theme=gr.themes.Soft())