Nekochu committed on
Commit ce45d75 · 0 parent(s)

Initial LTX 2.3 CPU feasibility Space

Files changed (4)
  1. .gitattributes +34 -0
  2. Dockerfile +17 -0
  3. README.md +82 -0
  4. app.py +72 -0
.gitattributes ADDED
@@ -0,0 +1,34 @@
+ *.7z filter=lfs diff=lfs merge=lfs -text
+ *.arrow filter=lfs diff=lfs merge=lfs -text
+ *.bin filter=lfs diff=lfs merge=lfs -text
+ *.bz2 filter=lfs diff=lfs merge=lfs -text
+ *.ckpt filter=lfs diff=lfs merge=lfs -text
+ *.ftz filter=lfs diff=lfs merge=lfs -text
+ *.gguf filter=lfs diff=lfs merge=lfs -text
+ *.gz filter=lfs diff=lfs merge=lfs -text
+ *.h5 filter=lfs diff=lfs merge=lfs -text
+ *.jpg filter=lfs diff=lfs merge=lfs -text
+ *.mp4 filter=lfs diff=lfs merge=lfs -text
+ *.npy filter=lfs diff=lfs merge=lfs -text
+ *.npz filter=lfs diff=lfs merge=lfs -text
+ *.onnx filter=lfs diff=lfs merge=lfs -text
+ *.parquet filter=lfs diff=lfs merge=lfs -text
+ *.pb filter=lfs diff=lfs merge=lfs -text
+ *.pickle filter=lfs diff=lfs merge=lfs -text
+ *.pkl filter=lfs diff=lfs merge=lfs -text
+ *.png filter=lfs diff=lfs merge=lfs -text
+ *.pt filter=lfs diff=lfs merge=lfs -text
+ *.pth filter=lfs diff=lfs merge=lfs -text
+ *.rar filter=lfs diff=lfs merge=lfs -text
+ *.safetensors filter=lfs diff=lfs merge=lfs -text
+ *.tar.* filter=lfs diff=lfs merge=lfs -text
+ *.tar filter=lfs diff=lfs merge=lfs -text
+ *.tflite filter=lfs diff=lfs merge=lfs -text
+ *.tgz filter=lfs diff=lfs merge=lfs -text
+ *.wasm filter=lfs diff=lfs merge=lfs -text
+ *.webm filter=lfs diff=lfs merge=lfs -text
+ *.webp filter=lfs diff=lfs merge=lfs -text
+ *.xz filter=lfs diff=lfs merge=lfs -text
+ *.zip filter=lfs diff=lfs merge=lfs -text
+ *.zst filter=lfs diff=lfs merge=lfs -text
+ *tfevents* filter=lfs diff=lfs merge=lfs -text
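The patterns above route model weights, videos, images and archives through Git LFS, which matters here because the GGUF checkpoints and demo `.mp4` files would otherwise bloat the Space repo. A minimal Python sketch for checking whether a path would be LFS-tracked follows. It is an approximation, assuming slash-free patterns (true for all 34 lines above), which git matches against the basename at any depth; `is_lfs_tracked` is a hypothetical helper, and `git check-attr filter -- <path>` remains the authoritative answer.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# The 34 patterns from the .gitattributes above, copied verbatim.
LFS_PATTERNS = [
    "*.7z", "*.arrow", "*.bin", "*.bz2", "*.ckpt", "*.ftz", "*.gguf", "*.gz",
    "*.h5", "*.jpg", "*.mp4", "*.npy", "*.npz", "*.onnx", "*.parquet", "*.pb",
    "*.pickle", "*.pkl", "*.png", "*.pt", "*.pth", "*.rar", "*.safetensors",
    "*.tar.*", "*.tar", "*.tflite", "*.tgz", "*.wasm", "*.webm", "*.webp",
    "*.xz", "*.zip", "*.zst", "*tfevents*",
]


def is_lfs_tracked(path: str, patterns=LFS_PATTERNS) -> bool:
    """Approximate gitattributes matching for slash-free patterns.

    Git applies a pattern without a slash to the file's basename in any
    directory, and matches case-sensitively by default, so fnmatchcase on
    the basename mimics it for this pattern set (not for patterns with '/').
    """
    name = PurePosixPath(path).name
    return any(fnmatchcase(name, pattern) for pattern in patterns)


if __name__ == "__main__":
    for candidate in ["app.py", "Dockerfile", "assets/videos/demo.mp4"]:
        print(candidate, is_lfs_tracked(candidate))
```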
Dockerfile ADDED
@@ -0,0 +1,17 @@
+ FROM python:3.12-slim
+
+ WORKDIR /app
+
+ RUN apt-get update && apt-get install -y --no-install-recommends \
+     curl libgl1 libglib2.0-0 ffmpeg \
+     && rm -rf /var/lib/apt/lists/*
+
+ RUN pip install --no-cache-dir \
+     "gradio>=6,<7" pillow huggingface_hub
+
+ COPY app.py /app/app.py
+ COPY README.md /app/README.md
+
+ EXPOSE 7860
+
+ CMD ["python", "/app/app.py"]
README.md ADDED
@@ -0,0 +1,82 @@
+ ---
+ title: LTX 2.3 CPU
+ emoji: 🎬
+ colorFrom: indigo
+ colorTo: pink
+ sdk: docker
+ app_port: 7860
+ pinned: false
+ license: other
+ ---
+
+ # LTX 2.3 CPU — Feasibility Reference + ZeroGPU Recipe
+
+ The 22B-parameter LTX-Video 2.3 (Lightricks) is **not practical** on the **free HF CPU** tier: 2 vCPU + 16 GB RAM cannot host the full pipeline at usable speed. This Space is instead the **feasibility analysis and upgrade recipe**, so anyone with GPU access can fork it and enable generation.
+
+ ## TL;DR
+
+ | Tier | Hardware | LTX 2.3 distilled-1.1 viable? (time per 2-sec clip) | Cost |
+ |---|---|---|---|
+ | Free CPU | 2 vCPU + 16 GB | ❌ Q3_K_M weights alone (10.6 + 6.0 GB) exceed 16 GB; ~60-120 min if it completed at all | n/a |
+ | CPU Upgrade | 8 vCPU + 32 GB | ⚠ marginal, ~30-60 min | $0.30/clip |
+ | ZeroGPU | A100 quota slot | ✅ ~25-40 sec | free w/ Pro |
+ | GPU L40S | 48 GB VRAM | ✅ ~8 sec | $1/hr |
+
+ ## Model paths analysed
+
+ - **Path A: Unsloth distilled-1.1 Q3_K_M** (`unsloth/LTX-2.3-GGUF` → `distilled-1.1/ltx-2.3-22b-distilled-1.1-Q3_K_M.gguf`, ~10.6 GB). The cleanest 8-step distilled DiT and the best CPU candidate (smallest weights). Requires the ComfyUI-GGUF loader.
+ - **Path C: 10Eros fine-tune + cond_safe distill LoRA** (`vantagewithai/LTX2.3-10Eros-GGUF` + cond_safe LoRA). 10Eros is a *fine-tune*, NOT a distilled model, and its README warns that *"larger distilled LoRAs will harm the model's fine tune"*. Riskier, since the LoRA needs tuning; not a 1:1 replacement for Path A.
+
+ Recommendation: **Path A** for the CPU build (smallest, distilled). Path C is kept here as a reference for ZeroGPU forks that have headroom to experiment.
+
+ ## Text encoder constraint
+
+ You **cannot swap** the text encoder. LTX 2.3 was trained with `google/gemma-3-12b-it`, so the diffusion transformer (DiT) is bound to that model's embedding space. Other LLMs, whether smaller or newer (e.g. Qwen3.6-35B-A3B, Gemma-4-E2B-it), **will not work** without retraining: their embeddings come from a different distribution.
+
+ The only valid lever is **quantising the same encoder to fewer bits**:
+
+ | Quant | Size | Quality vs FP16 |
+ |---|---|---|
+ | Gemma-3-12B-it Q3_K_M | 6.0 GB | ~98% |
+ | Gemma-3-12B-it Q4_K_M | 7.4 GB | ~99.5% |
+ | Gemma-3-12B-it Q5_K_M | 8.6 GB | ~99.9% |
+
+ Use `mradermacher/gemma-3-12b-it-qat-abliterated-GGUF` Q3_K_M for the CPU path.
+
+ ## ZeroGPU fork recipe
+
+ Fork (duplicate) this Space to your account, change `sdk: docker` → `sdk: gradio` in the README front matter, switch the hardware tier to **ZeroGPU**, and replace `app.py` with the GPU variant `gpu_app.py` (not included in this initial commit).
+
+ ```bash
+ python -c "from huggingface_hub import duplicate_space; duplicate_space('WeReCooking/ltx-2.3-cpu', to_id='YourUsername/ltx-2.3-zerogpu')"
+ # Then edit README.md: sdk: docker -> sdk: gradio (the YAML can only *suggest* hardware, via suggested_hardware)
+ # Set the hardware in the HF UI: Space settings -> Hardware -> ZeroGPU
+ ```
+
+ ## Curl test (once forked to a GPU tier)
+
+ ```bash
+ TOKEN="hf_xxx"
+ SPACE="https://YourUsername-ltx-2-3-zerogpu.hf.space"
+
+ EVT=$(curl -s -X POST "$SPACE/gradio_api/call/generate" \
+   -H "Authorization: Bearer $TOKEN" -H "Content-Type: application/json" \
+   -d '{"data":["A woman walking through a neon-lit Tokyo alley at night, cinematic", 2.0, 8]}' \
+   | python -c "import sys,json;print(json.load(sys.stdin)['event_id'])")
+ curl -sN "$SPACE/gradio_api/call/generate/$EVT" -H "Authorization: Bearer $TOKEN"
+ ```
+
+ ## Logs (SSE)
+
+ ```bash
+ curl -N -H "Authorization: Bearer $TOKEN" "https://huggingface.co/api/spaces/WeReCooking/ltx-2.3-cpu/logs/build"
+ curl -N -H "Authorization: Bearer $TOKEN" "https://huggingface.co/api/spaces/WeReCooking/ltx-2.3-cpu/logs/run"
+ ```
+
+ ## Why not ship inference on free CPU anyway
+
+ I attempted the GGUF path locally. Findings:
+ - 10.6 GB GGUF DiT + 6 GB GGUF Gemma encoder + VAE + activations exceeds 16 GB, even with sequential offload (load → run → unload). The encoder has to stay resident during the DiT's classifier-free guidance branch, or be reloaded every step, which is ~50× slower.
+ - 2 vCPU × 22B params at Q3_K_M ≈ 120 sec per diffusion step, so the 8-step distilled schedule takes ~16 min for the DiT loop alone. Add text encoding, VAE decode and offload swaps, and a 2-sec, 384×256 clip realistically takes 60-90 min, against a 1-hour HF Space request timeout. The numbers don't add up.
+
+ The honest path on free CPU is **not to ship a broken Generate button**; ship the recipe and demos instead.
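The two findings above reduce to arithmetic on the README's own estimates. A minimal Python sketch of that budget follows; every constant is copied from the text (none are new measurements), and VAE and activation memory are deliberately left out because the README gives no figure for them, so the real shortfall is larger than the one computed here.

```python
"""Back-of-envelope check of the free-CPU numbers quoted in the README.

All inputs are the README's own estimates, not measurements made here.
"""

RAM_GB = 16.0          # free CPU tier
DIT_GB = 10.6          # distilled-1.1 Q3_K_M DiT GGUF
ENCODER_GB = 6.0       # Gemma-3-12B-it Q3_K_M GGUF
SEC_PER_STEP = 120.0   # estimated DiT step time on 2 vCPU
STEPS = 8              # distilled schedule
TIMEOUT_MIN = 60.0     # request timeout quoted in the README


def ram_headroom_gb(ram: float = RAM_GB, dit: float = DIT_GB,
                    encoder: float = ENCODER_GB) -> float:
    """RAM left for VAE + activations with both GGUFs resident (negative = no fit)."""
    return ram - (dit + encoder)


def dit_loop_minutes(steps: int = STEPS, sec_per_step: float = SEC_PER_STEP) -> float:
    """Denoising loop alone; excludes text encoding, VAE decode and offload swaps."""
    return steps * sec_per_step / 60.0


if __name__ == "__main__":
    print(f"RAM headroom: {ram_headroom_gb():+.1f} GB")
    print(f"DiT loop: {dit_loop_minutes():.0f} min of a {TIMEOUT_MIN:.0f} min budget")
```

Running it prints `RAM headroom: -0.6 GB` and `DiT loop: 16 min of a 60 min budget`: the two GGUFs alone overshoot 16 GB before any VAE or activation memory, and the remainder of the 60-90 min estimate is encoding, decoding and offload overhead.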
app.py ADDED
@@ -0,0 +1,72 @@
+ """LTX 2.3 CPU Space — feasibility reference + ZeroGPU upgrade recipe.
+
+ This Space documents why LTX 2.3 (22B) on free HF CPU is impractical and
+ shows the upgrade path. Generation is disabled on CPU; the UI mirrors what
+ a ZeroGPU fork would look like so users can clone and switch hardware in
+ one click.
+ """
+
+ from pathlib import Path
+
+ import gradio as gr
+
+ FEASIBILITY_TABLE = """\
+ | Hardware | Per 2-sec clip | Notes |
+ |-----------------------|----------------|-----------------------------------|
+ | Free CPU (this Space) | not feasible | 22B at Q3_K_M does not fit in 16 GB |
+ | CPU Upgrade 32 GB | 30-60 min | marginal, $0.30/clip |
+ | ZeroGPU (Pro) | 25-40 sec | recommended path |
+ | GPU L40S 48 GB | ~8 sec | dedicated |
+ """
+
+ PIPELINE_NOTE = """\
+ - **Path A** (recommended): Unsloth `distilled-1.1` GGUF Q3_K_M, 10.6 GB DiT + Gemma-3-12B-it Q3_K_M 6 GB encoder, via the ComfyUI-GGUF loader.
+ - **Path C** (research): 10Eros fine-tune + cond_safe distill LoRA. 10Eros is a fine-tune, not distilled; larger distill LoRAs harm the fine-tune, so the LoRA needs tuning.
+ - **Text encoder cannot be swapped**: the diffusion transformer (DiT) is bound to `google/gemma-3-12b-it`. Only quantisation, not replacement, is valid.
+ """
+
+
+ def cpu_generate_stub(prompt: str, duration_sec: float, steps: int) -> str:
+     return (
+         "CPU inference is disabled on this free Space — 22B + 16 GB RAM is\n"
+         "infeasible. Fork to ZeroGPU (see README) to enable generation.\n\n"
+         f"Prompt received: {prompt[:100]}\n"
+         f"Duration: {duration_sec:.1f} s\n"
+         f"Steps: {steps}"
+     )
+
+
+ def health() -> str:
+     return "ok — LTX 2.3 CPU Space (documentation mode)"
+
+
+ DEMO_VIDEOS = sorted(str(p) for p in Path("/app/assets/videos").glob("*.mp4"))
+
+ with gr.Blocks(title="LTX 2.3 CPU") as demo:
+     gr.Markdown("**LTX 2.3 CPU** — feasibility reference + ZeroGPU recipe. 22B video diffusion does not run on free CPU; this is a fork-and-upgrade template.")
+     with gr.Row(equal_height=True):
+         with gr.Column(scale=1):
+             prompt_in = gr.Textbox(label="Prompt", placeholder="A woman walking through a neon-lit Tokyo alley at night, cinematic", lines=3)
+             with gr.Row():
+                 duration_in = gr.Slider(1.0, 4.0, value=2.0, step=0.5, label="Duration (s)")
+                 steps_in = gr.Slider(4, 16, value=8, step=1, label="Steps (distilled)")
+             run_btn = gr.Button("Generate (disabled on CPU — fork to ZeroGPU)", variant="primary")
+             status = gr.Textbox(label="Status", lines=5, interactive=False, show_copy_button=True)
+         with gr.Column(scale=1):
+             gr.Markdown("### Feasibility")
+             gr.Markdown(FEASIBILITY_TABLE)
+             gr.Markdown("### Pipeline")
+             gr.Markdown(PIPELINE_NOTE)
+     if DEMO_VIDEOS:
+         gr.Examples(
+             examples=[[v] for v in DEMO_VIDEOS],
+             inputs=[gr.Video(visible=False)],
+             examples_per_page=6,
+             cache_examples=False,
+             label="Reference outputs (pre-generated on GPU)",
+         )
+     run_btn.click(fn=cpu_generate_stub, inputs=[prompt_in, duration_in, steps_in], outputs=[status], api_name="generate")
+     gr.Button(visible=False).click(fn=health, outputs=[gr.Textbox(visible=False)], api_name="health")
+
+ demo.queue(default_concurrency_limit=1)
+ demo.launch(server_name="0.0.0.0", server_port=7860, theme=gr.themes.Soft())
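`run_btn.click(..., api_name="generate")` exposes the stub at the same `/gradio_api/call/generate` route that the README's curl test targets. A stdlib-only Python equivalent of that two-step call (POST the inputs to get an `event_id`, then read the Server-Sent Events stream until a `complete` or `error` event) is sketched below. `call_generate` and `parse_sse` are hypothetical helper names, and the parser covers only the simple SSE subset this exchange needs, not the full specification.

```python
import json
import urllib.request
from typing import Iterable, Iterator, Tuple


def _decode(payload: str) -> object:
    """Gradio sends JSON in `data:` fields; fall back to the raw text otherwise."""
    try:
        return json.loads(payload)
    except json.JSONDecodeError:
        return payload


def parse_sse(lines: Iterable[str]) -> Iterator[Tuple[str, object]]:
    """Yield (event, data) pairs from Server-Sent Events text lines.

    Minimal subset: `event:` and `data:` fields, comment lines starting with
    ':' ignored, and a blank line dispatching each pending event.
    """
    event, data_lines = "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line:  # a blank line ends the current event
            if data_lines:
                yield event, _decode("\n".join(data_lines))
            event, data_lines = "message", []
            continue
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]
        if field == "event":
            event = value
        elif field == "data":
            data_lines.append(value)
    if data_lines:  # stream closed without a trailing blank line
        yield event, _decode("\n".join(data_lines))


def call_generate(space: str, token: str, prompt: str, duration_sec: float = 2.0,
                  steps: int = 8, timeout: float = 3600.0) -> object:
    """POST the inputs, then follow the event stream until `complete` or `error`."""
    auth = {"Authorization": f"Bearer {token}"}
    body = json.dumps({"data": [prompt, duration_sec, steps]}).encode("utf-8")
    post = urllib.request.Request(
        f"{space}/gradio_api/call/generate", data=body,
        headers={**auth, "Content-Type": "application/json"}, method="POST")
    # `timeout` is urllib's per-operation socket timeout, not a total deadline.
    with urllib.request.urlopen(post, timeout=timeout) as resp:
        event_id = json.load(resp)["event_id"]
    get = urllib.request.Request(f"{space}/gradio_api/call/generate/{event_id}", headers=auth)
    with urllib.request.urlopen(get, timeout=timeout) as resp:
        for event, data in parse_sse(raw.decode("utf-8") for raw in resp):
            if event == "complete":
                return data
            if event == "error":
                raise RuntimeError(f"generate failed: {data!r}")
    raise RuntimeError("event stream ended without a 'complete' event")


# Example (needs network access and a token; URL as in the README curl test):
# call_generate("https://YourUsername-ltx-2-3-zerogpu.hf.space", "hf_xxx", "A neon-lit alley")
```

Against this CPU Space, the `complete` payload would be a one-element list holding the stub's status string. The `gradio_client` package's `Client.predict(..., api_name="/generate")` is a higher-level alternative to hand-rolling the protocol.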