Track4D-360 camera-control checkpoint backup (2026-04-23, extended 2026-04-28)
Tagged copies of every benchmarked Track4D-360 camera-control checkpoint.
The 1.3B family (4 files, 2.2 to 2.4 GB each) was the original
2026-04-23 backup for the CLIP-F/V benchmark
(doc/track4d-360/2026-04-23-clipfv-4models-288x512-benchmark.md); the 14B
savefix step-2000 file was added 2026-04-28 once the multi-node
FSDP-savefix run (doc/track4d-360/bugs/2026-04-25-multinode-training-desync-fixes.md)
produced its first verified-correct checkpoint.
Files are renamed to carry the variant tag, so each file's provenance is legible without walking back through the training dirs.
Upload destination: all files in this directory get pushed to
yslan/track4d_360
on HuggingFace, preserving the exact filename. Filename = HF file path.
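The "Filename = HF file path" rule can be sketched as a dry run that prints one upload command per checkpoint. This is only an illustration: it assumes the `huggingface-cli` tool is installed and logged in, the helper name `print_upload_cmds` is hypothetical, and paths containing spaces would need extra quoting before the printed lines are executed.

```shell
# Print one upload command per checkpoint in a directory (dry run, nothing is sent).
# $1: local backup directory, $2: HF repo id (for example yslan/track4d_360)
print_upload_cmds() {
  for path in "$1"/*.safetensors; do
    [ -e "$path" ] || continue          # the glob matched nothing
    name=$(basename "$path")
    # path_in_repo equals the local filename, so the HF path mirrors the backup name
    printf 'huggingface-cli upload %s %s %s\n' "$2" "$path" "$name"
  done
}
```

Calling `print_upload_cmds . yslan/track4d_360` from the backup directory lists the commands for review before anything is pushed.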
| backup file | size | source | training @ |
|---|---|---|---|
| warped_step-13000.safetensors | 2.2 GB | warped_appearance_concat_proj_mixed_real_synth_144x256x49_1p3b_2gpu/train/Wan2.1-T2V-1.3B_track4d360_warped_appearance_concat_proj_mixed_real_synth/step-13000.safetensors | 144x256 (Lyra-2 latent-fuse new architecture) |
| static13k_step-13500.safetensors | 2.4 GB | hybrid_dense_plucker_mixed_real_synth_concat_project_trainplucker_attnffn_cond_dropout_144x256x49_1p3b_2gpu/train/.../step-13500.safetensors | 144x256 (old-arch, static-only, no syn4d) |
| dynamic5k_step-5000.safetensors | 2.4 GB | hybrid_dense_plucker_mixed_real_synth_syn4d_concat_project_trainplucker_attnffn_cond_dropout_144x256x49_1p3b_2gpu/train/.../step-5000.safetensors | 144x256 (old-arch, + syn4d dynamic) |
| ismb288_3k_step-3000.safetensors | 2.4 GB | ismb_hybrid_dense_plucker_mixed_real_synth_syn4d_recam_syncam_cond_dropout_288x512x49_1p3b_16gpu/train/.../step-3000.safetensors | 288x512 (native) (old-arch, + syn4d + RecamMaster + SynCamMaster, Isambard-trained, synced 2026-04-23) |
| 14b_savefix_step-2000.safetensors | 22.9 GB | ismb_hybrid_dense_plucker_mixed_real_synth_syn4d_recam_syncam_savefix_cond_dropout_144x256x49_14b_16gpu_fsdp/train/Wan2.1-Fun-14B_track4d360_..._savefix_cond_dropout/step-2000.safetensors | 144x256 14B (Wan-Fun 14B FSDP, post-savefix bug fix, 4-node Isambard, copied 2026-04-28). Train launcher: bash_scripts/track4d_360/ismb/14b/sbatch/ismb_sbatch_14b_4node_144x256_fsdp_noise_commtuned_savefix_resume8800.sh |
Size delta within the 1.3B family: warped is 2.2 GB vs 2.4 GB for the
others. This is the architectural difference: warped has
dense_in_channels=2 (geometry-only dense) vs 5 (RGB+geom) for the
old-arch models, and a different track_injection_mode with different
trainable subgraphs.
The 14B file's 22.9 GB is bf16 weights for the full 14B Wan-Fun DiT plus
the trainable Track4D-360 adapter modules (track_adapter, track_block_injector,
dense_target_control_encoder, plucker control_adapter); see the
[Eval] DiT checkpoint load line printed by the eval script for the
exact key inventory.
Backup method: plain file copy (rsync for the 1.3B family on 2026-04-23; a single
cp for the 14B file on 2026-04-28). Verified by byte-size match against sources.
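The byte-size check mentioned above can be reproduced with a small helper along these lines. It is a sketch, not the command that was actually run: the function name `same_size` is hypothetical. A size match only shows the copy is complete in length; a checksum such as `sha256sum` on both sides would be a stronger check if corruption is a concern.

```shell
# Return 0 when two files have the same byte size, 1 when they differ,
# and 2 when either file cannot be read.
same_size() {
  src_bytes=$(wc -c < "$1") || return 2
  dst_bytes=$(wc -c < "$2") || return 2
  [ "$src_bytes" -eq "$dst_bytes" ]
}
```

Usage would look like `same_size <source step-N.safetensors> <backup file> && echo OK`, with the source path taken from the table above.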
Reproducibility bundles (dataset subsets, for sharing)
Some benchmarks need a tiny slice of the full dataset roots. Those bundles live alongside the checkpoints so the whole "checkpoints + data + scripts" tree can be tarred together for a collaborator.
| bundle dir | size | bench it reproduces |
|---|---|---|
| bench_ismb288_multiframe_repro/ | ~52 GB | benchmark_ismb288_3k_multiframe_vs_zbuffer_288x512.sh (5 datasets × 5 scenes × 10 trajectories @ 288×512). See bench_ismb288_multiframe_repro/README.md for layout, run instructions, and what's deliberately NOT included (Wan base + VXF source). Generated by prepare_repro_data_ismb288_multiframe.sh. |
To create an archive for upload, bundle data and scripts only: the checkpoint is
already on HF as yslan/track4d_360/ismb288_3k_step-3000.safetensors, so there is
no need to re-bundle it. The bundle is mostly PNG/EXR/safetensors, content that is
already compressed or compresses poorly, so gzip is slow and gains almost nothing.
Recommended: plain .tar.
cd /scratch/shared/beegfs/yushi/logs/track4d-360/backup
# Recommended: plain tar, fast (just streams bytes; PNG/EXR don't compress):
tar -cf bench_ismb288_multiframe_repro.tar bench_ismb288_multiframe_repro
# Alternative if you prefer the .tar.gz format (parallel gzip):
# tar -c bench_ismb288_multiframe_repro | pigz -p $(nproc) > bench_ismb288_multiframe_repro.tar.gz
# Alternative: tar + zstd (best size/speed for HF if both sides have zstd):
# tar --use-compress-program='zstd -T0 -3' -cf bench_ismb288_multiframe_repro.tar.zst bench_ismb288_multiframe_repro
# Avoid: tar -czf ... (single-threaded gzip on 52 GB, ~hours, near-zero gain).
# After verifying the archive is good (and ideally after uploading to HF),
# the unpacked tree is redundant; drop it to reclaim 52 GB:
rm -rf bench_ismb288_multiframe_repro
# Re-creating later is cheap (rsync -a, ~52 GB read from beegfs sources):
bash /scratch/shared/beegfs/yushi/Repo/geo4d_360/VideoX-Fun/bash_scripts/track4d_360/plucker/benchmark/prepare_repro_data_ismb288_multiframe.sh
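One way to do the "verify the archive is good" step before the `rm -rf` above is to compare the number of entries in the tar with the number of entries on disk. This is a minimal sketch under the assumption that the archive was built from the tree's parent directory as shown; the helper name `archive_matches_tree` is hypothetical, and an entry-count match does not prove the file contents are intact.

```shell
# Return 0 when the tar lists as many entries as the tree has files + directories.
# $1: archive path, $2: unpacked tree it was built from
archive_matches_tree() {
  archive=$1
  tree=$2
  [ -f "$archive" ] && [ -d "$tree" ] || return 2   # refuse to compare missing inputs
  in_tar=$(tar -tf "$archive" | wc -l)
  on_disk=$(find "$tree" | wc -l)
  [ "$in_tar" -eq "$on_disk" ]
}
```

For example, `archive_matches_tree bench_ismb288_multiframe_repro.tar bench_ismb288_multiframe_repro && rm -rf bench_ismb288_multiframe_repro` only deletes the tree when the counts agree.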
How to eval
All four 1.3B checkpoints evaluate through the same master benchmark script
(under VideoX-Fun/ repo root). It takes care of per-variant arg dispatch
so you don't have to think about the flags listed below.
Master benchmark (all 4 models, 5 datasets × 5 scenes × 10 trajectories @ 288x512x49)
cd /scratch/shared/beegfs/yushi/Repo/geo4d_360/VideoX-Fun
bash bash_scripts/track4d_360/plucker/benchmark_clipfv_4models_288x512_2gpu.sh
Options:
# one variant only:
MODEL=warped bash bash_scripts/track4d_360/plucker/benchmark_clipfv_4models_288x512_2gpu.sh
MODEL=static13k bash ...
MODEL=dynamic5k bash ...
MODEL=ismb288_3k bash ...
# custom GPU pair (defaults GPU0=0 GPU1=1):
GPU0=4 GPU1=5 bash ...
# point at backup ckpts instead of the live train dirs (example override):
CKPT_WARPED=/scratch/shared/beegfs/yushi/logs/track4d-360/backup/warped_step-13000.safetensors \
bash ...
The script writes to
/scratch/shared/beegfs/yushi/logs/track4d-360/benchmark/clipfv_4models_288x512_2gpu/,
with summary.md aggregated by
python -m track4d_360.tools.aggregate_clip_benchmark.
Already-completed trajectories auto-skip on re-run (pred_rgb.mp4 existence
check in each novel-traj script).
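Because completion is keyed on `pred_rgb.mp4`, progress can be estimated by counting those files under the output directory. The sketch below assumes each finished trajectory leaves exactly one `pred_rgb.mp4` somewhere beneath the benchmark root (the exact directory layout is not stated here), and the helper name `count_done` is hypothetical. With 5 datasets × 5 scenes × 10 trajectories, a full run would give 250 files per model and 1000 across all four.

```shell
# Count finished trajectories under an output root by counting pred_rgb.mp4 files.
count_done() {
  find "$1" -type f -name pred_rgb.mp4 | wc -l | tr -d ' '
}
```

For instance, `count_done /scratch/shared/beegfs/yushi/logs/track4d-360/benchmark/clipfv_4models_288x512_2gpu` prints the number of trajectories that a re-run would skip.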
Single-dataset / ad-hoc invocations
If you just want to run one checkpoint against one dataset, the benchmark script
dispatches to these three eval entrypoints (all under examples/wan2.1_fun/):
| dataset | script |
|---|---|
| mvs_synth, dl3dv, re10k | eval_track4d360_hybrid_dense_static_scene_novel_traj.py |
| kubric | eval_track4d360_hybrid_dense_kubric_novel_traj.py |
| syn4d | eval_track4d360_hybrid_dense_syn4d_novel_traj.py |
All three use track4d_360.shared_args as of 2026-04-23, so they accept the
full warped + plucker + dense + track flag set. Exact per-variant flags are
listed in the recipe below. Do not forget them: build_eval_pipeline reads
warped_condition_mode via getattr(..., "off"), so an omitted flag on a
warped checkpoint silently runs the model in the wrong architecture.
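Since an omitted flag falls back to a default without any error, a wrapper script can guard against it by checking the argument list before launching the eval. The helper below is a hypothetical sketch (the name `require_flag` is not part of the repo); it accepts both `--flag value` and `--flag=value` spellings.

```shell
# Fail unless the named flag appears in the remaining arguments.
# $1: flag to look for; the rest: the argument list about to be passed to an eval script
require_flag() {
  wanted=$1
  shift
  for arg in "$@"; do
    case "$arg" in
      "$wanted"|"$wanted"=*) return 0 ;;
    esac
  done
  echo "missing required flag: $wanted" >&2
  return 1
}
```

A wrapper could call `require_flag --warped_condition_mode "$@" || exit 1` before invoking any of the three entrypoints.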
Per-variant eval-flag recipe
Must match training: silent mismatches are the #1 source of wrong results.
See doc/track4d-360/2026-04-23-clipfv-4models-288x512-benchmark.md §2 bug log.
warped_step-13000.safetensors:
--track_injection_mode single
--warped_condition_mode latent_fuse
--warped_appearance_fusion concat_proj
--warped_geom_only_dense
--dense_in_channels 2
static13k_step-13500.safetensors
dynamic5k_step-5000.safetensors
ismb288_3k_step-3000.safetensors:
--track_injection_mode per_block
--track_injection_block_mode concat_project
--warped_condition_mode off
--dense_in_channels 5
Shared across all 4:
--use_plucker_camera_control
--enable_v2v_plucker_camera_control
--use_query_frame_impulse_condition
--use_dense_branch
--dense_proj_dim 32
--dense_num_residual_blocks 2
--dense_alpha_track 1.0
--track_config config/track4d_360/default_conv3d_patchify_srcdepth.yaml
--num_inference_steps 50
--cfg_scale 1.0
--sigma_shift 5.0
--seed 42
And the base-DiT init path is always weights/wan21-1p3b/diffusion_pytorch_model.safetensors
via --vxf_init_checkpoint (CLAUDE.md load-order Invariant A/B: VXF init must
run BEFORE LoRA wrap and is required on both scratch and resume paths).
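For ad-hoc runs, the variant-specific part of the recipe above can be kept in one place as a small lookup. This is a sketch only: the function name `variant_flags` is hypothetical, the tags mirror the `MODEL=` values used by the master benchmark, and the 14B checkpoint is left out because its eval flags are not listed here.

```shell
# Print the variant-specific eval flags for one checkpoint tag, one token per line.
variant_flags() {
  case "$1" in
    warped)
      printf '%s\n' \
        --track_injection_mode single \
        --warped_condition_mode latent_fuse \
        --warped_appearance_fusion concat_proj \
        --warped_geom_only_dense \
        --dense_in_channels 2
      ;;
    static13k|dynamic5k|ismb288_3k)
      printf '%s\n' \
        --track_injection_mode per_block \
        --track_injection_block_mode concat_project \
        --warped_condition_mode off \
        --dense_in_channels 5
      ;;
    *)
      echo "unknown variant: $1" >&2
      return 1
      ;;
  esac
}
```

In bash, `mapfile -t FLAGS < <(variant_flags warped)` collects the tokens into an array that can be appended to an eval command together with the shared flags.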