VGGT LaCT (stage 1) — slim adapter weights

These files contain only the LaCT-block weights (~200 MB), not a full VGGT checkpoint. They plug into the public facebook/VGGT-1B backbone: the DINOv2 patch embedding, frame-wise attention, and prediction heads keep Meta’s pretrained VGGT-1B weights. Only the global-attention layers are replaced, by LaCT-style fast-weight GLU blocks trained in stage 1 by distillation against the frozen teacher.

Code: github.com/Akrao9/vggt_ttt (install vggt from facebookresearch/vggt as in that README).

Files

| File | Description |
| --- | --- |
| vggt_ttt_lact_stage1.pt | Stage 1 distilled LaCT state dict (lact_state_dict() format). Every key starts with the prefix `aggregator.lact_blocks.` (trailing dot included). |
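
As a quick sanity check before loading, the key prefix can be verified on the raw dict. The sketch below is illustrative only: `unexpected_keys` is a hypothetical helper, not part of the vggt_ttt repo, and the key names after the prefix in the toy dict are invented. In practice `state` would be the dict loaded from the .pt file.

```python
LACT_PREFIX = "aggregator.lact_blocks."

def unexpected_keys(state, prefix=LACT_PREFIX):
    """Return, sorted, the keys that do not start with the expected LaCT prefix."""
    return sorted(k for k in state if not k.startswith(prefix))

# Toy stand-in for the loaded checkpoint; names after the prefix are made up.
toy_state = {
    "aggregator.lact_blocks.0.w0": None,
    "aggregator.lact_blocks.23.c_proj.weight": None,
    "camera_head.weight": None,  # a non-LaCT tensor that should be flagged
}
print(unexpected_keys(toy_state))
```

An empty list means the file contains only LaCT-block tensors, which is what a slim adapter checkpoint should look like.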

Load (Python)

import torch
from huggingface_hub import hf_hub_download

# From the vggt_ttt repo (with `vggt` installed per upstream README):
from model.vggt_ttt import VGGT_TTT
from model.io_utils import torch_load_checkpoint

# Download the LaCT-only weights from the Hub (cached after the first call).
ckpt_path = hf_hub_download("akrao9/VGGT-LACT", "vggt_ttt_lact_stage1.pt")
# Prefer a GPU when one is visible to PyTorch.
device = "cuda" if torch.cuda.is_available() else "cpu"

# Build the model on the public VGGT-1B backbone, then load the LaCT blocks on top.
model = VGGT_TTT.from_pretrained("facebook/VGGT-1B", chunk_size=16).to(device).eval()
state = torch_load_checkpoint(ckpt_path, map_location=device)
model.load_lact_state_dict(state, strict=True)

Use a local path instead of hf_hub_download if you already downloaded the .pt file.
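
One way to combine both options is a small resolver that prefers a local file and falls back to the Hub. This is a sketch under assumptions: `resolve_checkpoint` is a hypothetical helper (it does not exist in the vggt_ttt repo), and only `hf_hub_download(repo_id, filename)` is taken from the snippet above.

```python
from pathlib import Path

def resolve_checkpoint(local_path,
                       repo_id="akrao9/VGGT-LACT",
                       filename="vggt_ttt_lact_stage1.pt"):
    """Return a checkpoint path, downloading from the Hub only if needed."""
    path = Path(local_path)
    if path.is_file():
        # Already downloaded: use the local copy as-is.
        return str(path)
    # Imported lazily so the local-file case works without huggingface_hub.
    from huggingface_hub import hf_hub_download
    return hf_hub_download(repo_id, filename)
```

The returned path can then be passed to torch_load_checkpoint in place of ckpt_path.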

Inference CLI

From the vggt_ttt repo, after downloading this checkpoint locally:

python scripts/run_inference.py \
  --input path/to/video.mp4 --fps 2 \
  --checkpoint ./vggt_ttt_lact_stage1.pt \
  --out ./out

(--checkpoint accepts this LaCT-only dict; see scripts/run_inference.py.)

Training summary

  • Stage 1: distillation from frozen facebook/VGGT-1B (pose / depth / world points). Trainable parameters are confined to the 24 LaCT blocks, and c_proj is zero-initialized for a near-identity start.
  • Checkpoints: saved with torch.save(model.lact_state_dict(), path), which gives the same tensor layout as this Hub file.
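
To illustrate why a zero-initialized c_proj gives a near-identity start, here is a minimal pure-Python sketch. It assumes the block adds its output to a residual stream and uses a static SiLU-gated GLU; the actual LaCT blocks use fast weights and are not reproduced here, and every name except c_proj (`residual_glu`, `w_gate`, `w_up`) is hypothetical.

```python
import math
import random

def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def silu(v):
    return v / (1.0 + math.exp(-v))

def residual_glu(x, w_gate, w_up, c_proj):
    """Compute x + c_proj @ (silu(w_gate @ x) * (w_up @ x))."""
    gate = [silu(v) for v in matvec(w_gate, x)]
    up = matvec(w_up, x)
    hidden = [g * u for g, u in zip(gate, up)]
    branch = matvec(c_proj, hidden)  # all zeros when c_proj is zero
    return [xi + bi for xi, bi in zip(x, branch)]

random.seed(0)
dim, hidden_dim = 4, 8
x = [random.gauss(0, 1) for _ in range(dim)]
w_gate = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(hidden_dim)]
w_up = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(hidden_dim)]
c_proj = [[0.0] * hidden_dim for _ in range(dim)]  # zero-init output projection

out = residual_glu(x, w_gate, w_up, c_proj)
print(out == x)  # the block passes its input through unchanged at init
```

With the output projection at zero, the randomly initialized gate and up weights cannot perturb the residual stream, so training starts from a pass-through block rather than from noise.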

Hardware / scaling

The LaCT path targets longer frame sequences, where its VRAM use scales more favorably than full global attention. See the GitHub README for benchmark tables (DL3DV-style eval).

License and attribution

  • This adapter repository and the training code release are under Apache 2.0 (see project LICENSE / NOTICE on GitHub).
  • VGGT-1B is subject to Meta’s license and terms on its model card; you must comply with those when using the backbone.
  • Method builds on VGGT and LaCT-style components as described in the upstream README.

Citation

If you use these weights or the vggt_ttt codebase, cite the original VGGT paper/repo and credit this adapter as appropriate for your venue.
