Request access: use of this model requires a commercial agreement with BRIA AI. Academic access will be granted upon request; please fill in the access request form and indicate your academic affiliation, and the BRIA team will follow up to grant access.
BRIA Video Background Removal v3.0 (VRMBG-3.0)
VRMBG-3.0 improves both temporal consistency and per-frame accuracy over VRMBG-2.0 while maintaining a lightweight design that enables real-time video background removal. The model offers a strong trade-off between efficiency and state-of-the-art performance in both matte quality and temporal stability, and was trained on a proprietary video dataset spanning a diverse range of settings, subjects, and scene conditions.
For still-image background removal, see RMBG-2.0.
Model Details
- Developed by: BRIA AI
- Model type: Video background removal / alpha matting
- Parameters: ~220M
- Inference resolution: 1024 × 1024
- Input: Current RGB video frame, paired with the previous frame's RGB multiplied by the previous frame's predicted alpha matte (see the shape sketch after this list)
- Output: Single-channel alpha matte for the current frame, in the range [0, 1]
- Latency: Real-time inference
- License: BRIA VRMBG-3.0 License — non-commercial use only. Commercial use requires a commercial agreement with BRIA AI.
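Concretely, the two RGB inputs are concatenated along the channel axis into a single 6-channel tensor at the inference resolution, as in the minimal example below. A shape sketch (tensor names here are illustrative, not part of the model API):

```python
import torch

B, S = 1, 1024                                      # batch size, inference resolution
current_rgb = torch.zeros(B, 3, S, S)               # current frame, normalised RGB
prev_fg = torch.zeros(B, 3, S, S)                   # previous RGB * previous alpha
paired = torch.cat([current_rgb, prev_fg], dim=1)   # channel-wise concatenation
assert paired.shape == (B, 6, S, S)                 # 6-channel model input
```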
How it works
VRMBG-3.0 is autoregressive along the time axis. At each step the model consumes the current RGB frame together with the previous frame's RGB masked by the previous frame's predicted alpha, and emits the alpha matte for the current frame:
α_t = VRMBG3(RGB_t, RGB_{t-1} · α_{t-1})
For the first frame of a clip (no temporal prior), zero tensors are passed for both the previous-frame RGB and the previous-frame alpha. Conditioning on the previous frame's masked foreground provides a strong temporal prior that stabilises matte boundaries across frames and substantially reduces flicker compared with per-frame inference.
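Written out with the zero-initialised state, the base case and recurrence are:

α_1 = VRMBG3(RGB_1, 0)
α_t = VRMBG3(RGB_t, RGB_{t-1} · α_{t-1}), for t > 1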
Inference
Minimal example
```python
import torch
import numpy as np
import cv2
from torchvision import transforms
from huggingface_hub import hf_hub_download
from safetensors.torch import load_file

from model import build_vrmbg3

# 1. Load the model.
model = build_vrmbg3()
weights = hf_hub_download(repo_id="briaai/VRMBG-3.0", filename="model.safetensors")
model.load_state_dict(load_file(weights))
model = model.eval().half().cuda()

# 2. Pre-processing.
INFER_SIZE = 1024
normalize = transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
to_tensor = transforms.ToTensor()
device = torch.device("cuda")
dtype = next(model.parameters()).dtype  # likely torch.float16

# 3. Initialise temporal state with zeros for the first frame.
prev_rgb_t = torch.zeros(3, INFER_SIZE, INFER_SIZE, device=device, dtype=dtype)
prev_alpha = torch.zeros(1, INFER_SIZE, INFER_SIZE, device=device, dtype=dtype)

cap = cv2.VideoCapture("input.mp4")
mattes = []

while True:
    ok, bgr = cap.read()
    if not ok:
        break

    rgb = cv2.cvtColor(bgr, cv2.COLOR_BGR2RGB)
    h, w = rgb.shape[:2]
    rgb_resized = cv2.resize(rgb, (INFER_SIZE, INFER_SIZE), interpolation=cv2.INTER_LINEAR)
    current_t = normalize(to_tensor(rgb_resized)).to(device=device, dtype=dtype)

    # Build the paired input: [current RGB, previous RGB * previous alpha].
    paired = torch.cat([current_t, prev_rgb_t * prev_alpha], dim=0).unsqueeze(0)
    paired = paired.contiguous(memory_format=torch.channels_last)

    with torch.no_grad():
        pred = model(paired)[-1].sigmoid().squeeze(0)  # (1, H, W) in [0, 1]

    # Resize the matte back to native resolution.
    alpha_native = cv2.resize(
        pred[0].float().cpu().numpy(), (w, h), interpolation=cv2.INTER_LINEAR
    )
    mattes.append(alpha_native)

    # Update temporal state for the next frame.
    prev_rgb_t = current_t
    prev_alpha = pred

cap.release()
```
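Downstream, the collected mattes can be used for compositing. A minimal sketch, assuming you also kept each frame's full-resolution `rgb` alongside its matte (the `composite` helper is illustrative, not part of the model API):

```python
import numpy as np

def composite(rgb, alpha, bg=(0, 255, 0)):
    """Alpha-blend a full-resolution RGB frame over a solid background colour."""
    a = alpha[..., None].astype(np.float32)     # (H, W, 1), values in [0, 1]
    bg_col = np.asarray(bg, dtype=np.float32)   # broadcasts against (H, W, 3)
    out = rgb.astype(np.float32) * a + bg_col * (1.0 - a)
    return out.astype(np.uint8)
```

For example, `composite(rgb, mattes[-1])` places the last processed frame on a green background.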
Intended Use
- Real-time video background removal for production content (people, objects, products) where temporal stability matters.
- Autoregressive inference along the time axis: the model consumes the current frame together with the previous frame's predicted alpha at each step.
- For still-image background removal, use RMBG-2.0.
Files
| File | Description |
|---|---|
| `model.py` | Model architecture — import this to instantiate the network |
| `model.safetensors` | Trained weights in safetensors format, 885 MB |
| `pytorch_model.bin` | Same weights as a PyTorch `state_dict` |
| `README.md` | This model card |
License
Released under the BRIA VRMBG-3.0 License; the model is not currently open source. Commercial use is subject to a commercial agreement with BRIA AI. Please contact the BRIA team to request access or to arrange an agreement.