Request access: use of this model requires a commercial agreement with BRIA AI. Academic access will be granted upon request; please fill in the access request form and indicate your academic affiliation, and the BRIA team will follow up to grant access.
BRIA Video Background Removal v3.0 (VRMBG-3.0)
VRMBG-3.0 improves both temporal consistency and per-frame accuracy over VRMBG-2.0 while maintaining a lightweight design that enables real-time video background removal. The model offers a strong trade-off between efficiency and state-of-the-art performance in both matte quality and temporal stability, and was trained on a proprietary video dataset spanning a diverse range of settings, subjects, and scene conditions.
For still-image background removal, see RMBG-2.0.
Model Details
- Developed by: BRIA AI
- Model type: Video background removal / alpha matting
- Parameters: ~220M
- Inference resolution: 1024 × 1024
- Input: Current RGB video frame, paired with the previous frame's RGB multiplied by the previous frame's predicted alpha matte (see the shape sketch after this list)
- Output: Single-channel alpha matte for the current frame, in the range [0, 1]
- Latency: Real-time inference
- License: BRIA VRMBG-3.0 License — non-commercial use only. Commercial use requires a commercial agreement with BRIA AI.
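Concretely, the two RGB inputs are concatenated along the channel axis into a single 6-channel tensor at the inference resolution, as in the minimal example below. A shape sketch (tensor names here are illustrative, not part of the model API):

```python
import torch

B, S = 1, 1024                                      # batch size, inference resolution
current_rgb = torch.zeros(B, 3, S, S)               # current frame, normalised RGB
prev_fg = torch.zeros(B, 3, S, S)                   # previous RGB * previous alpha
paired = torch.cat([current_rgb, prev_fg], dim=1)   # channel-wise concatenation
assert paired.shape == (B, 6, S, S)                 # 6-channel model input
```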
How it works
VRMBG-3.0 is autoregressive along the time axis. At each step the model consumes the current RGB frame together with the previous frame's RGB masked by the previous frame's predicted alpha, and emits the alpha matte for the current frame:
α_t = VRMBG3(RGB_t, RGB_{t-1} · α_{t-1})
For the first frame of a clip (no temporal prior), zero tensors are passed for both the previous-frame RGB and the previous-frame alpha. Conditioning on the previous frame's masked foreground provides a strong temporal prior that stabilises matte boundaries across frames and substantially reduces flicker compared with per-frame inference.
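Written out with the zero-initialised state, the base case and recurrence are:

α_1 = VRMBG3(RGB_1, 0)
α_t = VRMBG3(RGB_t, RGB_{t-1} · α_{t-1}), for t > 1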
Inference
Minimal example
```python
import torch
import numpy as np
import cv2
from torchvision import transforms
from huggingface_hub import hf_hub_download
from safetensors.torch import load_file

from model import build_vrmbg3

# 1. Load the model.
model = build_vrmbg3()
weights = hf_hub_download(repo_id="briaai/VRMBG-3.0", filename="model.safetensors")
model.load_state_dict(load_file(weights))
model = model.eval().half().cuda()

# 2. Pre-processing.
INFER_SIZE = 1024
normalize = transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
to_tensor = transforms.ToTensor()
device = torch.device("cuda")
dtype = next(model.parameters()).dtype  # likely torch.float16

# 3. Initialise temporal state with zeros for the first frame.
prev_rgb_t = torch.zeros(3, INFER_SIZE, INFER_SIZE, device=device, dtype=dtype)
prev_alpha = torch.zeros(1, INFER_SIZE, INFER_SIZE, device=device, dtype=dtype)

cap = cv2.VideoCapture("input.mp4")
mattes = []

while True:
    ok, bgr = cap.read()
    if not ok:
        break

    rgb = cv2.cvtColor(bgr, cv2.COLOR_BGR2RGB)
    h, w = rgb.shape[:2]
    rgb_resized = cv2.resize(rgb, (INFER_SIZE, INFER_SIZE), interpolation=cv2.INTER_LINEAR)
    current_t = normalize(to_tensor(rgb_resized)).to(device=device, dtype=dtype)

    # Build the paired input: [current RGB, previous RGB * previous alpha].
    paired = torch.cat([current_t, prev_rgb_t * prev_alpha], dim=0).unsqueeze(0)
    paired = paired.contiguous(memory_format=torch.channels_last)

    with torch.no_grad():
        pred = model(paired)[-1].sigmoid().squeeze(0)  # (1, H, W) in [0, 1]

    # Resize the matte back to native resolution.
    alpha_native = cv2.resize(
        pred[0].float().cpu().numpy(), (w, h), interpolation=cv2.INTER_LINEAR
    )
    mattes.append(alpha_native)

    # Update temporal state for the next frame.
    prev_rgb_t = current_t
    prev_alpha = pred

cap.release()
```
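Downstream, the collected mattes can be used for compositing. A minimal sketch, assuming you also kept each frame's full-resolution `rgb` alongside its matte (the `composite` helper is illustrative, not part of the model API):

```python
import numpy as np

def composite(rgb, alpha, bg=(0, 255, 0)):
    """Alpha-blend a full-resolution RGB frame over a solid background colour."""
    a = alpha[..., None].astype(np.float32)     # (H, W, 1), values in [0, 1]
    bg_col = np.asarray(bg, dtype=np.float32)   # broadcasts against (H, W, 3)
    out = rgb.astype(np.float32) * a + bg_col * (1.0 - a)
    return out.astype(np.uint8)
```

For example, `composite(rgb, mattes[-1])` places the last processed frame on a green background.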
Intended Use
- Real-time video background removal for production content (people, objects, products) where temporal stability matters.
- Autoregressive inference along the time axis: the model consumes the current frame together with the previous frame's predicted alpha at each step.
- For still-image background removal, use RMBG-2.0.
Files
| File | Description |
|---|---|
| `model.py` | Model architecture — import this to instantiate the network |
| `model.safetensors` | Trained weights in safetensors format, 885 MB |
| `pytorch_model.bin` | Same weights as a PyTorch `state_dict` |
| `README.md` | This model card |
License
Released under the BRIA VRMBG-3.0 License; the model is not currently open source. Commercial use is subject to a commercial agreement with BRIA AI. Please contact the BRIA team to request access or to arrange an agreement.