Step-3.5-Flash-Distill-MolmoWeb-8B-TQ-Mixed

This model is a reasoning visual agent purpose-built to speed through complex web account-management tasks. By distilling the 196B Step-3.5-Flash "Brain" into the 8B MolmoWeb "Eyes" and applying TurboQuant extreme compression, we have created a specialist that fits in ~3 GB of VRAM while maintaining frontier-level reasoning.

Model Details

Model Description

  • Developed by: @macmacmacmac
  • Model type: Vision-Language Model (VLM) / Agentic Specialist
  • Language(s) (NLP): English (optimized for account registration, login, and MFA flows)
  • License: Apache-2.0
  • Finetuned from model: allenai/MolmoWeb-8B
  • Teacher Model: stepfun-ai/Step-3.5-Flash (196B Sparse MoE)

Model Sources

Uses

Direct Use

Specifically intended for Identity Steering:

  • Creating new accounts (Sign-up flows).
  • Managing existing credentials (Login/MFA handling).
  • Password recovery and security settings adjustment.
  • Navigating high-anxiety UI (Cookie walls, pop-ups, system prompts).

Out-of-Scope Use

  • General purpose chatbot tasks (Poetry, coding, general trivia).
  • High-stakes financial transfers without human-in-the-loop.
  • Medical diagnosis or legal advice.
  • Non-web based automation (OS-level file management).

Bias, Risks, and Limitations

  • Coordinate Drift: Extreme TurboQuant compression (3.5-bit) can occasionally cause 2-5px drift in click accuracy on ultra-high-density displays.
  • Hallucination: While reasoning is aligned with Step-3.5, the model may occasionally misinterpret legacy HTML that deviates significantly from standard human trajectories.
  • Privacy: While the model runs locally, the content of the screen is processed. Users must ensure the environment is secure.
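The 2-5 px coordinate drift noted above can be mitigated client-side by snapping the model's predicted click point to the center of the nearest element bounding box before dispatching the click. A minimal sketch, assuming the harness can supply element boxes (e.g. from the accessibility tree); all names here are illustrative, not part of this model's API:

```python
def snap_click(x: float, y: float,
               boxes: list[tuple[float, float, float, float]],
               tolerance: float = 8.0) -> tuple[float, float]:
    """Snap a predicted click to the center of the nearest bounding box
    (left, top, right, bottom) if the point lies within `tolerance` px
    of that box; otherwise return the raw prediction unchanged."""
    best = None
    best_dist = tolerance
    for left, top, right, bottom in boxes:
        # Distance from the point to the box edge (0 if inside the box).
        dx = max(left - x, 0.0, x - right)
        dy = max(top - y, 0.0, y - bottom)
        dist = (dx * dx + dy * dy) ** 0.5
        if dist <= best_dist:
            best_dist = dist
            best = ((left + right) / 2.0, (top + bottom) / 2.0)
    return best if best is not None else (x, y)
```

The default tolerance of 8 px comfortably covers the 2-5 px drift while leaving genuinely off-target predictions untouched.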

Recommendations

Users should deploy this model with the Web UI Overlay (link soon) to ensure the agent's internal reasoning (<|thought|>) is transparent to the user, reducing anxiety during automated actions.
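Until the overlay ships, the reasoning stream can be surfaced directly from the model's text output. A minimal sketch: the `<|thought|>` delimiter is taken from this card, but the matching `<|/thought|>` closing token is an assumption for illustration:

```python
import re

# <|thought|> is named on this card; the <|/thought|> closing token
# is an assumed convention for this sketch.
THOUGHT_RE = re.compile(r"<\|thought\|>(.*?)<\|/thought\|>", re.DOTALL)

def split_thoughts(output: str) -> tuple[list[str], str]:
    """Separate internal reasoning spans from the actionable output,
    so a UI can display thoughts to the user before acting."""
    thoughts = [m.strip() for m in THOUGHT_RE.findall(output)]
    actions = THOUGHT_RE.sub("", output).strip()
    return thoughts, actions
```

An overlay can render the returned thoughts in a side panel and only execute the remaining action string once the user has seen the reasoning.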

How to Get Started with the Model

import turboquant  # noqa: F401 -- side-effect import so the TurboQuant weights can be loaded
from transformers import AutoModelForImageTextToText, AutoProcessor

MODEL_ID = "YourOrg/Step-3.5-Flash-Distill-MolmoWeb-8B-TQ-Mixed"

# Optimized for 3GB VRAM deployment
model = AutoModelForImageTextToText.from_pretrained(
    MODEL_ID,
    device_map="auto",
    trust_remote_code=True,
)
processor = AutoProcessor.from_pretrained(MODEL_ID, trust_remote_code=True)