# Step-3.5-Flash-Distill-MolmoWeb-8B-TQ-Mixed
This model is a "God-tier" reasoning visual agent built specifically to speed-run complex web account-management tasks. By distilling the 196B Step-3.5-Flash "Brain" into the 8B MolmoWeb "Eyes" and applying TurboQuant extreme compression, we created a specialist that fits in ~3 GB of VRAM while retaining frontier-level reasoning.
## Model Details

### Model Description
- Developed by: @macmacmacmac
- Model type: Vision-Language Model (VLM) / Agentic Specialist
- Language(s) (NLP): English (optimized for account registration, logins, MFA)
- License: Apache-2.0
- Finetuned from model: allenai/MolmoWeb-8B
- Teacher Model: stepfun-ai/Step-3.5-Flash (196B Sparse MoE)
### Model Sources
- Repository: https://huggingface.co/macmacmacmac/Step-3.5-Flash-Distill-MolmoWeb-8B-TQ-Mixed
- Trajectory Data: allenai/MolmoWeb-HumanTrajs (Identity Subset)
- Quantization Tech: TurboQuant (March 2026 Release)
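TurboQuant's internals are not documented in this card, so as a rough illustration only: the "TQ-Mixed" suffix and the 3.5-bit figure mentioned below suggest a mixed-precision scheme. The sketch below shows a generic round-trip of symmetric k-bit quantization with different bit widths per layer group (the layer names and the 4-bit/3-bit split are hypothetical, not TurboQuant's actual algorithm):

```python
import numpy as np

def quantize_symmetric(w, bits):
    """Round-trip symmetric k-bit quantization: returns the dequantized tensor."""
    qmax = 2 ** (bits - 1) - 1                # e.g. 7 for 4-bit, 3 for 3-bit
    scale = np.abs(w).max() / qmax
    scale = scale if scale > 0 else 1.0       # guard against all-zero tensors
    q = np.clip(np.round(w / scale), -qmax, qmax)
    return q * scale

# "Mixed" scheme sketch: keep attention projections at 4-bit and MLP weights
# at 3-bit, so the average storage cost lands around 3.5 bits per weight.
layers = {"attn.q_proj": np.random.randn(8, 8), "mlp.up_proj": np.random.randn(8, 8)}
deq = {name: quantize_symmetric(w, 4 if "attn" in name else 3) for name, w in layers.items()}
```

The per-tensor scale bounds the reconstruction error at half a quantization step, which is why aggressive bit widths show up downstream as small numerical drift rather than outright failures.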
## Uses

### Direct Use
Specifically intended for Identity Steering:
- Creating new accounts (Sign-up flows).
- Managing existing credentials (Login/MFA handling).
- Password recovery and security settings adjustment.
- Navigating high-anxiety UI (Cookie walls, pop-ups, system prompts).
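The card does not specify the model's action output format. Assuming a textual action grammar like `click(x, y)` / `type("...")` (a common convention for GUI agents, hypothetical here), a host process driving these flows could extract actions from the raw output like so:

```python
import re

# Hypothetical action grammar: click(x, y) | type("text") | press("key")
ACTION_RE = re.compile(r'(?P<op>click|type|press)\((?P<args>[^)]*)\)')

def parse_action(output: str):
    """Extract the first action call from the model's raw text output."""
    m = ACTION_RE.search(output)
    if m is None:
        return None
    op, args = m.group("op"), m.group("args")
    if op == "click":
        x, y = (int(v) for v in args.split(","))
        return ("click", x, y)
    return (op, args.strip().strip('"'))

parse_action('I will click the sign-up button. click(412, 220)')
# -> ("click", 412, 220)
```

Whatever grammar the model actually emits, keeping the parser strict (reject rather than guess on malformed actions) is the safer default for credential-handling flows.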
### Out-of-Scope Use
- General purpose chatbot tasks (Poetry, coding, general trivia).
- High-stakes financial transfers without human-in-the-loop.
- Medical diagnosis or legal advice.
- Non-web based automation (OS-level file management).
## Bias, Risks, and Limitations
- Coordinate Drift: Extreme TurboQuant compression (3.5-bit) can occasionally cause 2-5px drift in click accuracy on ultra-high-density displays.
- Hallucination: While reasoning is aligned with Step-3.5, the model may occasionally misinterpret legacy HTML that deviates significantly from standard human trajectories.
- Privacy: While the model runs locally, the content of the screen is processed. Users must ensure the environment is secure.
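One practical mitigation for the few-pixel coordinate drift above is to snap predicted clicks to the center of the nearest known element bounding box (boxes taken from the DOM or accessibility tree). The helper below is an illustrative host-side sketch, not part of the model:

```python
def snap_click(x, y, boxes, max_drift=8):
    """Snap (x, y) to the center of the nearest bounding box within max_drift px.

    boxes: list of (left, top, right, bottom) tuples. Returns (x, y) unchanged
    if no box center lies within max_drift (Chebyshev distance).
    """
    best, best_d = (x, y), max_drift + 1
    for left, top, right, bottom in boxes:
        cx, cy = (left + right) // 2, (top + bottom) // 2
        d = max(abs(cx - x), abs(cy - y))
        if d < best_d:
            best, best_d = (cx, cy), d
    return best

snap_click(414, 222, [(400, 210, 424, 230)])   # -> (412, 220)
```

Because the observed drift is only 2-5 px, a small `max_drift` budget recovers accuracy without letting the agent "teleport" to unrelated elements.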
### Recommendations
Users should deploy this model with the Web UI Overlay (link soon) so that the agent's internal reasoning (`<|thought|>` spans) stays transparent to the user, reducing anxiety during automated actions.
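The overlay is not yet released. As a sketch of how a host UI could surface the reasoning, assuming the model wraps its reasoning in `<|thought|>…<|/thought|>` markers (only the opening marker appears in this card; the closing form is an assumption):

```python
import re

# Assumed delimiters: <|thought|> ... <|/thought|> (closing marker is a guess).
THOUGHT_RE = re.compile(r"<\|thought\|>(.*?)<\|/thought\|>", re.DOTALL)

def split_thoughts(output: str):
    """Separate reasoning spans (for the overlay) from the remaining action text."""
    thoughts = THOUGHT_RE.findall(output)
    actions = THOUGHT_RE.sub("", output).strip()
    return thoughts, actions

split_thoughts('<|thought|>The email field is focused.<|/thought|>type("a@b.com")')
# -> (["The email field is focused."], 'type("a@b.com")')
```

Displaying the thought spans while executing only the action text keeps the "show your work" property the overlay is meant to provide.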
## How to Get Started with the Model

```python
import turboquant as tq  # TurboQuant runtime for the 3.5-bit mixed weights
from transformers import AutoModelForImageTextToText

# Optimized for ~3GB VRAM deployment
model = AutoModelForImageTextToText.from_pretrained(
    "macmacmacmac/Step-3.5-Flash-Distill-MolmoWeb-8B-TQ-Mixed",
    device_map="auto",
    trust_remote_code=True,
)
```