# RoomAudit LoRA Adapters
QLoRA adapters for hotel room cleanliness detection, fine-tuned from Qwen3-VL-4B-Instruct. Part of the roomaudit project.
Three adapters are included here, each from a different training approach. All were trained on the same synthetic dataset: 218 clean hotel room images with defects painted in using SAM3 + FLUX.1 Fill inpainting.
## Adapters
### lora_adapter (primary adapter, use this one)
Single-turn format. Takes a room image, returns a JSON verdict with clean/messy classification and a defect list.
| Metric | Score |
|---|---|
| Accuracy | 0.714 |
| Precision | 0.676 |
| Recall | 0.906 |
| F1 | 0.774 |
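For reference, the F1 in these tables is the harmonic mean of precision and recall, and the reported scores are self-consistent:

```python
def f1_score(precision: float, recall: float) -> float:
    """F1 is the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Primary adapter: precision 0.676, recall 0.906
print(round(f1_score(0.676, 0.906), 3))  # 0.774
```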
### lora_adapter_agent (agentic, two-turn adapter)
Two-turn format: round 1 selects 1-2 regions to inspect; round 2 gives the final verdict after seeing the crops. It scores below the single-turn adapter on the current synthetic dataset, and is included as a reference for the agentic training approach.
| Metric | Score |
|---|---|
| Accuracy | 0.663 |
| Precision | 0.622 |
| Recall | 0.902 |
| F1 | 0.736 |
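The round-1 to round-2 handoff amounts to turning the model's region selections into pixel crops that are fed back in as images. The sketch below is purely illustrative; the `regions` list with normalized `box` coordinates is a hypothetical schema, not the adapter's actual output format (see the adapter's README for the real one):

```python
import json

def crop_boxes(round1_reply: str, width: int, height: int):
    """Convert round-1 region selections (hypothetical schema:
    normalized [x0, y0, x1, y1] boxes) into pixel crop boxes
    suitable for Image.crop() in round 2."""
    regions = json.loads(round1_reply)["regions"]
    boxes = []
    for r in regions[:2]:  # the adapter inspects at most 2 regions
        x0, y0, x1, y1 = r["box"]
        boxes.append((int(x0 * width), int(y0 * height),
                      int(x1 * width), int(y1 * height)))
    return boxes

reply = '{"regions": [{"box": [0.1, 0.2, 0.6, 0.8], "reason": "stain near bed"}]}'
print(crop_boxes(reply, 1024, 768))  # [(102, 153, 614, 614)]
```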
### lora_adapter_vit (ViT + LLM adapter)
Same single-turn format as the primary adapter, but with LoRA applied to the vision encoder as well as the language layers. Worse than LLM-only training: the ViT adapters learn to detect FLUX inpainting artefacts rather than actual room defects. Included as a reference.
| Metric | Score |
|---|---|
| Accuracy | 0.587 |
| Precision | 0.568 |
| Recall | 0.991 |
| F1 | 0.722 |
## Quickstart
```python
from huggingface_hub import snapshot_download
from unsloth import FastVisionModel
from peft import PeftModel
from PIL import Image
from qwen_vl_utils import process_vision_info
import json, re

# Download the primary adapter from the Hub
snapshot_download(
    "RanenSim/RoomAudit-Lora",
    allow_patterns="lora_adapter/*",
    local_dir="outputs/",
)

# Load the 4-bit base model and attach the LoRA adapter
model, tokenizer = FastVisionModel.from_pretrained(
    "unsloth/Qwen3-VL-4B-Instruct-unsloth-bnb-4bit",
    load_in_4bit=True,
)
model = PeftModel.from_pretrained(model, "outputs/lora_adapter")
FastVisionModel.for_inference(model)

# Build the single-turn inspection prompt
image = Image.open("room.jpg").convert("RGB")
messages = [
    {"role": "system", "content": [{"type": "text", "text": "You are a hotel room cleanliness inspector. Respond ONLY with valid JSON."}]},
    {"role": "user", "content": [
        {"type": "image", "image": image},
        {"type": "text", "text": '{"clean": true/false, "defects": [{"object": "...", "type": "...", "description": "..."}]}'},
    ]},
]

# Generate and parse the JSON verdict
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
image_inputs, _ = process_vision_info(messages)
inputs = tokenizer(text=[text], images=image_inputs, padding=True, return_tensors="pt").to("cuda")
out_ids = model.generate(**inputs, max_new_tokens=256, do_sample=True, temperature=0.1)
output = tokenizer.decode(out_ids[0][inputs.input_ids.shape[1]:], skip_special_tokens=True)
result = json.loads(re.search(r"\{.*\}", output, re.DOTALL).group())
```
See each adapter's README for full usage instructions, training config, and results.
Source code, training notebooks, and data generation pipeline: [github.com/Razorbird360/roomaudit](https://github.com/Razorbird360/roomaudit)