Use from the Transformers library
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="MeiGen-AI/GenEvolve")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
pipe(text=messages)
# Load model directly
from transformers import AutoProcessor, AutoModelForImageTextToText

processor = AutoProcessor.from_pretrained("MeiGen-AI/GenEvolve")
model = AutoModelForImageTextToText.from_pretrained("MeiGen-AI/GenEvolve")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
inputs = processor.apply_chat_template(
	messages,
	add_generation_prompt=True,
	tokenize=True,
	return_dict=True,
	return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
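# Decode only the newly generated tokens, dropping special tokens such as end-of-turn markers.
print(processor.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))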
GenEvolve

Self-Evolving Image Generation Agents via Tool-Orchestrated Visual Experience Distillation


This repository hosts the GenEvolve agent policy — a Qwen3-VL-8B-Instruct backbone fine-tuned and self-evolved into a tool-orchestrated image-generation agent. Given a user request, the agent issues web/image searches, retrieves visual references, activates internal generation knowledge, and emits an executable prompt-reference program z = (gen_prompt, reference_images) that drives any reference-conditioned downstream generator (Qwen-Image-Edit, Nano Banana Pro, ...).

[Figure: GenEvolve teaser]

The same trained agent policy paired with two reference-conditioned generators: Qwen-Image-Edit (open) and Nano Banana Pro (strong).


✨ Highlights

  • Tool-orchestrated trajectories. The agent calls search, image_search, and query_knowledge (which gives access to 8 callable generation skills) before producing a final program z = (gen_prompt, reference_images).
  • Self-evolution with Visual Experience Distillation. Best-vs-worst trajectory pairs are distilled at the token level into the deployed student policy, so no runtime memory is needed at inference.
  • Generator-transferable. The same trained policy works with both an open-source generator (Qwen-Image-Edit-2511) and a strong proprietary generator (Nano Banana Pro).
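The self-evolution step starts from contrasting trajectories. As a minimal sketch, assuming each sampled rollout for a request carries a single scalar score of its rendered image, a best-vs-worst pair could be picked like this. The `Trajectory` fields, the scalar score, and the tie rule are illustrative assumptions, and the token-level distillation itself is not shown.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Trajectory:
    """One sampled agent rollout for a request (hypothetical structure)."""
    request: str
    text: str     # the agent's full generated trajectory
    score: float  # scalar quality score of the rendered image (assumed)


def best_vs_worst_pair(rollouts: list[Trajectory]) -> tuple[Trajectory, Trajectory] | None:
    """Pick the highest- and lowest-scoring rollouts; None if there is no contrast."""
    if len(rollouts) < 2:
        return None
    best = max(rollouts, key=lambda t: t.score)
    worst = min(rollouts, key=lambda t: t.score)
    if best.score == worst.score:
        return None  # all rollouts tie, so the pair carries no preference signal
    return best, worst


rollouts = [
    Trajectory("paris cover", "<answer>...A</answer>", 0.41),
    Trajectory("paris cover", "<answer>...B</answer>", 0.73),
    Trajectory("paris cover", "<answer>...C</answer>", 0.58),
]
best, worst = best_vs_worst_pair(rollouts)
print(best.score, worst.score)  # prints 0.73 0.41
```

Returning `None` on ties is one reasonable design choice: a pair whose members scored identically gives the student nothing to prefer.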

📊 Headline Results

GenEvolve-Bench (KScore, held-out split)

| Method | Generator | KScore | Knowledge-Anch. | Quality-Anch. |
|---|---|---|---|---|
| Qwen-Image (raw) | Qwen-Image | 0.2987 | 0.2384 | 0.3768 |
| Nano Banana Pro (raw) | Nano Banana Pro | 0.5298 | 0.5160 | 0.5477 |
| Gen-Searcher 8B | Qwen-Image-Edit-2511 | 0.3493 | 0.3293 | 0.3745 |
| Gen-Searcher 8B | Nano Banana Pro | 0.5481 | 0.5472 | 0.5492 |
| GenEvolve (Ours) | Qwen-Image-Edit-2511 | 0.3663 | 0.3410 | 0.3990 |
| GenEvolve (Ours) | Nano Banana Pro | 0.5739 | 0.5669 | 0.5830 |
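For a sense of scale, the KScore gains over Gen-Searcher 8B with the same generator can be worked out from the table. The snippet only reuses the reported numbers; it is arithmetic on published results, not a re-evaluation.

```python
# KScore values copied from the GenEvolve-Bench table above.
KSCORE = {
    ("Gen-Searcher 8B", "Qwen-Image-Edit-2511"): 0.3493,
    ("Gen-Searcher 8B", "Nano Banana Pro"): 0.5481,
    ("GenEvolve", "Qwen-Image-Edit-2511"): 0.3663,
    ("GenEvolve", "Nano Banana Pro"): 0.5739,
}


def kscore_gain(generator: str) -> tuple[float, float]:
    """Absolute and relative KScore gain of GenEvolve over Gen-Searcher 8B."""
    base = KSCORE[("Gen-Searcher 8B", generator)]
    ours = KSCORE[("GenEvolve", generator)]
    return round(ours - base, 4), round((ours - base) / base, 4)


for gen in ("Qwen-Image-Edit-2511", "Nano Banana Pro"):
    absolute, relative = kscore_gain(gen)
    print(f"{gen}: +{absolute:.4f} KScore ({relative:.1%} relative)")
```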

WISE Benchmark (WiScore, six knowledge categories)

| Model | Cultural | Time | Space | Biology | Physics | Chemistry | Overall |
|---|---|---|---|---|---|---|---|
| GPT-4o | 0.81 | 0.71 | 0.89 | 0.83 | 0.79 | 0.74 | 0.80 |
| Gen-Searcher-8B + Qwen-Image | 0.80 | 0.71 | 0.82 | 0.76 | 0.74 | 0.75 | 0.77 |
| Mind-Brush | 0.83 | 0.69 | 0.84 | 0.71 | 0.85 | 0.68 | 0.78 |
| GenEvolve + Qwen-Image-Edit | 0.84 | 0.74 | 0.87 | 0.83 | 0.81 | 0.83 | 0.82 |
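The Overall column is not a plain mean of the six categories (a plain mean gives 0.76 for Gen-Searcher-8B, not 0.77). It is consistent with a prompt-count-weighted average. A sketch, assuming the WISE benchmark's category sizes of 400 Cultural, 167 Time, 133 Space, and 100 each for Biology, Physics, and Chemistry (1,000 prompts total); with these weights every Overall value in the table is reproduced after rounding to two decimals.

```python
# Per-category prompt counts assumed from the WISE benchmark (1,000 prompts total).
WISE_COUNTS = {"Cultural": 400, "Time": 167, "Space": 133,
               "Biology": 100, "Physics": 100, "Chemistry": 100}


def wise_overall(scores: dict[str, float]) -> float:
    """Prompt-count-weighted average of per-category WiScores."""
    total = sum(WISE_COUNTS.values())
    return sum(WISE_COUNTS[c] * scores[c] for c in WISE_COUNTS) / total


# Per-category WiScores for GenEvolve + Qwen-Image-Edit, copied from the table above.
genevolve = {"Cultural": 0.84, "Time": 0.74, "Space": 0.87,
             "Biology": 0.83, "Physics": 0.81, "Chemistry": 0.83}
print(round(wise_overall(genevolve), 2))  # prints 0.82
```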

🧠 Method Overview

[Figure: GenEvolve method overview]

For a user request, the agent samples a multi-turn trajectory of tool calls before emitting the final prompt-reference program. The downstream generator then renders the image.
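The multi-turn trajectory can be sketched as a simple loop. This is a minimal illustration, not the GenEvolve runtime: the <think>, <tool_call>, and <answer> tags are the ones this card names, but the JSON shape of a tool call ({"name": ..., "arguments": ...}), the <tool_response> wrapper for observations, and the names `run_agent` and `generate` are assumptions. A scripted stand-in replaces the served policy so the loop runs offline.

```python
from __future__ import annotations

import json
import re
from typing import Callable

TOOL_CALL_RE = re.compile(r"<tool_call>(.*?)</tool_call>", re.DOTALL)
ANSWER_RE = re.compile(r"<answer>(.*?)</answer>", re.DOTALL)


def run_agent(user_prompt: str,
              generate: Callable[[list[dict]], str],
              tools: dict[str, Callable[..., str]],
              max_turns: int = 8) -> dict:
    """Alternate policy turns and tool calls until the policy emits <answer>."""
    messages: list[dict] = [{"role": "user", "content": user_prompt}]
    for _ in range(max_turns):
        reply = generate(messages)
        messages.append({"role": "assistant", "content": reply})
        answer = ANSWER_RE.search(reply)
        if answer:
            # Final prompt-reference program z = (gen_prompt, reference_images).
            return json.loads(answer.group(1))
        call = TOOL_CALL_RE.search(reply)
        if call is None:
            raise ValueError("reply contained neither <tool_call> nor <answer>")
        payload = json.loads(call.group(1))  # assumed {"name": ..., "arguments": {...}}
        result = tools[payload["name"]](**payload.get("arguments", {}))
        # Return the observation to the policy as a new turn (assumed format).
        messages.append({"role": "user",
                         "content": f"<tool_response>{result}</tool_response>"})
    raise RuntimeError(f"no <answer> after {max_turns} turns")


# Scripted stand-in for the served policy, for illustration only.
scripted = iter([
    '<think>Need the landmark.</think>'
    '<tool_call>{"name": "image_search", "arguments": {"query": "Eiffel Tower"}}</tool_call>',
    '<answer>{"gen_prompt": "A magazine cover featuring the tower from the first reference image.", '
    '"reference_images": [{"img_id": "IMG_001", "note": "tower shape"}]}</answer>',
])
program = run_agent(
    "A 1990s travel-magazine cover in front of the Eiffel Tower",
    generate=lambda messages: next(scripted),
    tools={"image_search": lambda query: f"IMG_001: result for {query!r}"},
)
print(program["reference_images"][0]["img_id"])  # prints IMG_001
```

In the real system, `generate` would call the served policy and the tool table would hold the search, image_search, and query_knowledge wrappers.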


🖼️ Visual Demos

[Figure: Qualitative comparison]

Qualitative comparison on representative cases. Orange marks external/uncommon knowledge requirements; blue marks internal generation-knowledge requirements.

🎨 Gallery — paired with Nano Banana Pro

[Figure: GenEvolve + Nano Banana Pro gallery]

The same agent policy with Nano Banana Pro as the downstream renderer. Examples cover spatial layout, text rendering, quantity counting, attribute binding, anatomy/pose, creative transfer, material physics, and aesthetic drawing.

🎨 Gallery — paired with Qwen-Image-Edit (open)

[Figure: GenEvolve + Qwen-Image-Edit gallery]

The same trained policy paired with the open-source Qwen-Image-Edit-2511 renderer. Consistent quality across both generators reflects generator-transferable orchestration.


🚀 Quick Start

The deployed checkpoint is the student policy: it consumes a user prompt and, through a <think>/<tool_call>/<answer> loop, returns a JSON program containing gen_prompt and reference_images. The end-to-end runtime (vLLM serving, the agent loop, the tools, and the Qwen/Nano renderers) lives in the GitHub repo; the steps below mirror its installation and usage instructions.

1. Install the main GenEvolve runtime

git clone https://github.com/MeiGen-AI/GenEvolve.git
cd GenEvolve

conda create -n genevolve python=3.11 -y && conda activate genevolve
pip install -U pip setuptools wheel packaging psutil ninja
pip install torch==2.8.0 torchvision==0.23.0 --index-url https://download.pytorch.org/whl/cu128
pip install --no-build-isolation -r requirements.txt
pip install -e .

Qwen-Image-Edit rendering runs as a separate FastAPI service (kept out of the vLLM environment to avoid CUDA/diffusers conflicts). Set up that service from the GitHub README when you want to use --backend qwen-image-edit-service.

2. Serve the agent policy

# Single GPU / single replica.
MODEL_PATH=MeiGen-AI/GenEvolve PORT=8000 TP=1 DP=1 bash scripts/serve_vllm.sh

# Higher throughput on one 8-GPU node (8 replicas, 1 GPU each).
MODEL_PATH=MeiGen-AI/GenEvolve PORT=8000 TP=1 DP=8 bash scripts/serve_vllm.sh

TP (tensor parallelism) shards one model replica across multiple GPUs; DP (data parallelism) launches multiple independent replicas. Total GPU usage is TP × DP.
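The TP × DP budget can be checked with a few lines of arithmetic. The contiguous GPU-to-replica layout below is an illustrative assumption; it is not necessarily how scripts/serve_vllm.sh or vLLM actually assigns devices, and `plan_replicas` is a hypothetical helper.

```python
def plan_replicas(tp: int, dp: int, gpus_available: int) -> list[list[int]]:
    """Illustrative GPU layout: DP replicas, each sharded over TP contiguous GPUs."""
    needed = tp * dp  # total GPU usage is TP x DP
    if needed > gpus_available:
        raise ValueError(f"TP={tp} x DP={dp} needs {needed} GPUs; only {gpus_available} available")
    return [list(range(r * tp, (r + 1) * tp)) for r in range(dp)]


print(plan_replicas(tp=1, dp=8, gpus_available=8))  # 8 replicas, 1 GPU each
print(plan_replicas(tp=2, dp=4, gpus_available=8))  # prints [[0, 1], [2, 3], [4, 5], [6, 7]]
```

Raising TP lets a larger model fit per replica; raising DP adds throughput by serving more requests in parallel.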

3. End-to-end example

export SERPER_API_KEY=<your_key>      # required for search / image_search
export GOOGLE_API_KEY=<your_key>      # or GEMINI_API_KEY; only for --backend nano-banana-pro

# Nano Banana Pro renderer
python examples/quickstart.py \
    --backend nano-banana-pro \
    --base-url http://localhost:8000/v1 \
    --model GenEvolve \
    --prompt "A 1990s travel-magazine cover of two backpackers in front of the Eiffel Tower at golden hour, the title \"PARIS\" in bold serif." \
    --output paris.png

# Qwen-Image-Edit renderer (point at your Qwen-Image-Edit FastAPI service)
python examples/quickstart.py \
    --backend qwen-image-edit-service \
    --service-url http://your-qwen-service:8001 \
    --base-url http://localhost:8000/v1 \
    --model GenEvolve \
    --output paris_qwen.png
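To inspect the served policy without the full runtime, one can call the OpenAI-compatible chat endpoint that vLLM exposes under the --base-url used above. A minimal standard-library sketch, assuming the server from step 2 is registered under the model name GenEvolve (the name the quickstart passes via --model). It sends a bare user message without the system prompt or tool definitions that examples/quickstart.py presumably supplies, so expect only the policy's first turn back, not a rendered image. `build_chat_request` and `first_turn` are hypothetical helper names.

```python
import json
import urllib.request


def build_chat_request(prompt: str, base_url: str = "http://localhost:8000/v1",
                       model: str = "GenEvolve", max_tokens: int = 1024) -> urllib.request.Request:
    """Build a POST to the OpenAI-compatible /chat/completions route."""
    body = {
        "model": model,  # assumed served model name
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        url=f"{base_url}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def first_turn(prompt: str, **kwargs) -> str:
    """Send one request and return the assistant text (needs the running server)."""
    with urllib.request.urlopen(build_chat_request(prompt, **kwargs)) as resp:
        data = json.load(resp)
    return data["choices"][0]["message"]["content"]
```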

The agent's final <answer> is a JSON object:

{
  "gen_prompt": "...natural-language prompt that refers to images by 'the first reference image', ...",
  "reference_images": [
    {"img_id": "IMG_001", "note": "what to copy from this image"}
  ]
}

gen_prompt MUST refer to selected images using ordinal phrases ("the first reference image") — never raw IMG_### ids or URLs. Pass (gen_prompt, [r["local_path"] for r in reference_images]) to your favourite reference-conditioned generator (Qwen-Image-Edit, Nano Banana Pro, ...) to obtain the final image.
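A sketch of consuming that answer before rendering. The schema above lists only img_id and note per reference, so this example looks file paths up in a hypothetical `id_to_path` mapping, standing in for wherever the runtime records each retrieved image's local_path. It also enforces the rule that gen_prompt must not contain raw IMG_### ids or URLs. `parse_program` is an illustrative name, not part of the GenEvolve codebase.

```python
from __future__ import annotations

import json
import re

# Raw image ids or URLs are not allowed inside gen_prompt.
RAW_REF_RE = re.compile(r"IMG_\d+|https?://")


def parse_program(answer_json: str, id_to_path: dict[str, str]) -> tuple[str, list[str]]:
    """Validate the agent's <answer> JSON and return (gen_prompt, reference paths)."""
    program = json.loads(answer_json)
    gen_prompt = program["gen_prompt"]
    if RAW_REF_RE.search(gen_prompt):
        raise ValueError("gen_prompt must use ordinal phrases, not IMG_### ids or URLs")
    # id_to_path is a hypothetical lookup from img_id to a downloaded file.
    paths = [id_to_path[r["img_id"]] for r in program["reference_images"]]
    return gen_prompt, paths


answer = ('{"gen_prompt": "A travel poster that copies the color palette of the first reference image.", '
          '"reference_images": [{"img_id": "IMG_001", "note": "color palette"}]}')
gen_prompt, ref_paths = parse_program(answer, {"IMG_001": "/tmp/refs/img_001.jpg"})
print(ref_paths)  # prints ['/tmp/refs/img_001.jpg']
```

The order of `ref_paths` follows `reference_images`, which is what makes ordinal phrases such as "the first reference image" resolvable by the generator.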


🗂️ Related Artifacts

| Artifact | Link |
|---|---|
| Project page | https://ephemeral182.github.io/GenEvolve/ |
| Paper | https://arxiv.org/abs/2605.21605 |
| Code | https://github.com/MeiGen-AI/GenEvolve |
| Training data + benchmark | MeiGen-AI/GenEvolve-Data-Bench |
| Base model | Qwen/Qwen3-VL-8B-Instruct |

⚖️ Intended Use, Limits, Bias

  • Intended use. Research on tool-using image-generation agents, agentic prompt-program synthesis, and self-distillation from generated outcomes.
  • Search dependency. The agent issues live web/image queries through user-provided tool wrappers. Quality of grounded facts depends on the search backend you plug in.
  • Bias. Tool outputs and reference images come from public web search, which carries demographic, cultural, and geographic biases that may be reflected in agent outputs.

📑 Citation

@misc{chen2026genevolveselfevolvingimagegeneration,
      title={GenEvolve: Self-Evolving Image Generation Agents via Tool-Orchestrated Visual Experience Distillation}, 
      author={Sixiang Chen and Zhaohu Xing and Tian Ye and Xinyu Geng and Yunlong Lin and Jianyu Lai and Xuanhua He and Fuxiang Zhai and Jialin Gao and Lei Zhu},
      year={2026},
      eprint={2605.21605},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2605.21605}, 
}
Model size: 9B params · Tensor type: BF16 (Safetensors)