| | --- |
| | license: apache-2.0 |
| | datasets: |
| | - InternSVG/SAgoge |
| | base_model: |
| | - OpenGVLab/InternVL3-8B |
| | --- |
| | <div align="center"> |
| | <h1> InternSVG: Towards Unified SVG Tasks with Multimodal Large Language Models </h1> |
| |
|
| | <div align="center"> |
| | <a href='https://arxiv.org/abs/2510.11341'><img src='https://img.shields.io/badge/arXiv-2510.11341-b31b1b?logo=arXiv'></a> |
| | <a href='https://hmwang2002.github.io/release/internsvg/'><img src='https://img.shields.io/badge/Project-Page-Green'></a> |
| | <a href="https://huggingface.co/datasets/InternSVG/SArena"><img src="https://img.shields.io/badge/%F0%9F%A4%97%20Benchmark%20-HF-orange"></a> |
| | <a href="https://huggingface.co/datasets/InternSVG/SAgoge"><img src="https://img.shields.io/badge/%F0%9F%A4%97%20Dataset%20-HF-orange"></a> |
| | <a href="https://huggingface.co/InternSVG/InternSVG-8B"><img src="https://img.shields.io/badge/%F0%9F%A4%97%20Model%20-HF-orange"></a> |
| | </div> |
| | </div> |
| |
|
| | ## **🤖 InternSVG Model** |
| |
|
| | The **InternSVG-8B** model is available at [Hugging Face](https://huggingface.co/InternSVG/InternSVG-8B). It is based on the InternVL3-8B model, incorporating SVG-specific tokens, and undergoes Supervised Fine-Tuning (SFT) under a two-stage training strategy using the massive SVG training samples from the SAgoge dataset. |
| |
|
| | ### Deploy |
| |
|
| | We recommend using [LMDeploy](https://github.com/InternLM/lmdeploy) for deployment. An example of launching a proxy server with 8 parallel workers (one per GPU) is provided below: |
| |
|
| | ```bash |
| | #!/bin/bash |
| | model_path="MODEL_PATH" |
| | model_name="InternSVG" |
| | |
| | # proxy |
| | lmdeploy serve proxy --server-name 0.0.0.0 --server-port 10010 --routing-strategy "min_expected_latency" & |
| | |
| | worker_num=8 |
| | for ((i = 0; i < worker_num; i++)); do |
| | timestamp=$(date +"%Y-%m-%d_%H-%M-%S") |
| | CUDA_VISIBLE_DEVICES="${i}" lmdeploy serve api_server ${model_path} --proxy-url http://0.0.0.0:10010 \ |
| | --model-name ${model_name} \ |
| | --tp 1 \ |
| | --max-batch-size 512 \ |
| | --backend pytorch \ |
| | --server-port $((10000 + i)) \ |
| | --session-len 16384 \ |
| | --chat-template "internvl2_5" \ |
| | --log-level WARNING &>> ./logs/api_${model_name}_${timestamp}_${i}.out & |
| | sleep 10s |
| | done |
| | ``` |
| |
|
| | ### Train |
| |
|
| | If you need to train your own model, please follow these steps: |
| |
|
| | 1. **Prepare the Dataset:** Download the **SAgoge** dataset. After that, update the paths for the SAgoge-related subdatasets in `LLaMA-Factory/data/dataset_info.json` to match your local file paths. |
| | 2. **Download InternVL3-8B:** Download the InternVL3-8B from [link](https://huggingface.co/OpenGVLab/InternVL3-8B-hf). |
| | 3. **Add Special Tokens:** Before training, you must add SVG-specific tokens to the base model. Run the `utils/add_token.py` script, which adds these special tokens to the original model weights and initializes their embeddings based on subwords. |
| | 4. **Start Training:** We provide example configuration scripts for the two-stage training process. You can find them at: |
| | - **Stage 1:** `LLaMA-Factory/examples/train_full/stage_1.yaml` |
| | - **Stage 2:** `LLaMA-Factory/examples/train_full/stage_2.yaml` |
| |
|
| | Then use `llamafactory-cli train` to start training. |
| | |
| | ## 📖 Citation |
| |
|
| | ```BibTex |
| | @article{wang2025internsvg, |
| | title={InternSVG: Towards Unified SVG Tasks with Multimodal Large Language Models}, |
| | author={Wang, Haomin and Yin, Jinhui and Wei, Qi and Zeng, Wenguang and Gu, Lixin and Ye, Shenglong and Gao, Zhangwei and Wang, Yaohui and Zhang, Yanting and Li, Yuanqi and others}, |
| | journal={arXiv preprint arXiv:2510.11341}, |
| | year={2025} |
| | } |
| | ``` |