---
library_name: transformers
base_model:
- Qwen/Qwen3.6-27B
---

This tiny model is intended for debugging. It is randomly initialized from a configuration adapted from [Qwen/Qwen3.6-27B](https://huggingface.co/Qwen/Qwen3.6-27B).

| File path | Size |
|------|------|
| model.safetensors | 8.8MB |
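
As a rough sanity check on the file size, the sketch below estimates how many bytes the two vocabulary-sized tensors take. The shapes come from the printed model further down (`Embedding(248320, 8)` and the matching `lm_head`), and the 2 bytes per parameter assume bfloat16 storage with untied embeddings, as set in the creation script. The helper name `embedding_bytes` is made up for this illustration.

```python
# Rough size estimate for embed_tokens + lm_head.
# Shapes are taken from the printed model; bfloat16 storage is assumed.
VOCAB_SIZE = 248320    # from Embedding(248320, 8)
HIDDEN_SIZE = 8        # text_config hidden_size
BYTES_PER_PARAM = 2    # bfloat16


def embedding_bytes(vocab_size: int, hidden_size: int, tied: bool) -> int:
    """Bytes used by embed_tokens plus lm_head (a single copy if weights are tied)."""
    copies = 1 if tied else 2
    return vocab_size * hidden_size * copies * BYTES_PER_PARAM


size = embedding_bytes(VOCAB_SIZE, HIDDEN_SIZE, tied=False)
print(f"{size / 1e6:.2f} MB")  # 7.95 MB
```

Under these assumptions the two vocabulary-sized matrices account for most of the 8.8MB, with the vision tower, decoder layers, and MTP module making up the rest.
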
### Example usage:

- vLLM

```bash
# Multi-token prediction is supported
model_id=tiny-random/qwen3.6
vllm serve $model_id \
  --tensor-parallel-size 2 \
  --speculative-config.method qwen3_next_mtp \
  --speculative-config.num_speculative_tokens 2 \
  --reasoning-parser qwen3 \
  --tool-call-parser qwen3_coder \
  --enable-auto-tool-choice \
  --max-cudagraph-capture-size 16
```
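
Once the server is up, a quick smoke test can go through its OpenAI-compatible chat endpoint. The sketch below is one possible client using only the standard library. It assumes the server listens on vLLM's default `http://localhost:8000`, and the helper names `build_chat_request` and `send_chat_request` are made up for this illustration.

```python
import json
import urllib.request


def build_chat_request(base_url: str, model: str, prompt: str, max_tokens: int = 32):
    """Build a POST request for the OpenAI-compatible /v1/chat/completions route."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_chat_request(request, timeout: float = 30.0) -> str:
    """Send the request and return the first choice's message content."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]
```

Calling `send_chat_request(build_chat_request("http://localhost:8000", "tiny-random/qwen3.6", "Hello"))` should return a string. Because the weights are random, the text will be gibberish, so the only thing this checks is that the serving path works end to end.
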

- SGLang

```bash
# Multi-token prediction is supported
model_id=tiny-random/qwen3.6
python3 -m sglang.launch_server \
  --model-path $model_id \
  --tp-size 2 \
  --tool-call-parser qwen3_coder \
  --reasoning-parser qwen3 \
  --speculative-algo NEXTN \
  --speculative-num-steps 3 \
  --speculative-eagle-topk 1 \
  --speculative-num-draft-tokens 4
```

- Transformers

```python
import torch
from transformers import (
    Qwen3_5ForConditionalGeneration,
    AutoProcessor,
)

model_id = "tiny-random/qwen3.6"
model = Qwen3_5ForConditionalGeneration.from_pretrained(
    model_id, dtype=torch.bfloat16, device_map="auto",
)
processor = AutoProcessor.from_pretrained(model_id)

# One user turn containing an image and a text prompt.
messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "image",
                "image": "https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-VL/assets/demo.jpeg",
            },
            {"type": "text", "text": "Describe this image."},
        ],
    }
]

# Tokenize the chat and preprocess the image in one call.
inputs = processor.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

generated_ids = model.generate(**inputs, max_new_tokens=32)
# Decode the whole batch, then take the first (and only) sequence.
output_text = processor.batch_decode(generated_ids, skip_special_tokens=False)[0]
# Shorten each image placeholder token so the printed prompt stays readable.
print(output_text.replace('<|image_pad|>', "I"))
```
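
The final `replace` call above shortens every `<|image_pad|>` placeholder to a single character. An alternative, sketched below, is to collapse each run of placeholders into one marker with a count. This is not part of the original example: the helper `collapse_image_pads` and the sample string are invented for illustration.

```python
import re

IMAGE_PAD = "<|image_pad|>"


def collapse_image_pads(text: str, pad: str = IMAGE_PAD) -> str:
    """Replace each run of consecutive pad tokens with a single 'pad*N' marker."""
    pattern = re.compile(f"(?:{re.escape(pad)})+")

    def _summarize(match: re.Match) -> str:
        count = len(match.group(0)) // len(pad)
        return f"{pad}*{count}"

    return pattern.sub(_summarize, text)


# Made-up decoded text with three placeholder tokens in a row.
sample = "<|vision_start|>" + IMAGE_PAD * 3 + "<|vision_end|>Describe this image."
print(collapse_image_pads(sample))
# <|vision_start|><|image_pad|>*3<|vision_end|>Describe this image.
```
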

### Code to create this repo:

<details>
<summary>Click to expand</summary>

```python
import json
from copy import deepcopy
from pathlib import Path

import torch
from huggingface_hub import file_exists, hf_hub_download
from transformers import (
    AutoConfig,
    AutoModelForCausalLM,
    AutoProcessor,
    GenerationConfig,
    Qwen3_5ForConditionalGeneration,
    set_seed,
)

source_model_id = "Qwen/Qwen3.6-27B"
save_folder = "/tmp/tiny-random/qwen36"

# Reuse the processor (tokenizer + image preprocessing) of the source model.
processor = AutoProcessor.from_pretrained(source_model_id, trust_remote_code=True)
processor.save_pretrained(save_folder)

# Load the source config and shrink it.
with open(hf_hub_download(source_model_id, filename='config.json', repo_type='model'), 'r', encoding='utf-8') as f:
    config_json = json.load(f)
config_json['text_config'].update({
    'head_dim': 32,
    'hidden_size': 8,
    "layer_types": ['linear_attention'] * 3 + ['full_attention'],
    'intermediate_size': 32,
    'num_hidden_layers': 4,
    'num_attention_heads': 8,
    'num_key_value_heads': 4,
    "linear_key_head_dim": 32,
    "linear_num_key_heads": 4,
    "linear_num_value_heads": 8,
    "linear_value_head_dim": 32,
})
config_json['text_config']['rope_parameters']['mrope_section'] = [1, 1, 2]
config_json["tie_word_embeddings"] = False
config_json['vision_config'].update(
    {
        'hidden_size': 64,
        'intermediate_size': 128,
        'num_heads': 2,
        'out_hidden_size': 8,
        'depth': 2,
    }
)
with open(f"{save_folder}/config.json", "w", encoding='utf-8') as f:
    json.dump(config_json, f, indent=2)

config = AutoConfig.from_pretrained(
    save_folder,
    trust_remote_code=True,
)
print(config)

# Build the model in bfloat16.
set_seed(42)
torch.set_default_dtype(torch.bfloat16)
model = Qwen3_5ForConditionalGeneration(config)

# in Qwen/Qwen3.6-27B release, all tensors are in bfloat16
# with torch.no_grad():
#     for i in range(3):
#         attn = model.model.language_model.layers[i].linear_attn
#         attn.A_log = torch.nn.Parameter(attn.A_log.float())
#         attn.norm.float()
print(model.state_dict()['model.language_model.layers.0.linear_attn.A_log'].dtype)
print(model.state_dict()['model.language_model.layers.0.linear_attn.norm.weight'].dtype)

# Attach a multi-token prediction (MTP) module that reuses the full-attention layer layout.
model.mtp = torch.nn.ModuleDict({
    "pre_fc_norm_embedding": torch.nn.RMSNorm(config.text_config.hidden_size),
    "fc": torch.nn.Linear(config.text_config.hidden_size * 2, config.text_config.hidden_size, bias=False),
    "layers": torch.nn.ModuleList([deepcopy(model.model.language_model.layers[3])]),
    "norm": torch.nn.RMSNorm(config.text_config.hidden_size),
    "pre_fc_norm_hidden": torch.nn.RMSNorm(config.text_config.hidden_size),
})
torch.set_default_dtype(torch.float32)

if file_exists(filename="generation_config.json", repo_id=source_model_id, repo_type='model'):
    model.generation_config = GenerationConfig.from_pretrained(
        source_model_id, trust_remote_code=True,
    )
    model.generation_config.do_sample = True
    print(model.generation_config)

# Re-initialize every parameter with N(0, 0.2) noise, then save.
model = model.cpu()
set_seed(42)
with torch.no_grad():
    for name, p in sorted(model.named_parameters()):
        torch.nn.init.normal_(p, 0, 0.2)
        print(name, p.shape)
model.save_pretrained(save_folder)
```

</details>

### Printing the model:

<details><summary>Click to expand</summary>

```text
Qwen3_5ForConditionalGeneration(
  (model): Qwen3_5Model(
    (visual): Qwen3_5VisionModel(
      (patch_embed): Qwen3_5VisionPatchEmbed(
        (proj): Conv3d(3, 64, kernel_size=(2, 16, 16), stride=(2, 16, 16))
      )
      (pos_embed): Embedding(2304, 64)
      (rotary_pos_emb): Qwen3_5VisionRotaryEmbedding()
      (blocks): ModuleList(
        (0-1): 2 x Qwen3_5VisionBlock(
          (norm1): LayerNorm((64,), eps=1e-06, elementwise_affine=True)
          (norm2): LayerNorm((64,), eps=1e-06, elementwise_affine=True)
          (attn): Qwen3_5VisionAttention(
            (qkv): Linear(in_features=64, out_features=192, bias=True)
            (proj): Linear(in_features=64, out_features=64, bias=True)
          )
          (mlp): Qwen3_5VisionMLP(
            (linear_fc1): Linear(in_features=64, out_features=128, bias=True)
            (linear_fc2): Linear(in_features=128, out_features=64, bias=True)
            (act_fn): GELUTanh()
          )
        )
      )
      (merger): Qwen3_5VisionPatchMerger(
        (norm): LayerNorm((64,), eps=1e-06, elementwise_affine=True)
        (linear_fc1): Linear(in_features=256, out_features=256, bias=True)
        (act_fn): GELU(approximate='none')
        (linear_fc2): Linear(in_features=256, out_features=8, bias=True)
      )
    )
    (language_model): Qwen3_5TextModel(
      (embed_tokens): Embedding(248320, 8)
      (layers): ModuleList(
        (0-2): 3 x Qwen3_5DecoderLayer(
          (linear_attn): Qwen3_5GatedDeltaNet(
            (act): SiLUActivation()
            (conv1d): Conv1d(512, 512, kernel_size=(4,), stride=(1,), padding=(3,), groups=512, bias=False)
            (norm): Qwen3_5RMSNormGated()
            (out_proj): Linear(in_features=256, out_features=8, bias=False)
            (in_proj_qkv): Linear(in_features=8, out_features=512, bias=False)
            (in_proj_z): Linear(in_features=8, out_features=256, bias=False)
            (in_proj_b): Linear(in_features=8, out_features=8, bias=False)
            (in_proj_a): Linear(in_features=8, out_features=8, bias=False)
          )
          (mlp): Qwen3_5MLP(
            (gate_proj): Linear(in_features=8, out_features=32, bias=False)
            (up_proj): Linear(in_features=8, out_features=32, bias=False)
            (down_proj): Linear(in_features=32, out_features=8, bias=False)
            (act_fn): SiLUActivation()
          )
          (input_layernorm): Qwen3_5RMSNorm((8,), eps=1e-06)
          (post_attention_layernorm): Qwen3_5RMSNorm((8,), eps=1e-06)
        )
        (3): Qwen3_5DecoderLayer(
          (self_attn): Qwen3_5Attention(
            (q_proj): Linear(in_features=8, out_features=512, bias=False)
            (k_proj): Linear(in_features=8, out_features=128, bias=False)
            (v_proj): Linear(in_features=8, out_features=128, bias=False)
            (o_proj): Linear(in_features=256, out_features=8, bias=False)
            (q_norm): Qwen3_5RMSNorm((32,), eps=1e-06)
            (k_norm): Qwen3_5RMSNorm((32,), eps=1e-06)
          )
          (mlp): Qwen3_5MLP(
            (gate_proj): Linear(in_features=8, out_features=32, bias=False)
            (up_proj): Linear(in_features=8, out_features=32, bias=False)
            (down_proj): Linear(in_features=32, out_features=8, bias=False)
            (act_fn): SiLUActivation()
          )
          (input_layernorm): Qwen3_5RMSNorm((8,), eps=1e-06)
          (post_attention_layernorm): Qwen3_5RMSNorm((8,), eps=1e-06)
        )
      )
      (norm): Qwen3_5RMSNorm((8,), eps=1e-06)
      (rotary_emb): Qwen3_5TextRotaryEmbedding()
    )
  )
  (lm_head): Linear(in_features=8, out_features=248320, bias=False)
  (mtp): ModuleDict(
    (pre_fc_norm_embedding): RMSNorm((8,), eps=None, elementwise_affine=True)
    (fc): Linear(in_features=16, out_features=8, bias=False)
    (layers): ModuleList(
      (0): Qwen3_5DecoderLayer(
        (self_attn): Qwen3_5Attention(
          (q_proj): Linear(in_features=8, out_features=512, bias=False)
          (k_proj): Linear(in_features=8, out_features=128, bias=False)
          (v_proj): Linear(in_features=8, out_features=128, bias=False)
          (o_proj): Linear(in_features=256, out_features=8, bias=False)
          (q_norm): Qwen3_5RMSNorm((32,), eps=1e-06)
          (k_norm): Qwen3_5RMSNorm((32,), eps=1e-06)
        )
        (mlp): Qwen3_5MLP(
          (gate_proj): Linear(in_features=8, out_features=32, bias=False)
          (up_proj): Linear(in_features=8, out_features=32, bias=False)
          (down_proj): Linear(in_features=32, out_features=8, bias=False)
          (act_fn): SiLUActivation()
        )
        (input_layernorm): Qwen3_5RMSNorm((8,), eps=1e-06)
        (post_attention_layernorm): Qwen3_5RMSNorm((8,), eps=1e-06)
      )
    )
    (norm): RMSNorm((8,), eps=None, elementwise_affine=True)
    (pre_fc_norm_hidden): RMSNorm((8,), eps=None, elementwise_affine=True)
  )
)
```

</details>
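
The projection widths in the printout can be cross-checked against the `text_config` overrides used to create the model. The sketch below recomputes them with plain arithmetic. The formulas are inferred from the printed shapes rather than taken from the library source. In particular, the factor of 2 on `q_proj` is assumed to be a query plus an output gate, as in Qwen3-Next-style attention. The function names are made up for this illustration.

```python
# Values copied from the text_config overrides in the creation script.
cfg = {
    "hidden_size": 8,
    "head_dim": 32,
    "num_attention_heads": 8,
    "num_key_value_heads": 4,
    "linear_key_head_dim": 32,
    "linear_num_key_heads": 4,
    "linear_value_head_dim": 32,
    "linear_num_value_heads": 8,
}


def full_attention_shapes(c: dict) -> dict:
    """Output/input widths of the full-attention projections (layer 3 and MTP)."""
    attn_dim = c["num_attention_heads"] * c["head_dim"]
    kv_dim = c["num_key_value_heads"] * c["head_dim"]
    return {
        "q_proj_out": attn_dim * 2,  # assumption: query plus gate
        "k_proj_out": kv_dim,
        "v_proj_out": kv_dim,
        "o_proj_in": attn_dim,
    }


def linear_attention_shapes(c: dict) -> dict:
    """Widths of the gated-delta-net projections (layers 0-2)."""
    key_dim = c["linear_num_key_heads"] * c["linear_key_head_dim"]
    value_dim = c["linear_num_value_heads"] * c["linear_value_head_dim"]
    return {
        "in_proj_qkv_out": 2 * key_dim + value_dim,  # q and k share key_dim, v uses value_dim
        "in_proj_z_out": value_dim,
        "in_proj_b_out": c["linear_num_value_heads"],
        "in_proj_a_out": c["linear_num_value_heads"],
        "out_proj_in": value_dim,
    }


print(full_attention_shapes(cfg))
print(linear_attention_shapes(cfg))
print("mtp fc in_features:", cfg["hidden_size"] * 2)
```

With the values above this gives 512/128/128/256 for the full-attention layer, 512/256/8/8/256 for the linear-attention layers, and 16 for the MTP `fc` input, which matches the printout.
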

### Test environment:

- torch: 2.11.0
- transformers: 5.5.0
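
To compare a local setup against these versions, a small check like the one below may help. It only reads installed package metadata and does not import the libraries. The `EXPECTED` mapping mirrors the list above, and the helper name `installed_versions` is made up for this illustration.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions listed in the test environment above.
EXPECTED = {"torch": "2.11.0", "transformers": "5.5.0"}


def installed_versions(packages) -> dict:
    """Map each package name to its installed version, or None if it is missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


for name, installed in installed_versions(EXPECTED).items():
    status = "ok" if installed == EXPECTED[name] else "differs"
    print(f"{name}: installed={installed} tested={EXPECTED[name]} ({status})")
```

A mismatch does not necessarily mean the examples will fail. It only flags that the environment differs from the one this repo was tested with.
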