Instructions to use MSALab/PerceptionDLM-Base with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use MSALab/PerceptionDLM-Base with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="MSALab/PerceptionDLM-Base", trust_remote_code=True)
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
pipe(text=messages)

# Load model directly
from transformers import AutoModel
model = AutoModel.from_pretrained("MSALab/PerceptionDLM-Base", trust_remote_code=True, dtype="auto")


vLLM

How to use MSALab/PerceptionDLM-Base with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "MSALab/PerceptionDLM-Base"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "MSALab/PerceptionDLM-Base",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'
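
The same request can be assembled from Python. The sketch below only builds the request object with the standard library and does not send it. `build_chat_request` is a hypothetical helper name, and the host, port, prompt, and image URL are copied from the curl example above. Whether a given vLLM build can actually serve this model is not established here.

```python
import json
import urllib.request


def build_chat_request(base_url, model, prompt, image_url):
    """Build (but do not send) an OpenAI-style chat completion request."""
    payload = {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
    }
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_chat_request(
    base_url="http://localhost:8000",
    model="MSALab/PerceptionDLM-Base",
    prompt="Describe this image in one sentence.",
    image_url="https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg",
)
# With a server running, the request could be sent with:
# urllib.request.urlopen(request)
```

For the SGLang server, only the base URL changes (port 30000 in the commands on this page).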

Use Docker

docker model run hf.co/MSALab/PerceptionDLM-Base

SGLang

How to use MSALab/PerceptionDLM-Base with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "MSALab/PerceptionDLM-Base" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "MSALab/PerceptionDLM-Base",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "MSALab/PerceptionDLM-Base" \
        --host 0.0.0.0 \
        --port 30000

Docker Model Runner
How to use MSALab/PerceptionDLM-Base with Docker Model Runner:
```
docker model run hf.co/MSALab/PerceptionDLM-Base
```

MSALab committed 5 days ago

Commit db8eff4 (verified) · 1 Parent(s): 139a56f

Add files using upload-large-folder tool

Files changed (22)

README.md +99 -0
cache.py +94 -0
chat_template.json +3 -0
chat_template_utils.py +533 -0
config.json +327 -0
configuration_dmllm.py +77 -0
configuration_llada.py +175 -0
model-00001-of-00005.safetensors +3 -0
model-00002-of-00005.safetensors +3 -0
model-00003-of-00005.safetensors +3 -0
model-00004-of-00005.safetensors +3 -0
model-00005-of-00005.safetensors +3 -0
model.safetensors.index.json +0 -0
modeling_abstractor.py +30 -0
modeling_dmllm.py +512 -0
modeling_llada.py +0 -0
preprocessor_config.json +27 -0
processing_dmllm.py +401 -0
processor_config.json +15 -0
special_tokens_map.json +60 -0
tokenizer.json +0 -0
tokenizer_config.json +2215 -0

README.md ADDED

	@@ -0,0 +1,99 @@

+---
+license: apache-2.0
+language:
+- en
+library_name: transformers
+pipeline_tag: image-text-to-text
+base_model:
+- MSALab/LLaDA-8B-Instruct-HF
+tags:
+- multimodal
+- diffusion-language-model
+- dllm
+- vision-language-model
+- perception
+---
+# PerceptionDLM-Base
+**PerceptionDLM-Base** is a strong open **multimodal diffusion language model (DLM)** that extends a large language diffusion backbone (LLaDA-8B) to visual instruction tuning. It establishes a new state-of-the-art baseline among open discrete-diffusion VLMs, outperforming LLaDA-V on **15 / 16** standard multimodal benchmarks while remaining competitive with same-scale autoregressive (AR) VLMs.
+It serves as the foundation model for [**PerceptionDLM**](https://huggingface.co/MSALab/PerceptionDLM), our parallel region-perception model.
+<p align="center">
+  📄 <a href="https://arxiv.org/abs/2606.19534">Paper</a> &nbsp;|&nbsp;
+  💻 <a href="https://github.com/MSALab-PKU/PerceptionDLM">Code</a> &nbsp;|&nbsp;
+  🤗 <a href="https://huggingface.co/collections/MSALab/perceptiondlm-model-zoo">Model Collection</a>
+</p>
+## Highlights
+- 🧠 **Diffusion-based VLM.** Non-autoregressive masked-denoising generation with intrinsic token-level parallelism.
+- 🏗️ **LLaVA-style architecture.** SigLIP-2 vision encoder + 2-layer MLP connector + LLaDA-8B diffusion decoder, with dynamic-resolution tiling for high-resolution inputs.
+- 🏆 **Strong baseline.** Outperforms LLaDA-V on 15/16 benchmarks; especially strong on fine-grained perception and hallucination robustness.
+## Model Details
+| Property | Value |
+| :--- | :--- |
+| Vision encoder | `google/siglip2-so400m-patch16-512` (frozen) |
+| Connector | 2-layer MLP with GELU |
+| Language backbone | LLaDA-8B-Instruct (diffusion) |
+| Parameters | ~8B |
+| Training | 4-stage visual instruction tuning, 32× H100 (~3 weeks) |
+| Precision | bfloat16 |
+## Results
+PerceptionDLM-Base vs. open diffusion / AR VLMs (selected benchmarks):
+| Benchmark | PerceptionDLM-Base | LLaDA-V | Qwen2.5-VL-7B | InternVL3-8B |
+| :--- | :---: | :---: | :---: | :---: |
+| MMBench | **85.0** | 82.9 | 83.5 | 83.4 |
+| SeedBench | **78.9** | 74.8 | 77.0 | 77.1 |
+| ChartQA | **91.6** | 78.3 | 86.2 | 86.6 |
+| MMVP | **82.0** | 76.7 | 73.3 | 80.0 |
+| BLINK | **60.3** | 50.9 | 55.3 | 55.5 |
+| RealWorldQA | **73.7** | 63.2 | 68.4 | 70.8 |
+| HallusionBench | **58.4** | 50.9 | 51.9 | 49.9 |
+See the [paper](https://arxiv.org/abs/2606.19534) for the full 16-benchmark comparison.
+## Usage
+Full inference scripts are provided in the [GitHub repository](https://github.com/MSALab-PKU/PerceptionDLM).
+```bash
+python demo/infer_dmllm.py \
+  --model-path MSALab/PerceptionDLM-Base \
+  --image assets/demo.jpg \
+  --prompt "What color shirt is the man in the picture wearing?" \
+  --gen-length 64 --block-length 64 --steps 64
+```
+```python
+import torch
+from transformers import AutoModel, AutoProcessor
+model_path = "MSALab/PerceptionDLM-Base"
+processor = AutoProcessor.from_pretrained(model_path, trust_remote_code=True)
+model = AutoModel.from_pretrained(
+    model_path, torch_dtype=torch.bfloat16, trust_remote_code=True
+).cuda().eval()
+# See demo/infer_dmllm.py for the full preprocessing + generation pipeline.
+```
+## Citation
+```bibtex
+@article{sun2026perceptiondlm,
+  title   = {PerceptionDLM: Parallel Region Perception with Multimodal Diffusion Language Models},
+  author  = {Sun, Yueyi and Wang, Yuhao and Li, Jason and Tian, Ye and Zhang, Tao and Mai, Jacky and Wang, Yihan and Wang, Haochen and Bai, Jinbin and Yang, Ling and Tong, Yunhai},
+  journal = {arXiv preprint arXiv:2606.19534},
+  year    = {2026}
+}
+```
+## License
+Released under the [Apache License 2.0](https://www.apache.org/licenses/LICENSE-2.0).
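
The README's demo command passes `--gen-length 64 --block-length 64 --steps 64` without describing how the three values interact. The sketch below shows one plausible reading, assuming the script follows the block-wise (semi-autoregressive) schedule of the LLaDA reference sampler, which this commit does not confirm. `plan_unmasking` is a hypothetical name used only for illustration.

```python
def plan_unmasking(gen_length: int, block_length: int, steps: int) -> list[list[int]]:
    """Return, per block, how many masked tokens are revealed at each step.

    Assumption (not confirmed by the README): generation is split into
    gen_length // block_length blocks, the step budget is shared evenly
    across blocks, and any remainder goes to the earliest steps.
    """
    if gen_length % block_length != 0:
        raise ValueError("gen_length must be a multiple of block_length")
    num_blocks = gen_length // block_length
    if steps % num_blocks != 0:
        raise ValueError("steps must be a multiple of the number of blocks")
    steps_per_block = steps // num_blocks

    # Spread block_length tokens over steps_per_block denoising steps.
    base, remainder = divmod(block_length, steps_per_block)
    per_step = [base + (1 if i < remainder else 0) for i in range(steps_per_block)]
    return [list(per_step) for _ in range(num_blocks)]


# The README's settings: one block, one token revealed per step.
readme_plan = plan_unmasking(gen_length=64, block_length=64, steps=64)

# Fewer steps than tokens means several tokens are revealed in parallel.
parallel_plan = plan_unmasking(gen_length=128, block_length=32, steps=64)
```

Under this reading, lowering `--steps` relative to `--gen-length` trades quality for speed by revealing more tokens per denoising step.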

cache.py ADDED

	@@ -0,0 +1,94 @@

+from dataclasses import dataclass
+@dataclass
+class dLLMCacheConfig:
+    prompt_interval_steps: int = 1
+    gen_interval_steps: int = 1
+    transfer_ratio: float = 0.0
+    cfg_interval_steps: int = 1
+import torch
+from collections import defaultdict
+class Singleton(type):
+    _instances = {}
+    def __call__(cls, *args, **kwargs):
+        if cls not in cls._instances:
+            cls._instances[cls] = super(Singleton, cls).__call__(*args, **kwargs)
+        return cls._instances[cls]
+class dLLMCache(metaclass=Singleton):
+    gen_interval_steps: int
+    prompt_interval_steps: int
+    cfg_interval_steps: int
+    prompt_length: int
+    transfer_ratio: float
+    __cache: defaultdict
+    __step_counter: defaultdict
+    @classmethod
+    def new_instance(
+        cls,
+        prompt_interval_steps: int = 1,
+        gen_interval_steps: int = 1,
+        cfg_interval_steps: int = 1,
+        transfer_ratio: float = 0.0,
+    ) -> "dLLMCache":
+        ins = cls()
+        setattr(ins, "prompt_interval_steps", prompt_interval_steps)
+        setattr(ins, "gen_interval_steps", gen_interval_steps)
+        setattr(ins, "cfg_interval_steps", cfg_interval_steps)
+        setattr(ins, "transfer_ratio", transfer_ratio)
+        ins.init()
+        return ins
+    def init(self) -> None:
+        self.__cache = defaultdict(
+            lambda: defaultdict(lambda: defaultdict(lambda: defaultdict(dict)))
+        )
+        self.__step_counter = defaultdict(lambda: defaultdict(lambda: 0))
+    def reset_cache(self, prompt_length: int = 0) -> None:
+        self.init()
+        torch.cuda.empty_cache()
+        self.prompt_length = prompt_length
+        self.cache_type = "no_cfg"
+    def set_cache(
+        self, layer_id: int, feature_name: str, features: torch.Tensor, cache_type: str
+    ) -> None:
+        self.__cache[self.cache_type][cache_type][layer_id][feature_name] = {
+            0: features
+        }
+    def get_cache(
+        self, layer_id: int, feature_name: str, cache_type: str
+    ) -> torch.Tensor:
+        output = self.__cache[self.cache_type][cache_type][layer_id][feature_name][0]
+        return output
+    def update_step(self, layer_id: int) -> None:
+        self.__step_counter[self.cache_type][layer_id] += 1
+    def refresh_gen(self, layer_id: int = 0) -> bool:
+        return (self.current_step - 1) % self.gen_interval_steps == 0
+    def refresh_prompt(self, layer_id: int = 0) -> bool:
+        return (self.current_step - 1) % self.prompt_interval_steps == 0
+    def refresh_cfg(self, layer_id: int = 0) -> bool:
+        return (
+            self.current_step - 1
+        ) % self.cfg_interval_steps == 0 or self.current_step <= 5
+    @property
+    def current_step(self) -> int:
+        return max(list(self.__step_counter[self.cache_type].values()), default=1)
+    def __repr__(self):
+        return "USE dLLMCache"
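
The refresh rules in `cache.py` can be easier to read as a list of step numbers. The sketch below is a standalone restatement of the `refresh_gen` / `refresh_prompt` and `refresh_cfg` conditions. It does not import the file, and it assumes `current_step` simply counts 1, 2, 3, ... across denoising steps. The function names are hypothetical.

```python
def refresh_steps(total_steps: int, interval: int) -> list[int]:
    """Steps at which (current_step - 1) % interval == 0 holds."""
    return [step for step in range(1, total_steps + 1) if (step - 1) % interval == 0]


def cfg_refresh_steps(total_steps: int, interval: int, warmup: int = 5) -> list[int]:
    """Same rule, plus an unconditional refresh while current_step <= warmup."""
    return [
        step
        for step in range(1, total_steps + 1)
        if (step - 1) % interval == 0 or step <= warmup
    ]


# With the default interval of 1, every step recomputes the features.
every_step = refresh_steps(total_steps=4, interval=1)

# With interval 3, cached features are reused on the steps in between.
sparse = refresh_steps(total_steps=8, interval=3)
sparse_cfg = cfg_refresh_steps(total_steps=8, interval=3)
```

Since every interval defaults to 1 in `dLLMCacheConfig`, the cache has no effect until one of the intervals is raised.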

chat_template.json ADDED

	@@ -0,0 +1,3 @@

+{
+  "chat_template": "{% for message in messages %}{% if loop.first and message['role'] != 'system' %}<|start_header_id|>system<|end_header_id|>\nYou are a helpful assistant.<|eot_id|>\n{% endif %}<|start_header_id|>{{ message['role'] }}<|end_header_id|>\n{% if message['role'] == 'assistant' %}{% generation %}{{ message['content'][0]['text'] }}<|eot_id|>{% endgeneration %}{% else %}{% for content in message['content'] %}{% if content['type'] == 'image' or 'image' in content or 'image_url' in content %}<img><IMG_CONTEXT></img>{% elif content['type'] == 'video' or 'video' in content %}<video><VIDEO_CONTEXT></video>{% elif 'text' in content %}{{ content['text'] }}{% endif %}{% endfor %}<|eot_id|>\n{% endif %}{% endfor %}{% if add_generation_prompt %}<|start_header_id|>assistant<|end_header_id|>\n{% endif %}"
+}
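
To make the template's output concrete, the sketch below mirrors its branches in plain Python. It is an illustration only: `render_chat` is a hypothetical function, it assumes every message's `content` is a list of dicts, and the Jinja template above remains the source of truth.

```python
SYSTEM_PROMPT = "You are a helpful assistant."


def _header(role: str) -> str:
    return f"<|start_header_id|>{role}<|end_header_id|>\n"


def render_chat(messages: list[dict], add_generation_prompt: bool = False) -> str:
    """Plain-Python mirror of the chat template's control flow."""
    out = []
    for index, message in enumerate(messages):
        role = message["role"]
        # A default system turn is injected when the chat does not start with one.
        if index == 0 and role != "system":
            out.append(_header("system") + SYSTEM_PROMPT + "<|eot_id|>\n")
        out.append(_header(role))
        if role == "assistant":
            # Only the first content item of an assistant turn is rendered.
            out.append(message["content"][0]["text"] + "<|eot_id|>")
        else:
            for part in message["content"]:
                if part.get("type") == "image" or "image" in part or "image_url" in part:
                    out.append("<img><IMG_CONTEXT></img>")
                elif part.get("type") == "video" or "video" in part:
                    out.append("<video><VIDEO_CONTEXT></video>")
                elif "text" in part:
                    out.append(part["text"])
            out.append("<|eot_id|>\n")
    if add_generation_prompt:
        out.append(_header("assistant"))
    return "".join(out)


prompt = render_chat(
    [{"role": "user", "content": [{"type": "image"}, {"type": "text", "text": "Hi"}]}],
    add_generation_prompt=True,
)
```

Each image item contributes a single `<img><IMG_CONTEXT></img>` placeholder, regardless of the URL or pixel data attached to it.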

chat_template_utils.py ADDED

	@@ -0,0 +1,533 @@

+# Copyright 2024 The HuggingFace Team. All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+import inspect
+import json
+import re
+import types
+from contextlib import contextmanager
+from datetime import datetime
+from functools import lru_cache
+from inspect import isfunction
+from typing import Any, Callable, Optional, Union, get_args, get_origin, get_type_hints
+from packaging import version
+from transformers.utils import logging
+from transformers.utils.import_utils import is_jinja_available, is_torch_available, is_vision_available
+logger = logging.get_logger(__name__)
+if is_jinja_available():
+    import jinja2
+    from jinja2.ext import Extension
+    from jinja2.sandbox import ImmutableSandboxedEnvironment
+else:
+    jinja2 = None
+if is_vision_available():
+    from PIL.Image import Image
+if is_torch_available():
+    from torch import Tensor
+BASIC_TYPES = (int, float, str, bool, Any, type(None), ...)
+# Extracts the initial segment of the docstring, containing the function description
+description_re = re.compile(r"^(.*?)[\n\s]*(Args:|Returns:|Raises:|\Z)", re.DOTALL)
+# Extracts the Args: block from the docstring
+args_re = re.compile(r"\n\s*Args:\n\s*(.*?)[\n\s]*(Returns:|Raises:|\Z)", re.DOTALL)
+# Splits the Args: block into individual arguments
+args_split_re = re.compile(
+    r"""
+(?:^|\n)  # Match the start of the args block, or a newline
+\s*(\w+):\s*  # Capture the argument name and strip spacing
+(.*?)\s*  # Capture the argument description, which can span multiple lines, and strip trailing spacing
+(?=\n\s*\w+:|\Z)  # Stop when you hit the next argument or the end of the block
+""",
+    re.DOTALL | re.VERBOSE,
+)
+# Extracts the Returns: block from the docstring, if present. Note that most chat templates ignore the return type/doc!
+returns_re = re.compile(r"\n\s*Returns:\n\s*(.*?)[\n\s]*(Raises:|\Z)", re.DOTALL)
+class TypeHintParsingException(Exception):
+    """Exception raised for errors in parsing type hints to generate JSON schemas"""
+    pass
+class DocstringParsingException(Exception):
+    """Exception raised for errors in parsing docstrings to generate JSON schemas"""
+    pass
+def _get_json_schema_type(param_type: str) -> dict[str, str]:
+    type_mapping = {
+        int: {"type": "integer"},
+        float: {"type": "number"},
+        str: {"type": "string"},
+        bool: {"type": "boolean"},
+        type(None): {"type": "null"},
+        Any: {},
+    }
+    if is_vision_available():
+        type_mapping[Image] = {"type": "image"}
+    if is_torch_available():
+        type_mapping[Tensor] = {"type": "audio"}
+    return type_mapping.get(param_type, {"type": "object"})
+def _parse_type_hint(hint: str) -> dict:
+    origin = get_origin(hint)
+    args = get_args(hint)
+    if origin is None:
+        try:
+            return _get_json_schema_type(hint)
+        except KeyError:
+            raise TypeHintParsingException(
+                "Couldn't parse this type hint, likely due to a custom class or object: ", hint
+            )
+    elif origin is Union or (hasattr(types, "UnionType") and origin is types.UnionType):
+        # Recurse into each of the subtypes in the Union, except None, which is handled separately at the end
+        subtypes = [_parse_type_hint(t) for t in args if t is not type(None)]
+        if len(subtypes) == 1:
+            # A single non-null type can be expressed directly
+            return_dict = subtypes[0]
+        elif all(isinstance(subtype["type"], str) for subtype in subtypes):
+            # A union of basic types can be expressed as a list in the schema
+            return_dict = {"type": sorted([subtype["type"] for subtype in subtypes])}
+        else:
+            # A union of more complex types requires "anyOf"
+            return_dict = {"anyOf": subtypes}
+        if type(None) in args:
+            return_dict["nullable"] = True
+        return return_dict
+    elif origin is list:
+        if not args:
+            return {"type": "array"}
+        else:
+            # Lists can only have a single type argument, so recurse into it
+            return {"type": "array", "items": _parse_type_hint(args[0])}
+    elif origin is tuple:
+        if not args:
+            return {"type": "array"}
+        if len(args) == 1:
+            raise TypeHintParsingException(
+                f"The type hint {str(hint).replace('typing.', '')} is a Tuple with a single element, which "
+                "we do not automatically convert to JSON schema as it is rarely necessary. If this input can contain "
+                "more than one element, we recommend "
+                "using a List[] type instead, or if it really is a single element, remove the Tuple[] wrapper and just "
+                "pass the element directly."
+            )
+        if ... in args:
+            raise TypeHintParsingException(
+                "Conversion of '...' is not supported in Tuple type hints. "
+                "Use List[] types for variable-length"
+                " inputs instead."
+            )
+        return {"type": "array", "prefixItems": [_parse_type_hint(t) for t in args]}
+    elif origin is dict:
+        # The JSON equivalent to a dict is 'object', which mandates that all keys are strings
+        # However, we can specify the type of the dict values with "additionalProperties"
+        out = {"type": "object"}
+        if len(args) == 2:
+            out["additionalProperties"] = _parse_type_hint(args[1])
+        return out
+    raise TypeHintParsingException("Couldn't parse this type hint, likely due to a custom class or object: ", hint)
+def _convert_type_hints_to_json_schema(func: Callable) -> dict:
+    type_hints = get_type_hints(func)
+    signature = inspect.signature(func)
+    required = []
+    for param_name, param in signature.parameters.items():
+        if param.annotation == inspect.Parameter.empty:
+            raise TypeHintParsingException(f"Argument {param.name} is missing a type hint in function {func.__name__}")
+        if param.default == inspect.Parameter.empty:
+            required.append(param_name)
+    properties = {}
+    for param_name, param_type in type_hints.items():
+        properties[param_name] = _parse_type_hint(param_type)
+    schema = {"type": "object", "properties": properties}
+    if required:
+        schema["required"] = required
+    return schema
+def parse_google_format_docstring(docstring: str) -> tuple[Optional[str], Optional[dict], Optional[str]]:
+    """
+    Parses a Google-style docstring to extract the function description,
+    argument descriptions, and return description.
+    Args:
+        docstring (str): The docstring to parse.
+    Returns:
+        The function description, arguments, and return description.
+    """
+    # Extract the sections
+    description_match = description_re.search(docstring)
+    args_match = args_re.search(docstring)
+    returns_match = returns_re.search(docstring)
+    # Clean and store the sections
+    description = description_match.group(1).strip() if description_match else None
+    docstring_args = args_match.group(1).strip() if args_match else None
+    returns = returns_match.group(1).strip() if returns_match else None
+    # Parsing the arguments into a dictionary
+    if docstring_args is not None:
+        docstring_args = "\n".join([line for line in docstring_args.split("\n") if line.strip()])  # Remove blank lines
+        matches = args_split_re.findall(docstring_args)
+        args_dict = {match[0]: re.sub(r"\s*\n+\s*", " ", match[1].strip()) for match in matches}
+    else:
+        args_dict = {}
+    return description, args_dict, returns
+def get_json_schema(func: Callable) -> dict:
+    """
+    This function generates a JSON schema for a given function, based on its docstring and type hints. This is
+    mostly used for passing lists of tools to a chat template. The JSON schema contains the name and description of
+    the function, as well as the names, types and descriptions for each of its arguments. `get_json_schema()` requires
+    that the function has a docstring, and that each argument has a description in the docstring, in the standard
+    Google docstring format shown below. It also requires that all the function arguments have a valid Python type hint.
+    Although it is not required, a `Returns` block can also be added, which will be included in the schema. This is
+    optional because most chat templates ignore the return value of the function.
+    Args:
+        func: The function to generate a JSON schema for.
+    Returns:
+        A dictionary containing the JSON schema for the function.
+    Examples:
+    ```python
+    >>> def multiply(x: float, y: float):
+    >>>    '''
+    >>>    A function that multiplies two numbers
+    >>>
+    >>>    Args:
+    >>>        x: The first number to multiply
+    >>>        y: The second number to multiply
+    >>>    '''
+    >>>    return x * y
+    >>>
+    >>> print(get_json_schema(multiply))
+    {
+        "name": "multiply",
+        "description": "A function that multiplies two numbers",
+        "parameters": {
+            "type": "object",
+            "properties": {
+                "x": {"type": "number", "description": "The first number to multiply"},
+                "y": {"type": "number", "description": "The second number to multiply"}
+            },
+            "required": ["x", "y"]
+        }
+    }
+    ```
+    The general use for these schemas is that they are used to generate tool descriptions for chat templates that
+    support them, like so:
+    ```python
+    >>> from transformers import AutoTokenizer
+    >>> from transformers.utils import get_json_schema
+    >>>
+    >>> def multiply(x: float, y: float):
+    >>>    '''
+    >>>    A function that multiplies two numbers
+    >>>
+    >>>    Args:
+    >>>        x: The first number to multiply
+    >>>        y: The second number to multiply
+    >>>    '''
+    >>>    return x * y
+    >>>
+    >>> multiply_schema = get_json_schema(multiply)
+    >>> tokenizer = AutoTokenizer.from_pretrained("CohereForAI/c4ai-command-r-v01")
+    >>> messages = [{"role": "user", "content": "What is 179 x 4571?"}]
+    >>> formatted_chat = tokenizer.apply_chat_template(
+    >>>     messages,
+    >>>     tools=[multiply_schema],
+    >>>     chat_template="tool_use",
+    >>>     return_dict=True,
+    >>>     return_tensors="pt",
+    >>>     add_generation_prompt=True
+    >>> )
+    >>> # The formatted chat can now be passed to model.generate()
+    ```
+    Each argument description can also have an optional `(choices: ...)` block at the end, such as
+    `(choices: ["tea", "coffee"])`, which will be parsed into an `enum` field in the schema. Note that this will
+    only be parsed correctly if it is at the end of the line:
+    ```python
+    >>> def drink_beverage(beverage: str):
+    >>>    '''
+    >>>    A function that drinks a beverage
+    >>>
+    >>>    Args:
+    >>>        beverage: The beverage to drink (choices: ["tea", "coffee"])
+    >>>    '''
+    >>>    pass
+    >>>
+    >>> print(get_json_schema(drink_beverage))
+    ```
+    {
+        'name': 'drink_beverage',
+        'description': 'A function that drinks a beverage',
+        'parameters': {
+            'type': 'object',
+            'properties': {
+                'beverage': {
+                    'type': 'string',
+                    'enum': ['tea', 'coffee'],
+                    'description': 'The beverage to drink'
+                    }
+                },
+            'required': ['beverage']
+        }
+    }
+    """
+    doc = inspect.getdoc(func)
+    if not doc:
+        raise DocstringParsingException(
+            f"Cannot generate JSON schema for {func.__name__} because it has no docstring!"
+        )
+    doc = doc.strip()
+    main_doc, param_descriptions, return_doc = parse_google_format_docstring(doc)
+    json_schema = _convert_type_hints_to_json_schema(func)
+    if (return_dict := json_schema["properties"].pop("return", None)) is not None:
+        if return_doc is not None:  # We allow a missing return docstring since most templates ignore it
+            return_dict["description"] = return_doc
+    for arg, schema in json_schema["properties"].items():
+        if arg not in param_descriptions:
+            raise DocstringParsingException(
+                f"Cannot generate JSON schema for {func.__name__} because the docstring has no description for the argument '{arg}'"
+            )
+        desc = param_descriptions[arg]
+        enum_choices = re.search(r"\(choices:\s*(.*?)\)\s*$", desc, flags=re.IGNORECASE)
+        if enum_choices:
+            schema["enum"] = [c.strip() for c in json.loads(enum_choices.group(1))]
+            desc = enum_choices.string[: enum_choices.start()].strip()
+        schema["description"] = desc
+    output = {"name": func.__name__, "description": main_doc, "parameters": json_schema}
+    if return_dict is not None:
+        output["return"] = return_dict
+    return {"type": "function", "function": output}
+def _render_with_assistant_indices(
+    compiled_template, messages, tools, documents, add_generation_prompt, **template_kwargs
+):
+    rendered_blocks = []
+    generation_indices = []
+    with compiled_template.environment.activate_tracker(rendered_blocks, generation_indices):
+        for block in compiled_template.generate(
+            messages=messages,
+            tools=tools,
+            documents=documents,
+            add_generation_prompt=add_generation_prompt,
+            **template_kwargs,
+        ):
+            rendered_blocks.append(block)
+        rendered_chat = "".join(rendered_blocks)
+    return rendered_chat, generation_indices
+@lru_cache
+def _compile_jinja_template(chat_template):
+    if not is_jinja_available():
+        raise ImportError(
+            "apply_chat_template requires jinja2 to be installed. Please install it using `pip install jinja2`."
+        )
+    class AssistantTracker(Extension):
+        # This extension is used to track the indices of assistant-generated tokens in the rendered chat
+        tags = {"generation"}
+        def __init__(self, environment: ImmutableSandboxedEnvironment):
+            # The class is only initiated by jinja.
+            super().__init__(environment)
+            environment.extend(activate_tracker=self.activate_tracker)
+            self._rendered_blocks = None
+            self._generation_indices = None
+        def parse(self, parser: jinja2.parser.Parser) -> jinja2.nodes.CallBlock:
+            lineno = next(parser.stream).lineno
+            body = parser.parse_statements(["name:endgeneration"], drop_needle=True)
+            return jinja2.nodes.CallBlock(self.call_method("_generation_support"), [], [], body).set_lineno(lineno)
+        @jinja2.pass_eval_context
+        def _generation_support(self, context: jinja2.nodes.EvalContext, caller: jinja2.runtime.Macro) -> str:
+            rv = caller()
+            if self.is_active():
+                # Only track generation indices if the tracker is active
+                start_index = len("".join(self._rendered_blocks))
+                end_index = start_index + len(rv)
+                self._generation_indices.append((start_index, end_index))
+            return rv
+        def is_active(self) -> bool:
+            return self._rendered_blocks or self._generation_indices
+        @contextmanager
+        def activate_tracker(self, rendered_blocks: list[int], generation_indices: list[int]):
+            try:
+                if self.is_active():
+                    raise ValueError("AssistantTracker should not be reused before closed")
+                self._rendered_blocks = rendered_blocks
+                self._generation_indices = generation_indices
+                yield
+            finally:
+                self._rendered_blocks = None
+                self._generation_indices = None
+    if version.parse(jinja2.__version__) < version.parse("3.1.0"):
+        raise ImportError(
+            f"apply_chat_template requires jinja2>=3.1.0 to be installed. Your version is {jinja2.__version__}."
+        )
+    def raise_exception(message):
+        raise jinja2.exceptions.TemplateError(message)
+    def tojson(x, ensure_ascii=False, indent=None, separators=None, sort_keys=False):
+        # We override the built-in tojson filter because Jinja's default filter escapes HTML characters
+        # We also expose some options like custom indents and separators
+        return json.dumps(x, ensure_ascii=ensure_ascii, indent=indent, separators=separators, sort_keys=sort_keys)
+    def strftime_now(format):
+        return datetime.now().strftime(format)
+    jinja_env = ImmutableSandboxedEnvironment(
+        trim_blocks=True, lstrip_blocks=True, extensions=[AssistantTracker, jinja2.ext.loopcontrols]
+    )
+    jinja_env.filters["tojson"] = tojson
+    jinja_env.globals["raise_exception"] = raise_exception
+    jinja_env.globals["strftime_now"] = strftime_now
+    return jinja_env.from_string(chat_template)
+def render_jinja_template(
+    conversations: list[list[dict[str, str]]],
+    tools: Optional[list[Union[dict, Callable]]] = None,
+    documents: Optional[list[dict[str, str]]] = None,
+    chat_template: Optional[str] = None,
+    return_assistant_tokens_mask: Optional[bool] = False,
+    continue_final_message: Optional[bool] = False,
+    add_generation_prompt: Optional[bool] = False,
+    **kwargs,
+) -> str:
+    if return_assistant_tokens_mask and not re.search(r"\{\%-?\s*generation\s*-?\%\}", chat_template):
+        logger.warning_once(
+            "return_assistant_tokens_mask==True but chat template does not contain `{% generation %}` keyword."
+        )
+    # Compilation function uses a cache to avoid recompiling the same template
+    compiled_template = _compile_jinja_template(chat_template)
+    # We accept either JSON schemas or functions for tools. If we get functions, we convert them to schemas
+    if tools is not None:
+        tool_schemas = []
+        for tool in tools:
+            if isinstance(tool, dict):
+                tool_schemas.append(tool)
+            elif isfunction(tool):
+                tool_schemas.append(get_json_schema(tool))
+            else:
+                raise ValueError(
+                    "Tools should either be a JSON schema, or a callable function with type hints "
+                    "and a docstring suitable for auto-conversion to a schema."
+                )
+    else:
+        tool_schemas = None
+    if documents is not None:
+        for document in documents:
+            if not isinstance(document, dict):
+                raise TypeError("Documents should be a list of dicts with 'title' and 'text' keys!")
+    rendered = []
+    all_generation_indices = []
+    for chat in conversations:
+        if hasattr(chat, "messages"):
+            # Indicates it's a Conversation object
+            chat = chat.messages
+        if return_assistant_tokens_mask:
+            rendered_chat, generation_indices = _render_with_assistant_indices(
+                compiled_template=compiled_template,
+                messages=chat,
+                tools=tool_schemas,
+                documents=documents,
+                add_generation_prompt=add_generation_prompt,
+                **kwargs,
+            )
+            all_generation_indices.append(generation_indices)
+        else:
+            rendered_chat = compiled_template.render(
+                messages=chat,
+                tools=tool_schemas,
+                documents=documents,
+                add_generation_prompt=add_generation_prompt,
+                **kwargs,
+            )
+        if continue_final_message:
+            final_message = chat[-1]["content"]
+            if isinstance(final_message, (list, tuple)):
+                for content_block in reversed(final_message):
+                    if "text" in content_block:
+                        # Pick the last text block in the message (the first one we hit while iterating in reverse)
+                        final_message = content_block["text"]
+                        break
+                else:
+                    raise ValueError(
+                        "continue_final_message is set but we could not find any text to continue in the final message!"
+                    )
+            if final_message.strip() not in rendered_chat:
+                raise ValueError(
+                    "continue_final_message is set but the final message does not appear in the chat after "
+                    "applying the chat template! This can happen if the chat template deletes portions of "
+                    "the final message. Please verify the chat template and final message in your chat to "
+                    "ensure they are compatible."
+                )
+            final_msg_loc = rendered_chat.rindex(final_message.strip())
+            if rendered_chat[final_msg_loc : final_msg_loc + len(final_message.lstrip())] == final_message:
+                # The template preserves spacing or the message doesn't have trailing spacing, so things are simple
+                rendered_chat = rendered_chat[: final_msg_loc + len(final_message.lstrip())]
+            else:
+                # The message has trailing spacing that was trimmed, so we must be more cautious
+                rendered_chat = rendered_chat[: final_msg_loc + len(final_message.strip())]
+        rendered.append(rendered_chat)
+    return rendered, all_generation_indices
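
A minimal sketch of the `continue_final_message` trimming step above, reduced to plain strings so it can be run without a tokenizer or a Jinja template. The helper name `trim_to_final_message` and the `<|...|>` markers in the usage note are made up for illustration; only the slicing logic mirrors the diff.

```python
def trim_to_final_message(rendered_chat: str, final_message: str) -> str:
    """Cut the rendered chat right after the last occurrence of the final message.

    Hypothetical helper: it mirrors the slicing in the diff, nothing more.
    """
    stripped = final_message.strip()
    if stripped not in rendered_chat:
        raise ValueError("final message does not appear in the rendered chat")
    loc = rendered_chat.rindex(stripped)
    candidate_end = loc + len(final_message.lstrip())
    if rendered_chat[loc:candidate_end] == final_message:
        # The template kept the message as written (including any trailing
        # spacing), so the cut goes after the full message.
        return rendered_chat[:candidate_end]
    # Surrounding whitespace was trimmed by the template (or the message has
    # leading whitespace), so cut after the stripped text instead.
    return rendered_chat[: loc + len(stripped)]
```

For example, with a rendered chat of `"<|user|>Hi<|end|><|assistant|>The answer is<|end|>"`, both `"The answer is"` and `"The answer is "` as the final message give a result that ends in `The answer is`, with the closing `<|end|>` marker removed so the model can continue the message.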

config.json ADDED

	@@ -0,0 +1,327 @@

+{
+  "architectures": [
+    "DMLLM"
+  ],
+  "auto_map": {
+    "AutoConfig": "configuration_dmllm.DMLLMConfig",
+    "AutoModel": "modeling_dmllm.DMLLM",
+    "AutoModelForCausalLM": "modeling_dmllm.DMLLM"
+  },
+  "downsample_ratio": 0.5,
+  "image_size": 512,
+  "image_token_id": 126349,
+  "language_model_config": {
+    "_attn_implementation_autoset": true,
+    "_name_or_path": "GSAI-ML/LLaDA-V",
+    "add_cross_attention": false,
+    "architectures": [
+      "LLaDAModelLM"
+    ],
+    "attention_bias": false,
+    "attention_dropout": 0.0,
+    "auto_map": {
+      "AutoConfig": "configuration_llada.LLaDAConfig",
+      "AutoModel": "modeling_llada.LLaDAModelLM",
+      "AutoModelForCausalLM": "modeling_llada.LLaDAModelLM"
+    },
+    "bad_words_ids": null,
+    "begin_suppress_tokens": null,
+    "bos_token_id": 128000,
+    "chunk_size_feed_forward": 0,
+    "cross_attention_hidden_size": null,
+    "decoder_start_token_id": null,
+    "diversity_penalty": 0.0,
+    "do_sample": false,
+    "early_stopping": false,
+    "encoder_no_repeat_ngram_size": 0,
+    "eos_token_id": 126081,
+    "exponential_decay_length_penalty": null,
+    "finetuning_task": null,
+    "forced_bos_token_id": null,
+    "forced_eos_token_id": null,
+    "hidden_act": "silu",
+    "hidden_size": 4096,
+    "id2label": {
+      "0": "LABEL_0",
+      "1": "LABEL_1"
+    },
+    "initializer_range": 0.02,
+    "intermediate_size": 12288,
+    "is_decoder": false,
+    "is_encoder_decoder": false,
+    "label2id": {
+      "LABEL_0": 0,
+      "LABEL_1": 1
+    },
+    "length_penalty": 1.0,
+    "max_length": 20,
+    "max_position_embeddings": 16384,
+    "min_length": 0,
+    "model_type": "llada",
+    "no_repeat_ngram_size": 0,
+    "num_attention_heads": 32,
+    "num_beam_groups": 1,
+    "num_beams": 1,
+    "num_hidden_layers": 32,
+    "num_key_value_heads": 32,
+    "num_return_sequences": 1,
+    "output_attentions": false,
+    "output_hidden_states": false,
+    "output_scores": false,
+    "pad_token_id": null,
+    "prefix": null,
+    "pretraining_tp": 1,
+    "problem_type": null,
+    "pruned_heads": {},
+    "remove_invalid_values": false,
+    "repetition_penalty": 1.0,
+    "return_dict": true,
+    "return_dict_in_generate": false,
+    "rms_norm_eps": 1e-05,
+    "rope_scaling": null,
+    "rope_theta": 500000.0,
+    "sep_token_id": null,
+    "suppress_tokens": null,
+    "task_specific_params": null,
+    "temperature": 1.0,
+    "tf_legacy_loss": false,
+    "tie_encoder_decoder": false,
+    "tie_word_embeddings": false,
+    "tokenizer_class": null,
+    "top_k": 50,
+    "top_p": 1.0,
+    "torch_dtype": "bfloat16",
+    "torchscript": false,
+    "typical_p": 1.0,
+    "use_bfloat16": false,
+    "use_cache": false,
+    "vocab_size": 126464
+  },
+  "model_type": "dmllm",
+  "num_image_token": 256,
+  "patch_size": 16,
+  "prompt_numbers": 15,
+  "replacement_noise_mode": false,
+  "roi_output_size": null,
+  "torch_dtype": "bfloat16",
+  "transformers_version": "4.51.3",
+  "vision_abstractor_config": {
+    "projection_type": "mlp2x_gelu"
+  },
+  "vision_model_config": {
+    "_attn_implementation_autoset": true,
+    "_name_or_path": "google/siglip2-so400m-patch16-512",
+    "add_cross_attention": false,
+    "architectures": null,
+    "bad_words_ids": null,
+    "begin_suppress_tokens": null,
+    "bos_token_id": null,
+    "chunk_size_feed_forward": 0,
+    "cross_attention_hidden_size": null,
+    "decoder_start_token_id": null,
+    "diversity_penalty": 0.0,
+    "do_sample": false,
+    "early_stopping": false,
+    "encoder_no_repeat_ngram_size": 0,
+    "eos_token_id": null,
+    "exponential_decay_length_penalty": null,
+    "finetuning_task": null,
+    "forced_bos_token_id": null,
+    "forced_eos_token_id": null,
+    "hidden_size": 1152,
+    "id2label": {
+      "0": "LABEL_0",
+      "1": "LABEL_1"
+    },
+    "initializer_factor": 1.0,
+    "is_decoder": false,
+    "is_encoder_decoder": false,
+    "label2id": {
+      "LABEL_0": 0,
+      "LABEL_1": 1
+    },
+    "length_penalty": 1.0,
+    "max_length": 20,
+    "min_length": 0,
+    "model_type": "siglip",
+    "no_repeat_ngram_size": 0,
+    "num_beam_groups": 1,
+    "num_beams": 1,
+    "num_return_sequences": 1,
+    "output_attentions": false,
+    "output_hidden_states": false,
+    "output_scores": false,
+    "pad_token_id": null,
+    "prefix": null,
+    "problem_type": null,
+    "pruned_heads": {},
+    "remove_invalid_values": false,
+    "repetition_penalty": 1.0,
+    "return_dict": true,
+    "return_dict_in_generate": false,
+    "sep_token_id": null,
+    "suppress_tokens": null,
+    "task_specific_params": null,
+    "temperature": 1.0,
+    "text_config": {
+      "_attn_implementation_autoset": false,
+      "_name_or_path": "",
+      "add_cross_attention": false,
+      "architectures": null,
+      "attention_dropout": 0.0,
+      "bad_words_ids": null,
+      "begin_suppress_tokens": null,
+      "bos_token_id": 49406,
+      "chunk_size_feed_forward": 0,
+      "cross_attention_hidden_size": null,
+      "decoder_start_token_id": null,
+      "diversity_penalty": 0.0,
+      "do_sample": false,
+      "early_stopping": false,
+      "encoder_no_repeat_ngram_size": 0,
+      "eos_token_id": 49407,
+      "exponential_decay_length_penalty": null,
+      "finetuning_task": null,
+      "forced_bos_token_id": null,
+      "forced_eos_token_id": null,
+      "hidden_act": "gelu_pytorch_tanh",
+      "hidden_size": 1152,
+      "id2label": {
+        "0": "LABEL_0",
+        "1": "LABEL_1"
+      },
+      "intermediate_size": 4304,
+      "is_decoder": false,
+      "is_encoder_decoder": false,
+      "label2id": {
+        "LABEL_0": 0,
+        "LABEL_1": 1
+      },
+      "layer_norm_eps": 1e-06,
+      "length_penalty": 1.0,
+      "max_length": 20,
+      "max_position_embeddings": 64,
+      "min_length": 0,
+      "model_type": "siglip_text_model",
+      "no_repeat_ngram_size": 0,
+      "num_attention_heads": 16,
+      "num_beam_groups": 1,
+      "num_beams": 1,
+      "num_hidden_layers": 27,
+      "num_return_sequences": 1,
+      "output_attentions": false,
+      "output_hidden_states": false,
+      "output_scores": false,
+      "pad_token_id": 1,
+      "prefix": null,
+      "problem_type": null,
+      "projection_size": 1152,
+      "pruned_heads": {},
+      "remove_invalid_values": false,
+      "repetition_penalty": 1.0,
+      "return_dict": true,
+      "return_dict_in_generate": false,
+      "sep_token_id": null,
+      "suppress_tokens": null,
+      "task_specific_params": null,
+      "temperature": 1.0,
+      "tf_legacy_loss": false,
+      "tie_encoder_decoder": false,
+      "tie_word_embeddings": true,
+      "tokenizer_class": null,
+      "top_k": 50,
+      "top_p": 1.0,
+      "torch_dtype": "bfloat16",
+      "torchscript": false,
+      "typical_p": 1.0,
+      "use_bfloat16": false,
+      "vocab_size": 256000
+    },
+    "tf_legacy_loss": false,
+    "tie_encoder_decoder": false,
+    "tie_word_embeddings": true,
+    "tokenizer_class": null,
+    "top_k": 50,
+    "top_p": 1.0,
+    "torch_dtype": "bfloat16",
+    "torchscript": false,
+    "typical_p": 1.0,
+    "use_bfloat16": false,
+    "vision_config": {
+      "_attn_implementation_autoset": false,
+      "_name_or_path": "",
+      "add_cross_attention": false,
+      "architectures": null,
+      "attention_dropout": 0.0,
+      "bad_words_ids": null,
+      "begin_suppress_tokens": null,
+      "bos_token_id": null,
+      "chunk_size_feed_forward": 0,
+      "cross_attention_hidden_size": null,
+      "decoder_start_token_id": null,
+      "diversity_penalty": 0.0,
+      "do_sample": false,
+      "early_stopping": false,
+      "encoder_no_repeat_ngram_size": 0,
+      "eos_token_id": null,
+      "exponential_decay_length_penalty": null,
+      "finetuning_task": null,
+      "forced_bos_token_id": null,
+      "forced_eos_token_id": null,
+      "hidden_act": "gelu_pytorch_tanh",
+      "hidden_size": 1152,
+      "id2label": {
+        "0": "LABEL_0",
+        "1": "LABEL_1"
+      },
+      "image_size": 512,
+      "intermediate_size": 4304,
+      "is_decoder": false,
+      "is_encoder_decoder": false,
+      "label2id": {
+        "LABEL_0": 0,
+        "LABEL_1": 1
+      },
+      "layer_norm_eps": 1e-06,
+      "length_penalty": 1.0,
+      "max_length": 20,
+      "min_length": 0,
+      "model_type": "siglip_vision_model",
+      "no_repeat_ngram_size": 0,
+      "num_attention_heads": 16,
+      "num_beam_groups": 1,
+      "num_beams": 1,
+      "num_channels": 3,
+      "num_hidden_layers": 27,
+      "num_return_sequences": 1,
+      "output_attentions": false,
+      "output_hidden_states": false,
+      "output_scores": false,
+      "pad_token_id": null,
+      "patch_size": 16,
+      "prefix": null,
+      "problem_type": null,
+      "pruned_heads": {},
+      "remove_invalid_values": false,
+      "repetition_penalty": 1.0,
+      "return_dict": true,
+      "return_dict_in_generate": false,
+      "sep_token_id": null,
+      "suppress_tokens": null,
+      "task_specific_params": null,
+      "temperature": 1.0,
+      "tf_legacy_loss": false,
+      "tie_encoder_decoder": false,
+      "tie_word_embeddings": true,
+      "tokenizer_class": null,
+      "top_k": 50,
+      "top_p": 1.0,
+      "torch_dtype": "bfloat16",
+      "torchscript": false,
+      "typical_p": 1.0,
+      "use_bfloat16": false
+    }
+  },
+  "vision_output_key": null,
+  "vision_select_layer": -2
+}
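
As a quick consistency check on the values in this config, the sketch below recomputes `num_image_token` with the formula from `configuration_dmllm.py`, and the projector input width with the expression `modeling_dmllm.py` passes as `in_dim`. The variable names are local to this example. Reading the downsample as a merge of 2x2 neighbouring patches is an assumption, not something the config states.

```python
# Values copied from config.json above.
image_size = 512
patch_size = 16
downsample_ratio = 0.5
vision_hidden_size = 1152   # vision_model_config.hidden_size
language_hidden_size = 4096  # language_model_config.hidden_size

# Same formula as DMLLMConfig.__init__.
patches_per_side = image_size // patch_size
num_image_token = int(patches_per_side ** 2 * (downsample_ratio ** 2))

# Same expression used for the projector's in_dim in DMLLM.__init__.
projector_in_dim = vision_hidden_size * (int(1 / downsample_ratio) ** 2)

print(patches_per_side, num_image_token, projector_in_dim, language_hidden_size)
```

The result agrees with the stored `"num_image_token": 256`, and the projector maps 4608-wide vision features to the 4096-wide language model.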

configuration_dmllm.py ADDED

	@@ -0,0 +1,77 @@

+from transformers import PretrainedConfig, AutoConfig, CONFIG_MAPPING
+from transformers.dynamic_module_utils import get_class_from_dynamic_module
+from transformers.utils import logging
+logger = logging.get_logger(__name__)
+class DMLLMConfig(PretrainedConfig):
+    model_type = "dmllm"
+    is_composition = True
+    def __init__(self,
+                 language_model_config=None,
+                 vision_model_config=None,
+                 vision_abstractor_config=None,
+                 image_token_id=None,
+                 image_size=512,
+                 patch_size=16,
+                 downsample_ratio=0.5,
+                 vision_select_layer=-2,
+                 replacement_noise_mode=False,
+                 **kwargs):
+        super().__init__(**kwargs)
+        self.replacement_noise_mode = replacement_noise_mode
+        self.image_size = image_size
+        self.patch_size = patch_size
+        self.downsample_ratio = downsample_ratio
+        self.num_image_token = int((image_size // patch_size) ** 2 * (downsample_ratio ** 2))
+        self.vision_select_layer = vision_select_layer
+        if isinstance(language_model_config, dict):
+            if '_name_or_path' not in language_model_config:
+                language_model_config['_name_or_path'] = self._name_or_path
+            language_model_type = language_model_config.get('model_type', '')
+            is_remote_code = '.' in language_model_config.get('auto_map', {}).get('AutoConfig', '')
+            if language_model_type in CONFIG_MAPPING and not is_remote_code:
+                language_model_config = AutoConfig.for_model(**language_model_config)
+            elif language_model_type:
+                Config = get_class_from_dynamic_module(language_model_config["auto_map"]["AutoConfig"],
+                                                       language_model_config['_name_or_path'])
+                language_model_config = Config(**language_model_config)
+        self.language_model_config = language_model_config
+        if isinstance(vision_model_config, dict):
+            if '_name_or_path' not in vision_model_config:
+                vision_model_config['_name_or_path'] = self._name_or_path
+            vision_model_type = vision_model_config.get('model_type', '')
+            is_remote_code = '.' in vision_model_config.get('auto_map', {}).get('AutoConfig', '')
+            if vision_model_type in CONFIG_MAPPING and not is_remote_code:
+                vision_model_config = AutoConfig.for_model(**vision_model_config)
+            elif vision_model_type:
+                Config = get_class_from_dynamic_module(vision_model_config["auto_map"]["AutoConfig"],
+                                                       vision_model_config['_name_or_path'])
+                vision_model_config = Config(**vision_model_config)
+        self.vision_model_config = vision_model_config
+        self.vision_abstractor_config = vision_abstractor_config
+        self.image_token_id = image_token_id
+    @property
+    def hidden_size(self):
+        return self.language_model_config.hidden_size
+    def to_dict(self):
+        ret_dict = super().to_dict()
+        ret_dict["auto_map"] = {
+            "AutoConfig": "configuration_dmllm.DMLLMConfig",
+            "AutoModel": "modeling_dmllm.DMLLM",
+            "AutoModelForCausalLM": "modeling_dmllm.DMLLM"
+        }
+        return ret_dict
+    @classmethod
+    def from_dict(cls, config_dict, **kwargs):
+        if 'name_or_path' in kwargs:
+            config_dict['_name_or_path'] = kwargs.pop('name_or_path')
+        return super().from_dict(config_dict, **kwargs)
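
The branch that decides how each sub-config dict is turned into a config object can be sketched without importing `transformers`. In this sketch, `KNOWN_MODEL_TYPES` is a hypothetical stand-in for the keys of `CONFIG_MAPPING`, and `pick_config_loader` is an illustrative name. It only reports which branch would run and does not load anything.

```python
# Hypothetical stand-in for the model types registered in CONFIG_MAPPING.
KNOWN_MODEL_TYPES = {"siglip", "llama"}


def pick_config_loader(sub_config: dict) -> str:
    """Report which branch of DMLLMConfig.__init__ a sub-config dict would take."""
    model_type = sub_config.get("model_type", "")
    # A dotted AutoConfig entry such as "configuration_llada.LLaDAConfig"
    # marks a config class that ships with the checkpoint (remote code).
    is_remote_code = "." in sub_config.get("auto_map", {}).get("AutoConfig", "")
    if model_type in KNOWN_MODEL_TYPES and not is_remote_code:
        return "builtin"    # AutoConfig.for_model(**sub_config)
    if model_type:
        # The real code indexes sub_config["auto_map"] here, so it expects
        # an auto_map entry to be present on this path.
        return "remote"     # get_class_from_dynamic_module(...)
    return "unchanged"      # the dict is stored as-is


language = {
    "model_type": "llada",
    "auto_map": {"AutoConfig": "configuration_llada.LLaDAConfig"},
}
vision = {"model_type": "siglip"}
print(pick_config_loader(language), pick_config_loader(vision))
```

With the `config.json` in this commit, the language model config takes the remote-code path and the SigLIP vision config takes the built-in path.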

configuration_llada.py ADDED

	@@ -0,0 +1,175 @@

+# coding=utf-8
+# Copyright 2022 EleutherAI and the HuggingFace Inc. team. All rights reserved.
+#
+# This code is based on EleutherAI's GPT-NeoX library and the GPT-NeoX
+# and OPT implementations in this library. It has been modified from its
+# original forms to accommodate minor architectural differences compared
+# to GPT-NeoX and OPT used by the Meta AI team that trained the model.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+""" LLaDA model configuration"""
+from transformers.configuration_utils import PretrainedConfig
+from transformers.utils import logging
+logger = logging.get_logger(__name__)
+LLaDA_PRETRAINED_CONFIG_ARCHIVE_MAP = {}
+class LLaDAConfig(PretrainedConfig):
+    r"""
+    This is the configuration class to store the configuration of a [`LLaDAModel`]. It is used to instantiate an LLaDA
+    model according to the specified arguments, defining the model architecture. Instantiating a configuration with the
+    defaults will yield a similar configuration to that of the LLaDA-8B.
+    Configuration objects inherit from [`PretrainedConfig`] and can be used to control the model outputs. Read the
+    documentation from [`PretrainedConfig`] for more information.
+    Args:
+        vocab_size (`int`, *optional*, defaults to 32000):
+            Vocabulary size of the LLaDA model. Defines the number of different tokens that can be represented by the
+            `input_ids` passed when calling [`LLaDAModel`].
+        hidden_size (`int`, *optional*, defaults to 4096):
+            Dimension of the hidden representations.
+        intermediate_size (`int`, *optional*, defaults to 11008):
+            Dimension of the MLP representations.
+        num_hidden_layers (`int`, *optional*, defaults to 32):
+            Number of hidden layers in the Transformer decoder.
+        num_attention_heads (`int`, *optional*, defaults to 32):
+            Number of attention heads for each attention layer in the Transformer decoder.
+        num_key_value_heads (`int`, *optional*):
+            This is the number of key_value heads that should be used to implement Grouped Query Attention. If
+            `num_key_value_heads=num_attention_heads`, the model will use Multi Head Attention (MHA); if
+            `num_key_value_heads=1`, the model will use Multi Query Attention (MQA); otherwise GQA is used. When
+            converting a multi-head checkpoint to a GQA checkpoint, each group key and value head should be constructed
+            by mean-pooling all the original heads within that group. For more details, check out [this
+            paper](https://arxiv.org/pdf/2305.13245.pdf). If it is not specified, will default to
+            `num_attention_heads`.
+        hidden_act (`str` or `function`, *optional*, defaults to `"silu"`):
+            The non-linear activation function (function or string) in the decoder.
+        max_position_embeddings (`int`, *optional*, defaults to 2048):
+            The maximum sequence length that this model might ever be used with.
+        initializer_range (`float`, *optional*, defaults to 0.02):
+            The standard deviation of the truncated_normal_initializer for initializing all weight matrices.
+        rms_norm_eps (`float`, *optional*, defaults to 1e-06):
+            The epsilon used by the rms normalization layers.
+        use_cache (`bool`, *optional*, defaults to `True`):
+            Whether or not the model should return the last key/values attentions (not used by all models). Only
+            relevant if `config.is_decoder=True`.
+        pad_token_id (`int`, *optional*):
+            Padding token id.
+        bos_token_id (`int`, *optional*, defaults to 1):
+            Beginning of stream token id.
+        eos_token_id (`int`, *optional*, defaults to 2):
+            End of stream token id.
+        pretraining_tp (`int`, *optional*, defaults to 1):
+            Experimental feature. Tensor parallelism rank used during pretraining. Please refer to [this
+            document](https://huggingface.co/docs/transformers/main/perf_train_gpu_many#tensor-parallelism) to understand more about it. This value is
+            necessary to ensure exact reproducibility of the pretraining results. Please refer to [this
+            issue](https://github.com/pytorch/pytorch/issues/76232).
+        tie_word_embeddings (`bool`, *optional*, defaults to `False`):
+            Whether to tie weight embeddings
+        rope_theta (`float`, *optional*, defaults to 10000.0):
+            The base period of the RoPE embeddings.
+        rope_scaling (`Dict`, *optional*):
+            Dictionary containing the scaling configuration for the RoPE embeddings. Currently supports two scaling
+            strategies: linear and dynamic. Their scaling factor must be a float greater than 1. The expected format is
+            `{"type": strategy name, "factor": scaling factor}`. When using this flag, don't update
+            `max_position_embeddings` to the expected new maximum.
+        attention_bias (`bool`, *optional*, defaults to `False`):
+            Whether to use a bias in the query, key, value and output projection layers during self-attention.
+        attention_dropout (`float`, *optional*, defaults to 0.0):
+            The dropout ratio for the attention probabilities.
+    """
+    model_type = "llada"
+    keys_to_ignore_at_inference = ["past_key_values"]
+    def __init__(
+        self,
+        vocab_size=32000,
+        hidden_size=4096,
+        intermediate_size=11008,
+        num_hidden_layers=32,
+        num_attention_heads=32,
+        num_key_value_heads=None,
+        hidden_act="silu",
+        max_position_embeddings=2048,
+        initializer_range=0.02,
+        rms_norm_eps=1e-6,
+        use_cache=True,
+        pad_token_id=None,
+        bos_token_id=1,
+        eos_token_id=2,
+        pretraining_tp=1,
+        tie_word_embeddings=False,
+        rope_theta=10000.0,
+        rope_scaling=None,
+        attention_bias=False,
+        attention_dropout=0.0,
+        **kwargs,
+    ):
+        self.vocab_size = vocab_size
+        self.max_position_embeddings = max_position_embeddings
+        self.hidden_size = hidden_size
+        self.intermediate_size = intermediate_size
+        self.num_hidden_layers = num_hidden_layers
+        self.num_attention_heads = num_attention_heads
+        # for backward compatibility
+        if num_key_value_heads is None:
+            num_key_value_heads = num_attention_heads
+        self.num_key_value_heads = num_key_value_heads
+        self.hidden_act = hidden_act
+        self.initializer_range = initializer_range
+        self.rms_norm_eps = rms_norm_eps
+        self.pretraining_tp = pretraining_tp
+        self.use_cache = use_cache
+        self.rope_theta = rope_theta
+        self.rope_scaling = rope_scaling
+        self._rope_scaling_validation()
+        self.attention_bias = attention_bias
+        self.attention_dropout = attention_dropout
+        super().__init__(
+            pad_token_id=pad_token_id,
+            bos_token_id=bos_token_id,
+            eos_token_id=eos_token_id,
+            tie_word_embeddings=tie_word_embeddings,
+            **kwargs,
+        )
+    def _rope_scaling_validation(self):
+        """
+        Validate the `rope_scaling` configuration.
+        """
+        if self.rope_scaling is None:
+            return
+        if not isinstance(self.rope_scaling, dict) or len(self.rope_scaling) != 2:
+            raise ValueError(
+                "`rope_scaling` must be a dictionary with two fields, `type` and `factor`, "
+                f"got {self.rope_scaling}"
+            )
+        rope_scaling_type = self.rope_scaling.get("type", None)
+        rope_scaling_factor = self.rope_scaling.get("factor", None)
+        if rope_scaling_type is None or rope_scaling_type not in ["linear", "dynamic"]:
+            raise ValueError(
+                f"`rope_scaling`'s type field must be one of ['linear', 'dynamic'], got {rope_scaling_type}"
+            )
+        if rope_scaling_factor is None or not isinstance(rope_scaling_factor, float) or rope_scaling_factor <= 1.0:
+            raise ValueError(f"`rope_scaling`'s factor field must be a float > 1, got {rope_scaling_factor}")
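
The validation rules above are strict about types: the factor must be a Python `float`, so an integer `2` is rejected even though it is greater than 1. The standalone sketch below restates the same checks outside the config class. The function name `rope_scaling_error` is illustrative, and it returns a message instead of raising so that the cases are easy to compare.

```python
def rope_scaling_error(rope_scaling):
    """Return None if the value would pass _rope_scaling_validation, else a short reason."""
    if rope_scaling is None:
        return None
    if not isinstance(rope_scaling, dict) or len(rope_scaling) != 2:
        return "must be a dictionary with two fields, `type` and `factor`"
    if rope_scaling.get("type") not in ["linear", "dynamic"]:
        return "type must be one of ['linear', 'dynamic']"
    factor = rope_scaling.get("factor")
    if factor is None or not isinstance(factor, float) or factor <= 1.0:
        return "factor must be a float > 1"
    return None


for candidate in (
    None,
    {"type": "linear", "factor": 2.0},
    {"type": "linear", "factor": 2},    # int, not float
    {"type": "dynamic", "factor": 1.0},  # not strictly greater than 1
    {"type": "yarn", "factor": 2.0},     # unsupported strategy
):
    print(candidate, "->", rope_scaling_error(candidate))
```

The `config.json` in this commit sets `"rope_scaling": null`, which takes the early-return path.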

model-00001-of-00005.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:2970a0c4777b94c3414a52488c368fb6aabf684b361382bdf664ad68a0501427
+size 3950989604

model-00002-of-00005.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:47a7aa6f94c0ee6a065cbe1ba555746b5f92242a56d31d492d4e4a6813c8eaa1
+size 3926026584

model-00003-of-00005.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:635a3db4240eb33b5c732434eccdf091fd0015f760abf8d6e58e67df051fca3d
+size 3926026664

model-00004-of-00005.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:3b0eda892f8937c02e46d1d0b42e993c0343c7505ba3e8108fe6fb466133b220
+size 3926026664

model-00005-of-00005.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:2e339323f63f238721d617f459425eee7a6213dc6326e14faad4395e54488c3e
+size 2646684016
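
Each of the five shard entries above is a Git LFS pointer: three `key value` lines giving the spec version, the SHA-256 object id, and the size in bytes. The sketch below parses one pointer and adds up the five sizes listed in this diff. `parse_lfs_pointer` is an illustrative helper, not part of any library.

```python
def parse_lfs_pointer(text: str) -> dict:
    """Split a Git LFS pointer into its key/value fields, with size as an int."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    fields["size"] = int(fields["size"])
    return fields


# Pointer for model-00005-of-00005.safetensors, copied from the diff.
pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:2e339323f63f238721d617f459425eee7a6213dc6326e14faad4395e54488c3e
size 2646684016"""

# Sizes of the five shards, in the order they appear in the diff.
shard_sizes = [3950989604, 3926026584, 3926026664, 3926026664, 2646684016]
total_bytes = sum(shard_sizes)

print(parse_lfs_pointer(pointer)["size"], total_bytes)
```

The shards add up to 18,375,753,532 bytes, so a full download of the weights is a little over 18 GB.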

model.safetensors.index.json ADDED

The diff for this file is too large to render. See raw diff

modeling_abstractor.py ADDED

	@@ -0,0 +1,30 @@

+import re
+import torch
+from torch import nn
+from torch.nn import functional as F
+def build_projection(projection_type: str, in_dim: int, out_dim: int) -> nn.Module:
+    mlp_gelu_match = re.match(r'^mlp(\d+)x_gelu$', projection_type)
+    if mlp_gelu_match:
+        mlp_depth = int(mlp_gelu_match.group(1))
+        modules = [nn.Linear(in_dim, out_dim)]
+        for _ in range(1, mlp_depth):
+            modules.append(nn.GELU())
+            modules.append(nn.Linear(out_dim, out_dim))
+        projection = nn.Sequential(*modules)
+        return projection
+    raise ValueError(f'Unknown projector type: {projection_type}')
+class PerceiverProjection(nn.Module):
+    def __init__(self, projection_type: str, in_dim: int, out_dim: int):
+        super().__init__()
+        self.projection = build_projection(projection_type, in_dim, out_dim)
+    def forward(self, input_embeds: torch.Tensor):
+        input_embeds.requires_grad_(True)
+        embeds = self.projection(input_embeds)
+        embeds.requires_grad_(True)
+        return embeds
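
To see what `build_projection` produces for the `"mlp2x_gelu"` setting in `config.json`, the sketch below repeats its parsing and loop but returns plain tuples instead of `torch.nn` modules, so it runs without PyTorch. The name `projection_layout` and the tuple format are assumptions made for this example. The dimensions 4608 and 4096 come from `1152 * (int(1 / 0.5) ** 2)` and the language model's `hidden_size`.

```python
import re


def projection_layout(projection_type: str, in_dim: int, out_dim: int) -> list:
    """Describe the layers build_projection would stack, as plain tuples."""
    match = re.match(r"^mlp(\d+)x_gelu$", projection_type)
    if not match:
        raise ValueError(f"Unknown projector type: {projection_type}")
    depth = int(match.group(1))
    # The first linear layer changes the width; later ones keep out_dim.
    layers = [("Linear", in_dim, out_dim)]
    for _ in range(1, depth):
        layers.append(("GELU",))
        layers.append(("Linear", out_dim, out_dim))
    return layers


for layer in projection_layout("mlp2x_gelu", 4608, 4096):
    print(layer)
```

Despite the class name `PerceiverProjection`, the module built this way is a plain MLP: for `mlp2x_gelu` it is two linear layers with one GELU between them.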

modeling_dmllm.py ADDED

	@@ -0,0 +1,512 @@

+from typing import Optional, List
+import torch
+from torch import nn
+from torch.nn import functional as F
+import transformers
+from transformers import PreTrainedModel, AutoModel, AutoModelForCausalLM, GenerationConfig
+from transformers import AutoConfig
+from transformers.feature_extraction_utils import BatchFeature
+from .configuration_dmllm import DMLLMConfig
+from .modeling_abstractor import PerceiverProjection
+from .modeling_llada import LLaDAModelLM
+from .cache import *
+from .configuration_llada import LLaDAConfig
+def build_vision_model(config, model=None):
+    assert hasattr(config, "name_or_path")
+    if model is None:
+        model = AutoModel.from_pretrained(
+            config.name_or_path, config=config, trust_remote_code=True)
+    return model
+def forward_process(bsz, seq_len, device, eps=1e-3):
+    b, l = bsz, seq_len
+    # Sample one masking probability per sequence, bounded below by eps
+    t = torch.rand(b, device=device)
+    # t = torch.sigmoid(t)
+    p_mask = (1 - eps) * t + eps
+    p_mask = p_mask[:, None]  # shape [batch, 1], broadcast over the sequence
+    masked_indices = torch.rand((b, l), device=device)
+    # Raise the cutoff to the row minimum when needed, so at least one token is masked
+    mask_cutoff = torch.max(p_mask, masked_indices.min(-1, keepdim=True).values)
+    masked_indices = masked_indices <= mask_cutoff
+    # The mask itself is applied by the caller; 126336 is used for the [MASK] token, e.g.
+    # noisy_batch = torch.where(masked_indices, 126336, input_ids)
+    return masked_indices, p_mask
+def forward_process_blocks(bsz, seq_len, device, block_length=8, eps=1e-3):
+    """
+    Block-level forward diffusion process for SDAR-v2
+    Args:
+        bsz: batch size
+        seq_len: sequence length
+        device: torch device
+        block_length: length of each block
+        eps: minimum masking probability
+    Returns:
+        masked_indices: boolean tensor indicating which tokens to mask
+        p_mask: masking probabilities
+    """
+    b, l = bsz, seq_len
+    t = torch.rand(b, device=device)
+    p_mask = (1 - eps) * t + eps
+    # Calculate number of blocks
+    num_blocks = (l + block_length - 1) // block_length
+    # Block-level masking probability
+    block_p_mask = p_mask[:, None].expand(b, num_blocks)  # [batch, num_blocks]
+    # Decide which blocks to mask
+    block_mask_decisions = torch.rand(b, num_blocks, device=device) < block_p_mask
+    # Expand block decisions to token level
+    masked_indices = torch.zeros(b, l, device=device, dtype=torch.bool)
+    for i in range(num_blocks):
+        start_idx = i * block_length
+        end_idx = min((i + 1) * block_length, l)
+        # If block is selected for masking, mask all tokens in the block
+        masked_indices[:, start_idx:end_idx] = block_mask_decisions[:, i:i+1]
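+    # Add some randomness within blocks (optional). The XOR below only clears bits that are
+    # already set: each masked token has a 20% chance of being unmasked, and unmasked tokens
+    # are never flipped
+    within_block_randomness = 0.2  # per-token chance to unmask inside a masked block
+    random_flip = torch.rand(b, l, device=device) < within_block_randomness
+    masked_indices = masked_indices ^ (random_flip & masked_indices)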
+    return masked_indices, p_mask
+def create_block_attention_mask(seq_len, block_length, device, batch_size=1):
+    """
+    Create a block-causal (block lower-triangular) attention mask for SDAR-v2:
+    each token attends to its own block and to all earlier blocks
+    """
+    num_blocks = (seq_len + block_length - 1) // block_length
+    # Create block-level lower triangular mask
+    block_mask = torch.tril(torch.ones(num_blocks, num_blocks, device=device, dtype=torch.bool))
+    # Expand to token level
+    token_mask = block_mask.repeat_interleave(block_length, dim=0)\
+                          .repeat_interleave(block_length, dim=1)
+    # Crop to actual sequence length
+    token_mask = token_mask[:seq_len, :seq_len]
+    # Convert to 4D format [batch, 1, seq_len, seq_len]
+    attention_mask = token_mask.unsqueeze(0).unsqueeze(0).expand(batch_size, 1, -1, -1)
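+    # Convert to additive mask format (0 for attend, -inf for mask). attention_mask is boolean
+    # and full_like inherits that dtype unless told otherwise, so both branches set a float dtype
+    attention_mask = torch.where(
+        attention_mask,
+        torch.zeros_like(attention_mask, dtype=torch.float),
+        torch.full_like(attention_mask, float('-inf'), dtype=torch.float)
+    )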
+    return attention_mask
+class DMLLM(PreTrainedModel):
+    config_class = DMLLMConfig
+    supports_gradient_checkpointing = True
+    _skip_keys_device_placement = "past_key_values"
+    _supports_cache_class = False
+    _supports_flash_attn_2 = True
+    _supports_sdpa = True
+    accepts_loss_kwargs=False
+    def __init__(self,
+                 config: DMLLMConfig,
+                 language_model=None,
+                 vision_model=None,
+                 processor=None):
+        super().__init__(config)
+        self.image_size = config.image_size
+        self.patch_size = config.patch_size
+        self.downsample_ratio = config.downsample_ratio
+        self.num_image_token = config.num_image_token
+        self.vision_select_layer = config.vision_select_layer
+        self.replacement_noise_mode = config.replacement_noise_mode
+        try:
+            vision_hidden_states = self.config.vision_model_config.hidden_size
+        except AttributeError:
+            # Composite vision configs (e.g. SigLIP) keep hidden_size on the nested vision_config
+            vision_hidden_states = self.config.vision_model_config.vision_config.hidden_size
+            self.config.vision_model_config.hidden_size = vision_hidden_states
+        vision_model = build_vision_model(config.vision_model_config, vision_model)
+        vision_abstractor = PerceiverProjection(**config.vision_abstractor_config,
+                                                in_dim=self.config.vision_model_config.hidden_size * (int(1 / self.downsample_ratio) ** 2),
+                                                out_dim=self.config.language_model_config.hidden_size)
+        if language_model is None:
+            kwargs_ = {}
+            if config._attn_implementation_internal is not None:
+                kwargs_['attn_implementation'] = config._attn_implementation_internal
+            if 'llada' in config.language_model_config.name_or_path.lower():
+                with transformers.modeling_utils.no_init_weights():
+                    language_model = LLaDAModelLM(config.language_model_config)
+            else:
+                raise ValueError(f"Unsupported language model: {config.language_model_config.name_or_path}")
+        self.vision_model = vision_model
+        self.vision_abstractor = vision_abstractor
+        self.language_model = language_model
+    def forward_vision(self, pixel_values):
+        # pixel_values: (n, c, h, w) or (b, tiles, c, h, w)
+        # Handle BatchFeature input
+        if isinstance(pixel_values, BatchFeature):
+            pixel_values = pixel_values["pixel_values"]
+        # Handle 5D input: (b, tiles, c, h, w) -> (b*tiles, c, h, w)
+        if pixel_values.dim() == 5:
+            b, tiles, c, h, w = pixel_values.shape
+            pixel_values = pixel_values.view(b * tiles, c, h, w)
+        # flags for dummy images (all-zero images)
+        image_flags = torch.sum(pixel_values, dim=(1, 2, 3)) != 0
+        image_flags = image_flags.long()
+        if image_flags.dim() > 1:
+            image_flags = image_flags.squeeze(-1)
+        # extract vision features
+        if self.vision_select_layer == -1:
+            image_embeddings = self.vision_model.vision_model(
+                pixel_values=pixel_values,
+            ).last_hidden_state
+        else:
+            image_embeddings = self.vision_model.vision_model(
+                pixel_values=pixel_values, output_hidden_states=True
+            ).hidden_states[self.vision_select_layer] # (B, N, C)
+        vit_embeds = image_embeddings[image_flags == 1]
+        if self.downsample_ratio != 1:
+            patch_num = self.image_size // self.patch_size
+            vit_embeds = vit_embeds.reshape(vit_embeds.shape[0], patch_num, patch_num, vit_embeds.shape[-1])
+            vit_embeds = self.pixel_shuffle(vit_embeds, scale_factor=self.downsample_ratio)
+            vit_embeds = vit_embeds.flatten(1, 2)
+        vit_embeds = self.vision_abstractor(vit_embeds)
+        return vit_embeds
+    def prepare_for_lm(self, input_ids, vision_embeds):
+        inputs_embeds = self.get_input_embeddings()(input_ids)
+        vision_embeds_ = vision_embeds
+        if vision_embeds is not None:
+            try:
+                vision_mask = input_ids == self.config.image_token_id
+                if torch.count_nonzero(vision_mask).item() != vision_embeds.shape[:-1].numel():
+                    info = "vision embeddings mismatch input embeddings: " \
+                           f"vision_mask shape={vision_mask.shape}; " \
+                           f"vision_mask count={torch.count_nonzero(vision_mask)}; " \
+                           f"vision_embeds shape={vision_embeds.shape}"
+                    #print(info)
+                    num_vision_1 = torch.count_nonzero(vision_mask).item()
+                    num_vision_2 = vision_embeds.shape[:-1].numel()
+                    vision_embeds = vision_embeds.contiguous()
+                    if num_vision_1 <= num_vision_2:
+                        vision_embeds = vision_embeds.view(-1, vision_embeds.size(-1))[:num_vision_1]
+                    else:
+                        vision_embeds = vision_embeds.view(-1, vision_embeds.size(-1))
+                        less_nums = num_vision_1 - num_vision_2
+                        vision_embeds = torch.cat([vision_embeds, vision_embeds[-less_nums:]], dim=0)
+                    vision_embeds = vision_embeds.contiguous()
+                # assert torch.count_nonzero(vision_mask).item() == vision_embeds.shape[:-1].numel(), \
+                #     "vision embeddings mismatch input embeddings: " \
+                #     f"vision_mask shape={vision_mask.shape}; " \
+                #     f"vision_mask count={torch.count_nonzero(vision_mask)}; " \
+                #     f"vision_embeds shape={vision_embeds.shape}"
+                inputs_embeds = torch.masked_scatter(inputs_embeds, vision_mask.unsqueeze(-1),
+                                                     vision_embeds.to(inputs_embeds.dtype).view(-1,
+                                                                                                vision_embeds.size(-1)))
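+            except Exception:
+                # Fallback: add a zero-valued term that depends on the vision embeddings, leaving inputs_embeds numerically unchanged
+                inputs_embeds = inputs_embeds + torch.sum(vision_embeds_[0, 0, :]) * 0.0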
+        return inputs_embeds
+    def pixel_shuffle(self, x, scale_factor=0.5):
+        x = x.contiguous()
+        n, w, h, c = x.size()
+        # N, W, H, C --> N, W, H * scale, C // scale
+        x = x.view(n, w, int(h * scale_factor), int(c / scale_factor))
+        # N, W, H * scale, C // scale --> N, H * scale, W, C // scale
+        x = x.permute(0, 2, 1, 3).contiguous()
+        # N, H * scale, W, C // scale --> N, H * scale, W * scale, C // (scale ** 2)
+        x = x.view(n, int(h * scale_factor), int(w * scale_factor),
+                   int(c / (scale_factor * scale_factor)))
+        x = x.permute(0, 2, 1, 3).contiguous()
+        return x
+    def forward(self,
+                input_ids: torch.LongTensor = None,
+                attention_mask: Optional[torch.BoolTensor] = None,
+                position_ids: Optional[torch.LongTensor] = None,
+                pixel_values: Optional[torch.Tensor] = None,
+                past_key_values: Optional[List[torch.FloatTensor]] = None,
+                labels: Optional[torch.LongTensor] = None,
+                return_dict: bool = True,
+                **kwargs,
+                ):
+        return_dict = return_dict if return_dict is not None else self.config.use_return_dict
+        # ========Get visual embedding========
+        if pixel_values is not None:
+            vision_embeds = self.forward_vision(pixel_values)
+        else:
+            vision_embeds = None
+        # print(f"input_ids.shape: {input_ids.shape}", {vision_embeds.shape})
+        inputs_embeds = self.prepare_for_lm(input_ids, vision_embeds)
+        # print(f"inputs_embeds.shape: {inputs_embeds.shape}")
+        p_mask = None
+        answer_length = None
+        if self.is_gradient_checkpointing and torch.is_grad_enabled():
+            inputs_embeds.requires_grad_(True)
+        # ========Forward into LM========
+        outputs = self.language_model(
+            input_ids=None,
+            inputs_embeds=inputs_embeds,
+            attention_mask=attention_mask,
+            position_ids=position_ids,
+            past_key_values=past_key_values,
+            return_dict=return_dict,
+            labels=labels,
+            use_cache=False,
+            conversation_ids=None,
+            replacement_noise_mode=self.replacement_noise_mode,
+            p_mask=p_mask,
+            answer_length=answer_length,
+            **kwargs,
+        )
+        return outputs
+    def gradient_checkpointing_enable(self, gradient_checkpointing_kwargs=None):
+        super().gradient_checkpointing_enable(gradient_checkpointing_kwargs)
+        self.language_model.enable_input_require_grads()
+    def get_input_embeddings(self):
+        return self.language_model.get_input_embeddings()
+    def set_input_embeddings(self, value):
+        self.language_model.set_input_embeddings(value)
+    def get_output_embeddings(self):
+        return self.language_model.get_output_embeddings()
+    def set_output_embeddings(self, new_embeddings):
+        self.language_model.set_output_embeddings(new_embeddings)
+    def set_decoder(self, decoder):
+        self.language_model.set_decoder(decoder)
+    def get_decoder(self):
+        return self.language_model.get_decoder()
+    def tie_weights(self):
+        return self.language_model.tie_weights()
+    @torch.no_grad()
+    def generate(
+            self,
+            pixel_values: Optional[torch.FloatTensor] = None,
+            input_ids: Optional[torch.LongTensor] = None,
+            **generate_kwargs,
+    ) -> torch.LongTensor:
+        if pixel_values is not None:
+            vision_embeds = self.forward_vision(pixel_values)
+        else:
+            vision_embeds = None
+        inputs_embeds = self.prepare_for_lm(input_ids, vision_embeds)
+        if 'llada' in self.config.language_model_config.name_or_path.lower():
+            outputs = self.language_model.generate_with_embeds(
+                inputs_embeds=inputs_embeds, **generate_kwargs
+            )
+        else:
+            raise NotImplementedError(f"Generation not implemented for model: {self.config.language_model_config.name_or_path}")
+        return outputs
+    @torch.no_grad()
+    def generate_replace_noise(
+            self,
+            pixel_values: Optional[torch.FloatTensor] = None,
+            input_ids: Optional[torch.LongTensor] = None,
+            **generate_kwargs,
+    ) -> torch.LongTensor:
+        if pixel_values is not None:
+            vision_embeds = self.forward_vision(pixel_values)
+        else:
+            vision_embeds = None
+        inputs_embeds = self.prepare_for_lm(input_ids, vision_embeds)
+        outputs, all_steps_response = self.language_model.generate_with_embeds_replace_noise(
+            inputs_embeds=inputs_embeds, **generate_kwargs
+        )
+        return outputs, all_steps_response
+    def get_template(self):
+        template = dict(
+            SYSTEM=("<|start_header_id|>system<|end_header_id|>\n{system}<|eot_id|>\n"),
+            INSTRUCTION=("<|start_header_id|>user<|end_header_id|>\n{input}<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n"),
+            SUFFIX="<|eot_id|>",
+            SUFFIX_AS_EOS=True,
+            SEP="\n",
+            STOP_WORDS=["<|eot_id|>"],
+        )
+        return template
+    @torch.no_grad()
+    def chat(
+            self,
+            tokenizer,
+            pixel_values,
+            question,
+            generation_config,
+            history=None,
+            return_history=False,
+            num_patches_list=None,
+            IMG_START_TOKEN='<img>',
+            IMG_END_TOKEN='</img>',
+            IMG_CONTEXT_TOKEN='<IMG_CONTEXT>',
+            verbose=False
+    ):
+        if history is None and pixel_values is not None and '<image>' not in question:
+            question = '<image>\n' + question
+        if num_patches_list is None:
+            num_patches_list = [pixel_values.shape[0]] if pixel_values is not None else []
+        assert pixel_values is None or len(pixel_values) == sum(num_patches_list)
+        img_context_token_id = tokenizer.convert_tokens_to_ids(IMG_CONTEXT_TOKEN)
+        self.img_context_token_id = img_context_token_id
+        template = self.get_template()
+        eos_token_id = tokenizer.convert_tokens_to_ids(template["SUFFIX"].strip())
+        history = "" if history is None else history
+        prompt = history
+        prompt = prompt + template["INSTRUCTION"].format(input=question)
+        if verbose and pixel_values is not None:
+            image_bs = pixel_values.shape[0]
+            print(f'dynamic ViT batch size: {image_bs}')
+        prompt = prompt[::-1]
+        for num_patches in num_patches_list[::-1]:
+            image_tokens = IMG_START_TOKEN + IMG_CONTEXT_TOKEN * self.num_image_token * num_patches + IMG_END_TOKEN
+            prompt = prompt.replace('<image>'[::-1], image_tokens[::-1], 1)
+        prompt = prompt[::-1]
+        model_inputs = tokenizer(prompt, return_tensors='pt')
+        device = torch.device(self.language_model.device if torch.cuda.is_available() else 'cpu')
+        input_ids = model_inputs['input_ids'].to(device)
+        attention_mask = model_inputs['attention_mask'].to(device)
+        generation_config['eos_token_id'] = eos_token_id
+        generation_output = self.generate(
+            pixel_values=pixel_values,
+            input_ids=input_ids,
+            **generation_config
+        )
+        # response = [
+        #     tokenizer.decode(g[len(p) :].tolist())
+        #     for p, g in zip(input_ids, generation_output)
+        # ][0]
+        #print("generation_output:", tokenizer.batch_decode(generation_output, skip_special_tokens=False)[0])
+        response = tokenizer.batch_decode(generation_output, skip_special_tokens=False)[0]
+        history = history + prompt + response
+        response = response.split(template["SUFFIX"].strip())[0].strip()
+        if return_history:
+            return response, history
+        else:
+            if verbose:
+                print(response)
+            return response
+    @torch.no_grad()
+    def chat_replace_noise(
+            self,
+            tokenizer,
+            pixel_values,
+            question,
+            generation_config,
+            history=None,
+            return_history=False,
+            num_patches_list=None,
+            IMG_START_TOKEN='<img>',
+            IMG_END_TOKEN='</img>',
+            IMG_CONTEXT_TOKEN='<IMG_CONTEXT>',
+            verbose=False
+    ):
+        if history is None and pixel_values is not None and '<image>' not in question:
+            question = '<image>\n' + question
+        if num_patches_list is None:
+            num_patches_list = [pixel_values.shape[0]] if pixel_values is not None else []
+        assert pixel_values is None or len(pixel_values) == sum(num_patches_list)
+        img_context_token_id = tokenizer.convert_tokens_to_ids(IMG_CONTEXT_TOKEN)
+        self.img_context_token_id = img_context_token_id
+        template = self.get_template()
+        eos_token_id = tokenizer.convert_tokens_to_ids(template["SUFFIX"].strip())
+        history = "" if history is None else history
+        prompt = history
+        prompt = prompt + template["INSTRUCTION"].format(input=question)
+        if verbose and pixel_values is not None:
+            image_bs = pixel_values.shape[0]
+            print(f'dynamic ViT batch size: {image_bs}')
+        prompt = prompt[::-1]
+        for num_patches in num_patches_list[::-1]:
+            image_tokens = IMG_START_TOKEN + IMG_CONTEXT_TOKEN * self.num_image_token * num_patches + IMG_END_TOKEN
+            prompt = prompt.replace('<image>'[::-1], image_tokens[::-1], 1)
+        prompt = prompt[::-1]
+        model_inputs = tokenizer(prompt, return_tensors='pt')
+        device = torch.device(self.language_model.device if torch.cuda.is_available() else 'cpu')
+        input_ids = model_inputs['input_ids'].to(device)
+        attention_mask = model_inputs['attention_mask'].to(device)
+        generation_config['eos_token_id'] = eos_token_id
+        generation_output, all_steps_response = self.generate_replace_noise(
+            pixel_values=pixel_values,
+            input_ids=input_ids,
+            **generation_config
+        )
+        response = tokenizer.batch_decode(generation_output, skip_special_tokens=False)[0]
+        all_steps_response_ = []
+        for step_response in all_steps_response:
+            step_response = tokenizer.batch_decode(step_response, skip_special_tokens=False)[0]
+            all_steps_response_.append(step_response)
+        all_steps_response = all_steps_response_
+        for i, step_response in enumerate(all_steps_response):
+            print(f"Step {i}: {step_response}\n")
+        history = history + prompt + response
+        response = response.split(template["SUFFIX"].strip())[0].strip()
+        if return_history:
+            return response, history
+        else:
+            if verbose:
+                print(response)
+            return response
+AutoConfig.register("dmllm", DMLLMConfig)
+AutoModel.register(DMLLMConfig, DMLLM)
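
For readers tracing the block-wise attention mask built at the top of this hunk, here is a small dependency-free sketch of the same pattern: token `i` may attend to token `j` whenever `j`'s block index is not greater than `i`'s. The names `block_causal_mask` and `to_additive` and the list-of-lists representation are illustrative assumptions, not part of the commit, which works on torch tensors.

```python
from typing import List


def block_causal_mask(seq_len: int, block_length: int) -> List[List[bool]]:
    """Illustrative only: True where query token i may attend to key token j."""
    return [
        [(j // block_length) <= (i // block_length) for j in range(seq_len)]
        for i in range(seq_len)
    ]


def to_additive(mask: List[List[bool]]) -> List[List[float]]:
    """Convert a boolean mask to additive form: 0.0 to attend, -inf to block."""
    return [[0.0 if keep else float("-inf") for keep in row] for row in mask]


if __name__ == "__main__":
    # Print the pattern for 5 tokens in blocks of 2 ('x' = attend, '.' = masked).
    for row in block_causal_mask(seq_len=5, block_length=2):
        print("".join("x" if keep else "." for keep in row))
```

Tokens inside one block see each other in both directions, while blocks only look backwards, so the last (possibly partial) block attends to the whole sequence.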

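The `chat` and `chat_replace_noise` methods above expand `<image>` placeholders by reversing the prompt, replacing one reversed placeholder per entry of `num_patches_list`, and reversing back. A minimal sketch of that step in isolation follows; the function name `expand_image_placeholders` is hypothetical, and the token strings are the defaults from the method signatures.

```python
from typing import List

IMG_START_TOKEN = "<img>"
IMG_END_TOKEN = "</img>"
IMG_CONTEXT_TOKEN = "<IMG_CONTEXT>"


def expand_image_placeholders(prompt: str, num_patches_list: List[int], num_image_token: int) -> str:
    """Illustrative only: expand '<image>' placeholders from right to left."""
    reversed_prompt = prompt[::-1]
    for num_patches in reversed(num_patches_list):
        image_tokens = (
            IMG_START_TOKEN
            + IMG_CONTEXT_TOKEN * (num_image_token * num_patches)
            + IMG_END_TOKEN
        )
        # With count=1, replace() hits the first match in the reversed string,
        # which is the last still-unexpanded placeholder in the original prompt.
        reversed_prompt = reversed_prompt.replace("<image>"[::-1], image_tokens[::-1], 1)
    return reversed_prompt[::-1]


if __name__ == "__main__":
    print(expand_image_placeholders("<image>\nWhat is shown?", [2], 3))
```

Working from the right means that when the prompt holds more placeholders than `num_patches_list` has entries, it is the rightmost placeholders that get expanded.
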
modeling_llada.py ADDED

The diff for this file is too large to render. See raw diff

preprocessor_config.json ADDED

	@@ -0,0 +1,27 @@

+{
+  "auto_map": {
+    "AutoProcessor": "processing_dmllm.DMLLMProcessor"
+  },
+  "do_convert_rgb": null,
+  "do_normalize": true,
+  "do_rescale": true,
+  "do_resize": true,
+  "image_mean": [
+    0.5,
+    0.5,
+    0.5
+  ],
+  "image_processor_type": "SiglipImageProcessor",
+  "image_std": [
+    0.5,
+    0.5,
+    0.5
+  ],
+  "processor_class": "DMLLMProcessor",
+  "resample": 2,
+  "rescale_factor": 0.00392156862745098,
+  "size": {
+    "height": 512,
+    "width": 512
+  }
+}
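
As a quick sanity check of the values in this config, the sketch below works through the per-channel arithmetic, assuming the image processor rescales first and then normalizes (the usual order for Hugging Face image processors). It also evaluates the `num_image_token` expression from `DMLLMProcessor.__init__` with that processor's default `patch_size=16` and `downsample_ratio=0.5`. The helper names `normalize_pixel` and `tokens_per_tile` are illustrative, not part of the repository.

```python
RESCALE_FACTOR = 0.00392156862745098  # 1 / 255, "rescale_factor" in the config
IMAGE_MEAN = 0.5                      # per-channel "image_mean"
IMAGE_STD = 0.5                       # per-channel "image_std"


def normalize_pixel(value: int) -> float:
    """Rescale an 8-bit channel value to [0, 1], then standardize it."""
    return (value * RESCALE_FACTOR - IMAGE_MEAN) / IMAGE_STD


def tokens_per_tile(image_size: int = 512, patch_size: int = 16,
                    downsample_ratio: float = 0.5) -> int:
    """Same expression as the processor uses for num_image_token."""
    return int((image_size // patch_size) ** 2 * (downsample_ratio ** 2))


if __name__ == "__main__":
    print(normalize_pixel(0), normalize_pixel(255))  # ends of the 8-bit range
    print(tokens_per_tile())                         # image tokens per 512x512 tile
```

With these settings 8-bit pixel values land in roughly [-1, 1], and each 512x512 tile yields 32 x 32 = 1024 patches, which the 0.5 downsample ratio reduces to 256 image tokens.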

processing_dmllm.py ADDED

	@@ -0,0 +1,401 @@

+import math
+import torch
+import warnings
+import PIL.Image
+from torch.nn import functional as F
+from collections import UserDict, OrderedDict
+from typing import Union, Optional, Tuple, List, Dict, Any
+from transformers.image_utils import load_image
+from transformers.feature_extraction_utils import BatchFeature
+from .chat_template_utils import render_jinja_template
+from transformers.processing_utils import ProcessorMixin, AllKwargsForChatTemplate
+class DMLLMProcessor(ProcessorMixin):
+    attributes = ["tokenizer", "image_processor"]
+    optional_attributes = ['chat_template']
+    model_input_names = ['input_ids', 'attention_mask', 'pixel_values']
+    image_processor_class = "AutoImageProcessor"
+    tokenizer_class = "AutoTokenizer"
+    def __init__(
+            self, tokenizer, image_processor, chat_template=None,
+            image_size=512,
+            patch_size=16,
+            downsample_ratio=0.5,
+            max_sub_img=6,
+            min_sub_img=1,
+            image_token='<IMG_CONTEXT>',
+            image_start_token='<img>',
+            image_end_token='</img>',
+            special_tokens=['<IMG_CONTEXT>', '<img>', '</img>'], #'<think>', '</think>'
+            **kwargs):
+        if chat_template is None:
+            chat_template = "{% for message in messages %}{% if loop.first and message['role'] != 'system' %}<|start_header_id|>system<|end_header_id|>\nYou are a helpful assistant.<|eot_id|>\n{% endif %}<|start_header_id|>{{ message['role'] }}<|end_header_id|>\n{% if message['role'] == 'assistant' %}{% generation %}{{ message['content'][0]['text'] }}<|eot_id|>{% endgeneration %}{% else %}{% for content in message['content'] %}{% if content['type'] == 'image' or 'image' in content or 'image_url' in content %}<img><IMG_CONTEXT></img>{% elif content['type'] == 'video' or 'video' in content %}<video><VIDEO_CONTEXT></video>{% elif 'text' in content %}{{ content['text'] }}{% endif %}{% endfor %}<|eot_id|>\n{% endif %}{% endfor %}{% if add_generation_prompt %}<|start_header_id|>assistant<|end_header_id|>\n{% endif %}"
+        super().__init__(tokenizer=tokenizer, image_processor=image_processor, chat_template=chat_template)
+        if isinstance(image_size, (list, tuple)):
+            image_size = image_size[0]
+        self.num_image_token = int((image_size // patch_size) ** 2 * (downsample_ratio ** 2))
+        self.vision_token_share_pe = kwargs.get('vision_token_share_pe', True)
+        self.image_token_len = kwargs.pop('image_token_len', 256)
+        self.max_sub_img = max_sub_img
+        self.min_sub_img = min_sub_img
+        self.image_token = image_token
+        self.image_start_token = image_start_token
+        self.image_end_token = image_end_token
+        self.tokenizer.add_special_tokens({'additional_special_tokens': special_tokens}, replace_additional_special_tokens=False)
+        self.image_token_id = self.tokenizer.convert_tokens_to_ids(self.image_token)
+        self.image_start_token_id = self.tokenizer.convert_tokens_to_ids(self.image_start_token)
+        self.image_end_token_id = self.tokenizer.convert_tokens_to_ids(self.image_end_token)
+        if 'llada' in tokenizer.name_or_path.lower():
+            self._pad_token_id = self.tokenizer.convert_tokens_to_ids("<|eot_id|>")
+        elif 'dream' in tokenizer.name_or_path.lower():
+            self._pad_token_id = self.tokenizer.convert_tokens_to_ids("<|endoftext|>")
+        elif 'sdar' in tokenizer.name_or_path.lower():
+            self._pad_token_id = self.tokenizer.convert_tokens_to_ids("<|endoftext|>")
+        if isinstance(image_size, int):
+            image_size = (image_size, image_size)
+        else:
+            image_size = image_size
+        self.image_size = image_size
+        assert image_size[0] == image_size[1]
+    def apply_chat_template(self, conversation, chat_template=None, **kwargs) -> Tuple[Any, Any]:
+        if chat_template is None:
+            chat_template = self.chat_template
+        processed_kwargs = {
+            "mm_load_kwargs": {},
+            "template_kwargs": {},
+        }
+        # for kwarg_type in processed_kwargs:
+        #     for key in AllKwargsForChatTemplate.__annotations__[kwarg_type].__annotations__.keys():
+        #         kwarg_type_defaults = AllKwargsForChatTemplate.__annotations__[kwarg_type]
+        #         default_value = getattr(kwarg_type_defaults, key, None)
+        #         value = kwargs.pop(key, default_value)
+        #         if value is not None and not isinstance(value, dict):
+        #             processed_kwargs[kwarg_type][key] = value
+        # Pass unprocessed custom kwargs
+        processed_kwargs["template_kwargs"].update(kwargs)
+        conversations = [conversation]
+        prompt, generation_indices = render_jinja_template(
+            conversations=conversations,
+            chat_template=chat_template,
+            **processed_kwargs["template_kwargs"],  # different flags such as `return_assistant_mask`
+            **self.tokenizer.special_tokens_map,  # tokenizer special tokens are used by some templates
+        )
+        return prompt, generation_indices
+    def __call__(self, text=None, images=None, videos=None, generation_indices=None, **kwargs) -> BatchFeature:
+        images = [] if images is None else images
+        inputs = self.tokenizer(text, padding=False, truncation=False, return_attention_mask=False)
+        assistant_masks = []
+        input_ids = inputs["input_ids"]
+        for i in range(len(input_ids)):
+            current_mask = [0] * len(input_ids[i])
+            if 'dream' in self.tokenizer.name_or_path.lower():
+                # Locate the assistant spans using the Dream model's markers
+                # Find the content between <|im_start|>assistant and <|im_end|>
+                im_start_assistant_pattern = self.tokenizer.encode("<|im_start|>assistant\n", add_special_tokens=False)
+                im_end_pattern = self.tokenizer.encode("<|im_end|>", add_special_tokens=False)
+                # Search input_ids for assistant segments
+                j = 0
+                while j < len(input_ids[i]) - len(im_start_assistant_pattern) + 1:
+                    # Check for a match with <|im_start|>assistant
+                    if input_ids[i][j:j+len(im_start_assistant_pattern)] == im_start_assistant_pattern:
+                        start_token = j + len(im_start_assistant_pattern)
+                        # Find the corresponding <|im_end|>
+                        end_token = None
+                        for k in range(start_token, len(input_ids[i]) - len(im_end_pattern) + 1):
+                            if input_ids[i][k:k+len(im_end_pattern)] == im_end_pattern:
+                                end_token = k
+                                break
+                        # Mark the assistant span (including the <|im_end|> tokens)
+                        if end_token is not None:
+                            for token_idx in range(start_token, end_token + len(im_end_pattern)):
+                                current_mask[token_idx] = 1
+                            j = end_token + len(im_end_pattern)
+                        else:
+                            j += 1
+                    else:
+                        j += 1
+            elif 'llada' in self.tokenizer.name_or_path.lower():
+                # Skip assistant mask computation if generation_indices is None/empty (e.g., GRPO prompt-only)
+                if generation_indices is not None and i < len(generation_indices) and generation_indices[i]:
+                    for assistant_start_char, assistant_end_char in generation_indices[i]:
+                        start_token = inputs.char_to_token(i, assistant_start_char)
+                        end_token = inputs.char_to_token(i, assistant_end_char - 1)
+                        if start_token is None:
+                            # start_token is out of bounds maybe due to truncation.
+                            break
+                        for token_id in range(start_token, end_token + 1 if end_token is not None else len(input_ids[i])):
+                            current_mask[token_id] = 1
+            elif 'sdar' in self.tokenizer.name_or_path.lower():
+                # Assistant-span detection for SDAR models
+                # SDAR uses the <|im_start|>assistant\n and <|im_end|> format
+                im_start_assistant_pattern = self.tokenizer.encode("<|im_start|>assistant\n", add_special_tokens=False)
+                im_end_pattern = self.tokenizer.encode("<|im_end|>", add_special_tokens=False)
+                # Search input_ids for assistant segments
+                j = 0
+                while j < len(input_ids[i]) - len(im_start_assistant_pattern) + 1:
+                    # Check for a match with <|im_start|>assistant
+                    if input_ids[i][j:j+len(im_start_assistant_pattern)] == im_start_assistant_pattern:
+                        start_token = j + len(im_start_assistant_pattern)
+                        # Find the corresponding <|im_end|>
+                        end_token = None
+                        for k in range(start_token, len(input_ids[i]) - len(im_end_pattern) + 1):
+                            if input_ids[i][k:k+len(im_end_pattern)] == im_end_pattern:
+                                end_token = k
+                                break
+                        # Mark the assistant span (excluding the end token)
+                        if end_token is not None:
+                            for token_idx in range(start_token, end_token):
+                                current_mask[token_idx] = 1
+                            j = end_token + len(im_end_pattern)
+                        else:
+                            j += 1
+                    else:
+                        j += 1
+            assistant_masks.append(current_mask)
+        inputs["assistant_masks"] = assistant_masks[0]
+        inputs['input_ids'] = input_ids[0]
+        truncation = kwargs.pop('truncation', False)
+        max_length = kwargs.pop('max_length', 1024)
+        padding = kwargs.pop('padding', False)
+        inputs = self.process_images(images, inputs=inputs)
+        if isinstance(inputs, UserDict):
+            inputs = inputs.data
+        if 'attention_mask' not in inputs:
+            inputs['attention_mask'] = [1] * len(inputs['input_ids'])
+        if 'assistant_masks' in inputs:
+            inputs['prompt_mask'] = [1-x for x in inputs.pop('assistant_masks')]
+        inputs = self.process_inputs(inputs)
+        if truncation and len(inputs['input_ids']) > max_length:
+            inputs = self.truncate(inputs, max_length)
+        if padding and len(inputs['input_ids']) < max_length:
+            inputs = self.padding(inputs, max_length)
+        inputs = self.to_tensor(inputs)
+        self.check(inputs)
+        if self.vision_token_share_pe:
+            position_ids = self.get_position_ids(inputs)
+            position_ids = torch.tensor([position_ids], dtype=torch.long)
+            inputs['position_ids'] = position_ids
+        inputs.pop('sub_image_nums', None)
+        return BatchFeature(inputs)
+    def get_position_ids(self, inputs: Dict[str, Any]):
+        input_ids = inputs['input_ids'][0]
+        image_token_lens = self.get_image_token_length(inputs)
+        position_ids = []
+        i, j = 0, 0
+        while len(position_ids) < len(input_ids):
+            if input_ids[len(position_ids)] == self.image_token_id:
+                image_token_len = image_token_lens[j]
+                assert image_token_len % self.image_token_len == 0
+                num_views = image_token_len // self.image_token_len
+                for _ in range(num_views):
+                    position_ids += [i] * self.image_token_len # all tokens of one image view share the same position ID
+                    i += 1
+                j += 1
+            else:
+                position_ids.append(i)
+                i += 1
+        assert j == len(image_token_lens) and len(position_ids) == len(input_ids), \
+            f"Wrong position_ids, {j} != {len(image_token_lens)} or {len(position_ids)} != {len(input_ids)}"
+        return position_ids
+    def process_images(self, images, inputs):
+        images = [load_image(img) for img in images]
+        if len(images) > 0:
+            processed_images = []
+            sub_image_nums = []
+            for image in images:
+                if len(images) > 1:
+                    # for multi images, remove the split strategy
+                    sub_images = dynamic_preprocess(
+                        image, min_num=1,
+                        max_num=1,
+                        image_size=self.image_size[0], use_thumbnail=True)
+                else:
+                    sub_images = dynamic_preprocess(
+                        image, min_num=self.min_sub_img,
+                        max_num=self.max_sub_img,
+                        image_size=self.image_size[0], use_thumbnail=True)
+                sub_image_nums.append(len(sub_images))
+                processed_images += sub_images
+            # print([_img.size for _img in processed_images])
+            pixel_values = self.image_processor.preprocess(
+                images=processed_images, return_tensors="pt"
+            )["pixel_values"] # (N, c, h, w)
+        else:
+            pixel_values = torch.zeros((
+                1, 3, self.image_size[0], self.image_size[1]), dtype=torch.float32
+            )
+            sub_image_nums = []
+        inputs['pixel_values'] = pixel_values
+        inputs['sub_image_nums'] = sub_image_nums
+        return inputs
+    def truncate(self, inputs: Dict[str, Any], max_length: int):
+        assert self.image_token_id not in inputs['input_ids'][max_length:], "Truncating image tokens is not allowed."
+        inputs['input_ids'] = inputs['input_ids'][:max_length]
+        inputs['attention_mask'] = inputs['attention_mask'][:max_length]
+        if 'prompt_mask' in inputs:
+            inputs['prompt_mask'] = inputs['prompt_mask'][:max_length]
+        return inputs
+    def get_image_token_length(self, inputs: Dict[str, Any]) -> List[int]:
+        sub_image_nums = inputs.get('sub_image_nums', None)
+        if sub_image_nums is None or len(sub_image_nums) == 0:
+            return []
+        image_token_lens = [_num * self.num_image_token for _num in sub_image_nums]
+        return image_token_lens
+    def process_inputs(self, inputs: Dict[str, Any]):
+        graft_token_lens = self._get_graft_token_length(inputs)
+        inputs['input_ids'] = self._graft_token(inputs['input_ids'], graft_token_lens, self.image_token_id)
+        inputs['attention_mask'] = self._graft_token(inputs['attention_mask'], graft_token_lens, 'replicate')
+        if 'prompt_mask' in inputs:
+            inputs['prompt_mask'] = self._graft_token(inputs['prompt_mask'], graft_token_lens, 'replicate')
+        return inputs
+    def _graft_token(self, seq, graft_token_lens, value):
+        if value == 'replicate':
+            for i in reversed(graft_token_lens.keys()):
+                seq[i:] = [seq[i]] * graft_token_lens[i] + seq[i+1:]
+        else:
+            for i in reversed(graft_token_lens.keys()):
+                seq[i:] = [value] * graft_token_lens[i] + seq[i+1:]
+        return seq
+    def _get_graft_token_length(self, inputs: Dict[str, Any]) -> Dict[int, int]:
+        image_token_pos = [i for i, x in enumerate(inputs['input_ids']) if x == self.image_token_id]
+        image_token_lens = self.get_image_token_length(inputs)
+        assert len(image_token_pos) == len(image_token_lens), \
+            "Wrong image token count, " \
+            f"image_token_count({len(image_token_pos)}) != image_count({len(image_token_lens)})"
+        graft_token_lens = OrderedDict(zip(image_token_pos, image_token_lens))
+        return graft_token_lens
+    def check(self, inputs: Dict[str, Any]):
+        image_embed_token_count = torch.count_nonzero(inputs['input_ids'] == self.image_token_id).item()
+        image_embed_count = sum(self.get_image_token_length(inputs))
+        assert image_embed_token_count == image_embed_count, \
+            "Wrong image embed token count, " \
+            f"image_embed_token_count({image_embed_token_count}) != image_embed_count({image_embed_count})"
+    def padding(self, inputs: Dict[str, Any], max_length: int):
+        padding_len = max_length - len(inputs['input_ids'])
+        inputs['input_ids'] += [self.pad_token_id] * padding_len
+        inputs['attention_mask'] += [0] * padding_len
+        if 'prompt_mask' in inputs:
+            inputs['prompt_mask'] += [0] * padding_len
+        return inputs
+    def decode(self, token_ids: Union[List[int], torch.Tensor], **kwargs):
+        if isinstance(token_ids, torch.Tensor):
+            token_ids = token_ids.tolist()
+        text = self.tokenizer.decode(token_ids, **kwargs)
+        return text
+    def batch_decode(self, sequences: Union[List[List[int]], torch.Tensor], **kwargs):
+        if isinstance(sequences, torch.Tensor):
+            sequences = sequences.tolist()
+        texts = self.tokenizer.batch_decode(sequences, **kwargs)
+        return texts
+    def to_tensor(self, inputs):
+        inputs['input_ids'] = torch.tensor([inputs['input_ids']], dtype=torch.long)
+        inputs['attention_mask'] = torch.tensor([inputs['attention_mask']], dtype=torch.bool)
+        if 'prompt_mask' in inputs:
+            inputs['prompt_mask'] = torch.tensor([inputs['prompt_mask']], dtype=torch.bool)
+        return inputs
+    @property
+    def pad_token_id(self):
+        return self._pad_token_id
+    def __repr__(self):
+        # repr() must return a string; returning None raises TypeError.
+        return str(self)
+    def __str__(self):
+        return 'DMLLMProcessor'
+def find_closest_aspect_ratio(aspect_ratio, target_ratios, width, height, image_size):
+    best_ratio_diff = float('inf')
+    best_ratio = (1, 1)
+    area = width * height
+    for ratio in target_ratios:
+        target_aspect_ratio = ratio[0] / ratio[1]
+        ratio_diff = abs(aspect_ratio - target_aspect_ratio)
+        if ratio_diff < best_ratio_diff:
+            best_ratio_diff = ratio_diff
+            best_ratio = ratio
+        elif ratio_diff == best_ratio_diff:
+            if area > 0.5 * image_size * image_size * ratio[0] * ratio[1]:
+                best_ratio = ratio
+    # print(f'width: {width}, height: {height}, best_ratio: {best_ratio}')
+    return best_ratio
+def dynamic_preprocess(image, min_num=1, max_num=6, image_size=512, use_thumbnail=True):
+    orig_width, orig_height = image.size
+    # calculate the existing image aspect ratio
+    aspect_ratio = orig_width / orig_height
+    target_ratios = set(
+        (i, j) for n in range(min_num, max_num + 1) for i in range(1, n + 1) for j in range(1, n + 1) if
+        i * j <= max_num and i * j >= min_num)
+    target_ratios = sorted(target_ratios, key=lambda x: x[0] * x[1])
+    # find the closest aspect ratio to the target
+    target_aspect_ratio = find_closest_aspect_ratio(
+        aspect_ratio, target_ratios, orig_width, orig_height, image_size)
+    # calculate the target width and height
+    target_width = image_size * target_aspect_ratio[0]
+    target_height = image_size * target_aspect_ratio[1]
+    blocks = target_aspect_ratio[0] * target_aspect_ratio[1]
+    # resize the image
+    resized_img = image.resize((target_width, target_height))
+    processed_images = []
+    for i in range(blocks):
+        box = (
+            (i % (target_width // image_size)) * image_size,
+            (i // (target_width // image_size)) * image_size,
+            ((i % (target_width // image_size)) + 1) * image_size,
+            ((i // (target_width // image_size)) + 1) * image_size
+        )
+        # split the image
+        split_img = resized_img.crop(box)
+        processed_images.append(split_img)
+    assert len(processed_images) == blocks
+    if use_thumbnail and len(processed_images) != 1:
+        thumbnail_img = image.resize((image_size, image_size))
+        processed_images.append(thumbnail_img)
+    return processed_images
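
The grid selection in `find_closest_aspect_ratio` and `dynamic_preprocess` above can be traced without loading any image. The following is a minimal sketch, not the repository's code: `candidate_grids`, `choose_grid`, and `tile_count` are hypothetical names, and ties between grids of equal area are ordered by the tuple itself so the result is deterministic (the original sorts a set by area only).

```python
def candidate_grids(min_num=1, max_num=6):
    """All (cols, rows) grids whose tile count lies in [min_num, max_num]."""
    grids = {
        (cols, rows)
        for cols in range(1, max_num + 1)
        for rows in range(1, max_num + 1)
        if min_num <= cols * rows <= max_num
    }
    # Sort by tile count, then by the tuple itself (an assumption made here
    # for determinism; the original sorts by tile count only).
    return sorted(grids, key=lambda g: (g[0] * g[1], g))


def choose_grid(width, height, image_size=512, min_num=1, max_num=6):
    """Pick the grid whose aspect ratio is closest to the image's."""
    aspect_ratio = width / height
    best_diff = float("inf")
    best = (1, 1)
    for cols, rows in candidate_grids(min_num, max_num):
        diff = abs(aspect_ratio - cols / rows)
        if diff < best_diff:
            best_diff, best = diff, (cols, rows)
        elif diff == best_diff:
            # On a tie, prefer the larger grid only if the image has enough
            # pixels to fill more than half of it.
            if width * height > 0.5 * image_size * image_size * cols * rows:
                best = (cols, rows)
    return best


def tile_count(width, height, use_thumbnail=True, **kwargs):
    """Number of sub-images produced, including the optional thumbnail."""
    cols, rows = choose_grid(width, height, **kwargs)
    tiles = cols * rows
    return tiles + 1 if use_thumbnail and tiles != 1 else tiles


print(choose_grid(1024, 512))  # (2, 1)
print(tile_count(1024, 512))   # 3: two tiles plus one thumbnail
print(tile_count(512, 512))    # 1: a single tile gets no thumbnail
```

With the defaults there are 14 candidate grids. An 800x800 image ties between (1, 1) and (2, 2) on aspect ratio, and the area rule promotes it to (2, 2) because 640000 pixels exceed half of the 2x2 grid's 1048576.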

processor_config.json ADDED

	@@ -0,0 +1,15 @@

+{
+  "auto_map": {
+    "AutoProcessor": "processing_dmllm.DMLLMProcessor"
+  },
+  "image_end_token": "</img>",
+  "image_size": [
+    512,
+    512
+  ],
+  "image_start_token": "<img>",
+  "image_token": "<IMG_CONTEXT>",
+  "max_sub_img": 6,
+  "min_sub_img": 1,
+  "processor_class": "DMLLMProcessor"
+}
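
The `image_token` configured here (`<IMG_CONTEXT>`) is the placeholder that `process_inputs` in the processor diff expands, one copy per image embedding slot. Below is a minimal sketch of that grafting step, assuming plain Python lists; `graft_placeholders`, the token id `99`, and the per-image lengths are hypothetical values chosen for illustration (the processor derives each length as `sub_image_nums * num_image_token`). Unlike `_graft_token`, this version returns new lists instead of mutating its inputs.

```python
def graft_placeholders(input_ids, attention_mask, image_token_id, tokens_per_image):
    """Expand each single image placeholder into a run of placeholder tokens."""
    positions = [i for i, tok in enumerate(input_ids) if tok == image_token_id]
    if len(positions) != len(tokens_per_image):
        raise ValueError(
            f"image_token_count({len(positions)}) != image_count({len(tokens_per_image)})"
        )
    new_ids = list(input_ids)
    new_mask = list(attention_mask)
    # Work right to left so earlier positions stay valid while the lists grow.
    for pos, length in reversed(list(zip(positions, tokens_per_image))):
        new_ids[pos:pos + 1] = [image_token_id] * length
        # The mask value at the placeholder is replicated across the run.
        new_mask[pos:pos + 1] = [new_mask[pos]] * length
    return new_ids, new_mask


IMG = 99  # hypothetical id standing in for <IMG_CONTEXT>
ids, mask = graft_placeholders([1, IMG, 2, IMG, 3], [1, 1, 1, 1, 1], IMG, [3, 2])
print(ids)   # [1, 99, 99, 99, 2, 99, 99, 3]
print(mask)  # [1, 1, 1, 1, 1, 1, 1, 1]
```

After grafting, the number of placeholder ids equals the sum of the per-image lengths, which is the invariant the processor's `check` method asserts.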

special_tokens_map.json ADDED

	@@ -0,0 +1,60 @@

+{
+  "additional_special_tokens": [
+    "<|mdm_mask|>",
+    "<role>",
+    "</role>",
+    "<|arithmetic_start|>",
+    "<|arithmetic_end|>",
+    "<|number_start|>",
+    "<|number_end|>",
+    {
+      "content": "<IMG_CONTEXT>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false
+    },
+    {
+      "content": "<img>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false
+    },
+    {
+      "content": "</img>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false
+    }
+  ],
+  "bos_token": {
+    "content": "<|startoftext|>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "cls_token": {
+    "content": "[CLS]",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "eos_token": {
+    "content": "<|endoftext|>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "pad_token": {
+    "content": "<|endoftext|>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  }
+}

tokenizer.json ADDED

The diff for this file is too large to render. See raw diff

tokenizer_config.json ADDED

	@@ -0,0 +1,2215 @@

+{
+  "add_bos_token": false,
+  "add_eos_token": false,
+  "added_tokens_decoder": {
+    "126080": {
+      "content": "<|startoftext|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "126081": {
+      "content": "<|endoftext|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "126082": {
+      "content": "[CLS]",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "126083": {
+      "content": "[gMASK]",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "126084": {
+      "content": "<|reserved_token_0|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "126085": {
+      "content": "<|reserved_token_1|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "126086": {
+      "content": "<|reserved_token_2|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "126087": {
+      "content": "<|reserved_token_3|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "126088": {
+      "content": "<|reserved_token_4|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "126089": {
+      "content": "<|reserved_token_5|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "126090": {
+      "content": "<|reserved_token_6|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "126091": {
+      "content": "<|reserved_token_7|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "126092": {
+      "content": "<|reserved_token_8|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "126093": {
+      "content": "<|reserved_token_9|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "126094": {
+      "content": "<|reserved_token_10|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "126095": {
+      "content": "<|reserved_token_11|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "126096": {
+      "content": "<|reserved_token_12|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "126097": {
+      "content": "<|reserved_token_13|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "126098": {
+      "content": "<|reserved_token_14|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "126099": {
+      "content": "<|reserved_token_15|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "126100": {
+      "content": "<|reserved_token_16|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "126101": {
+      "content": "<|reserved_token_17|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "126102": {
+      "content": "<|reserved_token_18|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "126103": {
+      "content": "<|reserved_token_19|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "126104": {
+      "content": "<|reserved_token_20|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "126105": {
+      "content": "<|reserved_token_21|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "126106": {
+      "content": "<|reserved_token_22|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "126107": {
+      "content": "<|reserved_token_23|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "126108": {
+      "content": "<|reserved_token_24|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "126109": {
+      "content": "<|reserved_token_25|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "126110": {
+      "content": "<|reserved_token_26|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "126111": {
+      "content": "<|reserved_token_27|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "126112": {
+      "content": "<|reserved_token_28|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "126113": {
+      "content": "<|reserved_token_29|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "126114": {
+      "content": "<|reserved_token_30|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "126115": {
+      "content": "<|reserved_token_31|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "126116": {
+      "content": "<|reserved_token_32|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "126117": {
+      "content": "<|reserved_token_33|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "126118": {
+      "content": "<|reserved_token_34|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "126119": {
+      "content": "<|reserved_token_35|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "126120": {
+      "content": "<|reserved_token_36|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "126121": {
+      "content": "<|reserved_token_37|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "126122": {
+      "content": "<|reserved_token_38|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "126123": {
+      "content": "<|reserved_token_39|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "126124": {
+      "content": "<|reserved_token_40|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "126125": {
+      "content": "<|reserved_token_41|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "126126": {
+      "content": "<|reserved_token_42|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "126127": {
+      "content": "<|reserved_token_43|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "126128": {
+      "content": "<|reserved_token_44|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "126129": {
+      "content": "<|reserved_token_45|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "126130": {
+      "content": "<|reserved_token_46|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "126131": {
+      "content": "<|reserved_token_47|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "126132": {
+      "content": "<|reserved_token_48|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "126133": {
+      "content": "<|reserved_token_49|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "126134": {
+      "content": "<|reserved_token_50|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "126135": {
+      "content": "<|reserved_token_51|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "126136": {
+      "content": "<|reserved_token_52|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "126137": {
+      "content": "<|reserved_token_53|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "126138": {
+      "content": "<|reserved_token_54|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "126139": {
+      "content": "<|reserved_token_55|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "126140": {
+      "content": "<|reserved_token_56|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "126141": {
+      "content": "<|reserved_token_57|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "126142": {
+      "content": "<|reserved_token_58|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "126143": {
+      "content": "<|reserved_token_59|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "126144": {
+      "content": "<|reserved_token_60|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "126145": {
+      "content": "<|reserved_token_61|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "126146": {
+      "content": "<|reserved_token_62|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "126147": {
+      "content": "<|reserved_token_63|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "126148": {
+      "content": "<|reserved_token_64|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "126149": {
+      "content": "<|reserved_token_65|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "126150": {
+      "content": "<|reserved_token_66|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "126151": {
+      "content": "<|reserved_token_67|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "126152": {
+      "content": "<|reserved_token_68|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "126153": {
+      "content": "<|reserved_token_69|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "126154": {
+      "content": "<|reserved_token_70|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "126155": {
+      "content": "<|reserved_token_71|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "126156": {
+      "content": "<|reserved_token_72|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "126157": {
+      "content": "<|reserved_token_73|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "126158": {
+      "content": "<|reserved_token_74|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "126159": {
+      "content": "<|reserved_token_75|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "126160": {
+      "content": "<|reserved_token_76|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "126161": {
+      "content": "<|reserved_token_77|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "126162": {
+      "content": "<|reserved_token_78|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "126163": {
+      "content": "<|reserved_token_79|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "126164": {
+      "content": "<|reserved_token_80|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "126165": {
+      "content": "<|reserved_token_81|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "126166": {
+      "content": "<|reserved_token_82|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "126167": {
+      "content": "<|reserved_token_83|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "126168": {
+      "content": "<|reserved_token_84|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "126169": {
+      "content": "<|reserved_token_85|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "126170": {
+      "content": "<|reserved_token_86|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "126171": {
+      "content": "<|reserved_token_87|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "126172": {
+      "content": "<|reserved_token_88|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "126173": {
+      "content": "<|reserved_token_89|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "126174": {
+      "content": "<|reserved_token_90|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "126175": {
+      "content": "<|reserved_token_91|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "126176": {
+      "content": "<|reserved_token_92|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "126177": {
+      "content": "<|reserved_token_93|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "126178": {
+      "content": "<|reserved_token_94|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "126179": {
+      "content": "<|reserved_token_95|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "126180": {
+      "content": "<|reserved_token_96|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "126181": {
+      "content": "<|reserved_token_97|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "126182": {
+      "content": "<|reserved_token_98|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "126183": {
+      "content": "<|reserved_token_99|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "126184": {
+      "content": "<|reserved_token_100|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "126185": {
+      "content": "<|reserved_token_101|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "126186": {
+      "content": "<|reserved_token_102|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "126187": {
+      "content": "<|reserved_token_103|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "126188": {
+      "content": "<|reserved_token_104|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "126189": {
+      "content": "<|reserved_token_105|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "126190": {
+      "content": "<|reserved_token_106|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "126191": {
+      "content": "<|reserved_token_107|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "126192": {
+      "content": "<|reserved_token_108|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "126193": {
+      "content": "<|reserved_token_109|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "126194": {
+      "content": "<|reserved_token_110|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "126195": {
+      "content": "<|reserved_token_111|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "126196": {
+      "content": "<|reserved_token_112|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "126197": {
+      "content": "<|reserved_token_113|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "126198": {
+      "content": "<|reserved_token_114|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "126199": {
+      "content": "<|reserved_token_115|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "126200": {
+      "content": "<|reserved_token_116|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "126201": {
+      "content": "<|reserved_token_117|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "126202": {
+      "content": "<|reserved_token_118|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "126203": {
+      "content": "<|reserved_token_119|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "126204": {
+      "content": "<|reserved_token_120|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "126205": {
+      "content": "<|reserved_token_121|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "126206": {
+      "content": "<|reserved_token_122|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "126207": {
+      "content": "<|reserved_token_123|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "126208": {
+      "content": "<|reserved_token_124|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "126209": {
+      "content": "<|reserved_token_125|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "126210": {
+      "content": "<|reserved_token_126|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "126211": {
+      "content": "<|reserved_token_127|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "126212": {
+      "content": "<|reserved_token_128|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "126213": {
+      "content": "<|reserved_token_129|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "126214": {
+      "content": "<|reserved_token_130|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "126215": {
+      "content": "<|reserved_token_131|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "126216": {
+      "content": "<|reserved_token_132|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "126217": {
+      "content": "<|reserved_token_133|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "126218": {
+      "content": "<|reserved_token_134|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "126219": {
+      "content": "<|reserved_token_135|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "126220": {
+      "content": "<|reserved_token_136|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "126221": {
+      "content": "<|reserved_token_137|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "126222": {
+      "content": "<|reserved_token_138|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "126223": {
+      "content": "<|reserved_token_139|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "126224": {
+      "content": "<|reserved_token_140|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "126225": {
+      "content": "<|reserved_token_141|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "126226": {
+      "content": "<|reserved_token_142|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "126227": {
+      "content": "<|reserved_token_143|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "126228": {
+      "content": "<|reserved_token_144|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "126229": {
+      "content": "<|reserved_token_145|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "126230": {
+      "content": "<|reserved_token_146|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "126231": {
+      "content": "<|reserved_token_147|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "126232": {
+      "content": "<|reserved_token_148|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "126233": {
+      "content": "<|reserved_token_149|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "126234": {
+      "content": "<|reserved_token_150|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "126235": {
+      "content": "<|reserved_token_151|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "126236": {
+      "content": "<|reserved_token_152|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "126237": {
+      "content": "<|reserved_token_153|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "126238": {
+      "content": "<|reserved_token_154|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "126239": {
+      "content": "<|reserved_token_155|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "126240": {
+      "content": "<|reserved_token_156|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "126241": {
+      "content": "<|reserved_token_157|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "126242": {
+      "content": "<|reserved_token_158|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "126243": {
+      "content": "<|reserved_token_159|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "126244": {
+      "content": "<|reserved_token_160|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "126245": {
+      "content": "<|reserved_token_161|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "126246": {
+      "content": "<|reserved_token_162|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "126247": {
+      "content": "<|reserved_token_163|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "126248": {
+      "content": "<|reserved_token_164|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "126249": {
+      "content": "<|reserved_token_165|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "126250": {
+      "content": "<|reserved_token_166|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "126251": {
+      "content": "<|reserved_token_167|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "126252": {
+      "content": "<|reserved_token_168|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "126253": {
+      "content": "<|reserved_token_169|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "126254": {
+      "content": "<|reserved_token_170|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "126255": {
+      "content": "<|reserved_token_171|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "126256": {
+      "content": "<|reserved_token_172|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "126257": {
+      "content": "<|reserved_token_173|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "126258": {
+      "content": "<|reserved_token_174|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "126259": {
+      "content": "<|reserved_token_175|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "126260": {
+      "content": "<|reserved_token_176|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "126261": {
+      "content": "<|reserved_token_177|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "126262": {
+      "content": "<|reserved_token_178|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "126263": {
+      "content": "<|reserved_token_179|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "126264": {
+      "content": "<|reserved_token_180|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "126265": {
+      "content": "<|reserved_token_181|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "126266": {
+      "content": "<|reserved_token_182|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "126267": {
+      "content": "<|reserved_token_183|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "126268": {
+      "content": "<|reserved_token_184|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "126269": {
+      "content": "<|reserved_token_185|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "126270": {
+      "content": "<|reserved_token_186|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "126271": {
+      "content": "<|reserved_token_187|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "126272": {
+      "content": "<|reserved_token_188|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "126273": {
+      "content": "<|reserved_token_189|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "126274": {
+      "content": "<|reserved_token_190|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "126275": {
+      "content": "<|reserved_token_191|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "126276": {
+      "content": "<|reserved_token_192|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "126277": {
+      "content": "<|reserved_token_193|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "126278": {
+      "content": "<|reserved_token_194|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "126279": {
+      "content": "<|reserved_token_195|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "126280": {
+      "content": "<|reserved_token_196|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "126281": {
+      "content": "<|reserved_token_197|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "126282": {
+      "content": "<|reserved_token_198|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "126283": {
+      "content": "<|reserved_token_199|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "126284": {
+      "content": "<|reserved_token_200|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "126285": {
+      "content": "<|reserved_token_201|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "126286": {
+      "content": "<|reserved_token_202|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "126287": {
+      "content": "<|reserved_token_203|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "126288": {
+      "content": "<|reserved_token_204|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "126289": {
+      "content": "<|reserved_token_205|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "126290": {
+      "content": "<|reserved_token_206|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "126291": {
+      "content": "<|reserved_token_207|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "126292": {
+      "content": "<|reserved_token_208|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "126293": {
+      "content": "<|reserved_token_209|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "126294": {
+      "content": "<|reserved_token_210|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "126295": {
+      "content": "<|reserved_token_211|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "126296": {
+      "content": "<|reserved_token_212|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "126297": {
+      "content": "<|reserved_token_213|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "126298": {
+      "content": "<|reserved_token_214|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "126299": {
+      "content": "<|reserved_token_215|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "126300": {
+      "content": "<|reserved_token_216|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "126301": {
+      "content": "<|reserved_token_217|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "126302": {
+      "content": "<|reserved_token_218|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "126303": {
+      "content": "<|reserved_token_219|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "126304": {
+      "content": "<|reserved_token_220|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "126305": {
+      "content": "<|reserved_token_221|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "126306": {
+      "content": "<|reserved_token_222|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "126307": {
+      "content": "<|reserved_token_223|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "126308": {
+      "content": "<|reserved_token_224|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "126309": {
+      "content": "<|reserved_token_225|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "126310": {
+      "content": "<|reserved_token_226|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "126311": {
+      "content": "<|reserved_token_227|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "126312": {
+      "content": "<|reserved_token_228|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "126313": {
+      "content": "<|reserved_token_229|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "126314": {
+      "content": "<|reserved_token_230|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "126315": {
+      "content": "<|reserved_token_231|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "126316": {
+      "content": "<|reserved_token_232|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "126317": {
+      "content": "<|reserved_token_233|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "126318": {
+      "content": "<|reserved_token_234|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "126319": {
+      "content": "<|reserved_token_235|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "126320": {
+      "content": "<|reserved_token_236|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "126321": {
+      "content": "<|reserved_token_237|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "126322": {
+      "content": "<|reserved_token_238|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "126323": {
+      "content": "<|reserved_token_239|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "126324": {
+      "content": "<|reserved_token_240|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "126325": {
+      "content": "<|reserved_token_241|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "126326": {
+      "content": "<|reserved_token_242|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "126327": {
+      "content": "<|reserved_token_243|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "126328": {
+      "content": "<|reserved_token_244|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "126329": {
+      "content": "<|reserved_token_245|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "126330": {
+      "content": "<|reserved_token_246|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "126331": {
+      "content": "<|reserved_token_247|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "126332": {
+      "content": "<|reserved_token_248|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "126333": {
+      "content": "<|reserved_token_249|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "126334": {
+      "content": "<|reserved_token_250|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "126335": {
+      "content": "<|reserved_token_251|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "126336": {
+      "content": "<|mdm_mask|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "126337": {
+      "content": "<|reserved_token_253|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "126338": {
+      "content": "<|reserved_token_254|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "126339": {
+      "content": "<|reserved_token_255|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "126340": {
+      "content": "<role>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "126341": {
+      "content": "</role>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "126342": {
+      "content": "<|arithmetic_start|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "126343": {
+      "content": "<|arithmetic_end|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "126344": {
+      "content": "<|number_start|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "126345": {
+      "content": "<|number_end|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "126346": {
+      "content": "<|start_header_id|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "126347": {
+      "content": "<|end_header_id|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "126348": {
+      "content": "<|eot_id|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "126349": {
+      "content": "<IMG_CONTEXT>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "126350": {
+      "content": "<img>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "126351": {
+      "content": "</img>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    }
+  },
+  "additional_special_tokens": [
+    "<|mdm_mask|>",
+    "<role>",
+    "</role>",
+    "<|arithmetic_start|>",
+    "<|arithmetic_end|>",
+    "<|number_start|>",
+    "<|number_end|>",
+    "<IMG_CONTEXT>",
+    "<img>",
+    "</img>"
+  ],
+  "auto_map": {
+    "AutoProcessor": "processing_dmllm.DMLLMProcessor"
+  },
+  "bos_token": "<|startoftext|>",
+  "chat_template": "{% set loop_messages = messages %}{% for message in loop_messages %}{% set content = '<|start_header_id|>' + message['role'] + '<|end_header_id|>\n\n'+ message['content'] | trim + '<|eot_id|>' %}{% if loop.index0 == 0 %}{% set content = bos_token + content %}{% endif %}{{ content }}{% endfor %}{{ '<|start_header_id|>assistant<|end_header_id|>\n\n' }}",
+  "clean_up_tokenization_spaces": false,
+  "cls_token": "[CLS]",
+  "eos_token": "<|endoftext|>",
+  "extra_special_tokens": {},
+  "fast_tokenizer": true,
+  "gmask_token": "[gMASK]",
+  "merges_file": null,
+  "model_input_names": [
+    "input_ids",
+    "attention_mask"
+  ],
+  "model_max_length": 1000000000000000019884624838656,
+  "pad_token": "<|endoftext|>",
+  "processor_class": "DMLLMProcessor",
+  "tokenizer_class": "PreTrainedTokenizer",
+  "trust_remote_code": true
+}
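
The `chat_template` above is compact Jinja, so a plain-Python equivalent may make its behaviour easier to check. The sketch below is a hand-written re-implementation, not code from this repository: `render_chat` and `BOS_TOKEN` are hypothetical names, and it assumes every message's `content` is a plain string (the template concatenates and trims it directly). As in the template, the first turn is prefixed with `bos_token`, and the assistant header is always appended, with no `add_generation_prompt` check.

```python
# Hand-written equivalent of the Jinja chat_template in tokenizer_config.json.
# Assumption: message["content"] is a plain string, as the template expects.
BOS_TOKEN = "<|startoftext|>"  # value of "bos_token" in the config above


def render_chat(messages, bos_token=BOS_TOKEN):
    """Render a list of {"role", "content"} dicts into a single prompt string."""
    parts = []
    for index, message in enumerate(messages):
        # Each turn: header with the role, blank line, trimmed content, end-of-turn token.
        turn = (
            "<|start_header_id|>" + message["role"] + "<|end_header_id|>\n\n"
            + message["content"].strip()
            + "<|eot_id|>"
        )
        # Only the first turn carries the beginning-of-text token.
        if index == 0:
            turn = bos_token + turn
        parts.append(turn)
    # The template unconditionally opens an assistant turn at the end.
    parts.append("<|start_header_id|>assistant<|end_header_id|>\n\n")
    return "".join(parts)


prompt = render_chat([{"role": "user", "content": "  What animal is on the candy?  "}])
print(prompt)
```

In practice the tokenizer's own `apply_chat_template` would render the template; this version is only meant to show the string layout the model sees.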