zwpride committed on
Commit
f8643c8
·
verified ·
1 Parent(s): 6f1bd0a

Restore LoopCoder-V2 model files

README.md CHANGED
@@ -1,3 +1,126 @@
1
  ---
2
  license: apache-2.0
3
+ library_name: transformers
4
+ pipeline_tag: text-generation
5
+ tags:
6
+ - code
7
+ - code-generation
8
+ - code-reasoning
9
+ - agentic-coding
10
+ - tool-use
11
+ - instruction-tuned
12
+ - looped-transformer
13
+ - parallel-loop-transformer
14
+ - plt
15
  ---
16
+
17
+ # LoopCoder-V2
18
+
19
+ LoopCoder-v2 is a 7B instruction-tuned code model based on the Parallel Loop Transformer (PLT). It is built to study test-time computation scaling: the same shared Transformer blocks are applied repeatedly, so compute grows while the parameter count stays fixed.
20
+
21
+ The released checkpoint is the two-loop PLT variant (`plt_num_loops=2`). In the accompanying paper, this setting gives the best gain-cost trade-off: the second loop provides most of the useful latent refinement, while additional loops show diminishing or unstable updates.
22
+
23
+ ## Highlights
24
+
25
+ - 7B dense PLT coder trained from scratch on 18T tokens of mixed text and code data.
26
+ - Instruction-tuned with a matched supervised fine-tuning recipe.
27
+ - Uses cross-loop position offsets and shared-KV gated sliding-window attention.
28
+ - Targets code generation, multilingual code, code reasoning, agentic software engineering, and tool-use workflows.
29
+ - Strongest loop-count setting in the paper: two loops, not more.
30
+
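The configuration docstring in this commit describes cross-loop processing (CLP) as `a * E + b * shift(H)`, with `a = plt_emb_scale` (0.707 in `config.json`) and `b = plt_hidden_scale` (0.053). The following is a minimal pure-Python sketch of that mixing step, not the model's implementation. It assumes that `shift` moves hidden states one position to the right and zero-fills the first position; the commit does not spell out the shift direction, and `shift_right` and `clp_mix` are hypothetical helper names.

```python
from typing import List

Vector = List[float]


def shift_right(hidden: List[Vector]) -> List[Vector]:
    """Shift hidden vectors one position to the right (assumed direction).

    The first position is zero-filled; the last input vector is dropped.
    """
    if not hidden:
        return []
    zero = [0.0] * len(hidden[0])
    return [zero] + [list(h) for h in hidden[:-1]]


def clp_mix(
    embeddings: List[Vector],
    hidden: List[Vector],
    emb_scale: float = 0.707,    # plt_emb_scale from config.json
    hidden_scale: float = 0.053,  # plt_hidden_scale from config.json
) -> List[Vector]:
    """Return a * E + b * shift(H), computed position by position."""
    shifted = shift_right(hidden)
    return [
        [emb_scale * e + hidden_scale * h for e, h in zip(e_vec, h_vec)]
        for e_vec, h_vec in zip(embeddings, shifted)
    ]


# Toy inputs: three positions, two features each.
E = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
H = [[10.0, 10.0], [20.0, 20.0], [30.0, 30.0]]
mixed = clp_mix(E, H)
```

Under this assumption, position 0 sees only its scaled embedding, and each later position combines its own embedding with the previous position's hidden state from the earlier loop.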
31
+ ## Model Details
32
+
33
+ | Item | Value |
34
+ | --- | --- |
35
+ | Architecture | `IQuestPLTCoderForCausalLM` |
36
+ | Parameters | Approximately 7B |
37
+ | Hidden size | 5120 |
38
+ | Layers | 14 shared layers, executed once per loop |
39
+ | Attention heads | 40 |
40
+ | KV heads | 8 |
41
+ | Head dimension | 128 |
42
+ | Intermediate size | 27648 |
43
+ | Activation | SwiGLU |
44
+ | Normalization | RMSNorm, epsilon 1e-5 |
45
+ | Position embedding | RoPE, theta 500000 |
46
+ | Vocabulary size | 76800 |
47
+ | Max position embeddings | 131072 |
48
+ | Precision | bfloat16 |
49
+ | PLT loops | 2 |
50
+ | PLT window size | 64 (left context; `plt_window_size=[64, 0]`) |
51
+
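The window size in the table corresponds to `plt_window_size=[64, 0]` in `config.json`: 64 tokens of left context and no right context. The sketch below builds such a mask under an assumed inclusive convention, where query position `i` may attend to key positions `i - left` through `i + right`. The exact boundary handling in the model code is not part of this commit, and `sliding_window_mask` is a hypothetical helper.

```python
def sliding_window_mask(seq_len, left=64, right=0):
    """Return a boolean matrix where mask[i][j] is True if query i may attend to key j.

    Assumes an inclusive window [i - left, i + right]; right=0 keeps it causal.
    """
    return [
        [i - left <= j <= i + right for j in range(seq_len)]
        for i in range(seq_len)
    ]


# A small window makes the band structure easy to see.
small = sliding_window_mask(seq_len=6, left=2, right=0)
for row in small:
    print("".join("x" if allowed else "." for allowed in row))

# With the released setting, a late query sees itself plus 64 earlier tokens.
full = sliding_window_mask(seq_len=100, left=64, right=0)
visible_for_last_query = sum(full[99])
```

In the released two-loop model, the docstring states that this local window is combined with global attention through a learned per-head gate in loops after the first; the gate itself is not modeled here.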
52
+ ## Evaluation Summary
53
+
54
+ Selected results from the paper are shown below. All LoopCoder-v2 variants use the same 7B shared-parameter setup and matched training, tuning, and evaluation protocols.
55
+
56
+ | Model | HumanEval+ | MultiPL-E | BigCodeBench | LiveCodeBench | SWE-bench Verified | Multi-SWE | Terminal-Bench | Terminal-Bench 2.0 | Mind2Web | BFCL V3 | Avg. |
57
+ | --- | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: |
58
+ | Baseline, 1 loop | 81.1 | 69.5 | 40.1 | 27.4 | 43.0 | 14.0 | 26.3 | 11.2 | 35.3 | 32.2 | 38.0 |
59
+ | LoopCoder-v2, 2 loops | 84.1 | 73.9 | 46.1 | 35.4 | 64.4 | 31.0 | 34.2 | 21.0 | 34.5 | 40.1 | 46.5 |
60
+ | LoopCoder-v2, 3 loops | 75.0 | 69.8 | 43.3 | 28.6 | 27.6 | 11.0 | 30.0 | 12.2 | 35.1 | 36.3 | 36.9 |
61
+ | LoopCoder-v2, 4 loops | 76.8 | 67.3 | 40.8 | 24.5 | 22.4 | 9.3 | 26.3 | 9.0 | 41.4 | 39.5 | 34.3 |
62
+
63
+ The paper's main finding is that PLT loop-count scaling is non-monotonic. The two-loop model improves over the one-loop baseline on nine of the ten benchmarks (all except Mind2Web, which drops from 35.3 to 34.5), including SWE-bench Verified from 43.0 to 64.4 and Multi-SWE from 14.0 to 31.0, while the three- and four-loop models regress on many tasks.
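To make the comparison concrete, the per-benchmark change from the one-loop baseline to the two-loop model can be computed directly from the table. This is a small illustrative script; the scores are copied from the two table rows above, and the variable names are arbitrary.

```python
BENCHMARKS = [
    "HumanEval+", "MultiPL-E", "BigCodeBench", "LiveCodeBench",
    "SWE-bench Verified", "Multi-SWE", "Terminal-Bench",
    "Terminal-Bench 2.0", "Mind2Web", "BFCL V3",
]

# Scores copied from the "Baseline, 1 loop" and "LoopCoder-v2, 2 loops" rows.
baseline = [81.1, 69.5, 40.1, 27.4, 43.0, 14.0, 26.3, 11.2, 35.3, 32.2]
two_loops = [84.1, 73.9, 46.1, 35.4, 64.4, 31.0, 34.2, 21.0, 34.5, 40.1]

# Round to one decimal to avoid floating-point noise in the differences.
deltas = {
    name: round(new - old, 1)
    for name, old, new in zip(BENCHMARKS, baseline, two_loops)
}

improved = [name for name, delta in deltas.items() if delta > 0]
regressed = [name for name, delta in deltas.items() if delta < 0]

for name, delta in deltas.items():
    print(f"{name:<20} {delta:+.1f}")
```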
64
+
65
+ ## Usage
66
+
67
+ This checkpoint uses a custom PLT architecture, `IQuestPLTCoderForCausalLM`. Load it in an environment that supports this class, together with the custom tokenizer and configuration files in this repository.
68
+
69
+ ```python
70
+ import torch
71
+ from transformers import AutoModelForCausalLM, AutoTokenizer
72
+
73
+ repo_id = "Multilingual-Multimodal-NLP/LoopCoder-V2"
74
+
75
+ tokenizer = AutoTokenizer.from_pretrained(repo_id, trust_remote_code=True)
76
+ model = AutoModelForCausalLM.from_pretrained(
77
+ repo_id,
78
+ torch_dtype=torch.bfloat16,
79
+ device_map="auto",
80
+ trust_remote_code=True,
81
+ )
82
+
83
+ messages = [
84
+ {"role": "user", "content": "Write a Python function that checks whether a string is a palindrome."}
85
+ ]
86
+
87
+ inputs = tokenizer.apply_chat_template(
88
+ messages,
89
+ add_generation_prompt=True,
90
+ tokenize=True,
91
+ return_tensors="pt",
92
+ ).to(model.device)
93
+
94
+ outputs = model.generate(
95
+ inputs,
96
+ max_new_tokens=512,
97
+ do_sample=True,
98
+ temperature=0.6,
99
+ top_p=0.95,
100
+ top_k=20,
101
+ )
102
+
103
+ print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
104
+ ```
105
+
106
+ ## Training Data
107
+
108
+ LoopCoder-v2 was trained from scratch on an internal deduplicated mixture of text and code totaling 18T tokens, balanced at a 1:1 text-to-code token ratio. The code portion spans more than 100 programming languages. The largest language shares reported in the paper include Java, Python, JavaScript, Markdown, TypeScript, C, C++, PHP, C#, and HTML.
109
+
110
+ ## Intended Use
111
+
112
+ LoopCoder-v2 is intended for code generation, code reasoning, repository-level software engineering assistance, and tool-use research. It is especially useful for studying how looped latent computation changes model behavior under fixed parameter count.
113
+
114
+ ## Limitations
115
+
116
+ LoopCoder-v2 can produce incorrect, insecure, or incomplete code and should not be used without review in production systems. The released model is optimized for coding and tool-use workloads; performance on unrelated open-domain tasks may vary. The paper also shows that increasing PLT loops beyond the two-loop setting can hurt performance, so this checkpoint should be treated as the recommended loop-count configuration rather than evidence that more loops are always better.
117
+
118
+ ## Citation
119
+
120
+ ```bibtex
121
+ @misc{loopcoder_v2_2026,
122
+ title = {LoopCoder-v2: Only Loop Once for Efficient Test-Time Computation Scaling},
123
+ author = {Yang, Jian and Guo, Shawn and Zhang, Wei and Zheng, Tianyu and Du, Yaxin and Li, Haau-Sing and Wu, Jiajun and Song, Yue and Xing, Yan and Cai, Qingsong and Huang, Zelong and Hao, Chuan and Tao, Ran and Liu, Xianglong and Zhao, Wayne Xin and Tang, Mingjie and Lv, Weifeng and Zhou, Ming and Dai, Bryan},
124
+ year = {2026}
125
+ }
126
+ ```
added_tokens.json ADDED
@@ -0,0 +1,29 @@
1
+ {
2
+ "</think>": 75873,
3
+ "</tool_call>": 75877,
4
+ "</tool_response>": 75879,
5
+ "</tools>": 75875,
6
+ "<CLS>": 75858,
7
+ "<EOD>": 75860,
8
+ "<MASK>": 75861,
9
+ "<PAD>": 75862,
10
+ "<SEP>": 75859,
11
+ "<think>": 75872,
12
+ "<tool_call>": 75876,
13
+ "<tool_response>": 75878,
14
+ "<tools>": 75874,
15
+ "<|CLS|>": 75880,
16
+ "<|EOD|>": 75882,
17
+ "<|MASK|>": 75883,
18
+ "<|PAD|>": 75884,
19
+ "<|SEP|>": 75881,
20
+ "<|endoftext|>": 75869,
21
+ "<|file_sep|>": 75871,
22
+ "<|fim_middle|>": 75866,
23
+ "<|fim_pad|>": 75868,
24
+ "<|fim_prefix|>": 75865,
25
+ "<|fim_suffix|>": 75867,
26
+ "<|im_end|>": 75864,
27
+ "<|im_start|>": 75863,
28
+ "<|repo_name|>": 75870
29
+ }
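A quick sanity check of the IDs in `added_tokens.json` can be sketched as follows. The mapping is copied from the file above, `VOCAB_SIZE` comes from `config.json`, and the default EOS list `[2, 75864, 75869]` comes from `IQuestPLTCoderConfig` in this commit; the check itself is only illustrative and is not part of the repository.

```python
ADDED_TOKENS = {
    "</think>": 75873, "</tool_call>": 75877, "</tool_response>": 75879,
    "</tools>": 75875, "<CLS>": 75858, "<EOD>": 75860, "<MASK>": 75861,
    "<PAD>": 75862, "<SEP>": 75859, "<think>": 75872, "<tool_call>": 75876,
    "<tool_response>": 75878, "<tools>": 75874, "<|CLS|>": 75880,
    "<|EOD|>": 75882, "<|MASK|>": 75883, "<|PAD|>": 75884, "<|SEP|>": 75881,
    "<|endoftext|>": 75869, "<|file_sep|>": 75871, "<|fim_middle|>": 75866,
    "<|fim_pad|>": 75868, "<|fim_prefix|>": 75865, "<|fim_suffix|>": 75867,
    "<|im_end|>": 75864, "<|im_start|>": 75863, "<|repo_name|>": 75870,
}
VOCAB_SIZE = 76800  # vocab_size in config.json

ids = sorted(ADDED_TOKENS.values())

# The added tokens should form one gap-free block below the vocabulary size.
is_contiguous = ids == list(range(ids[0], ids[-1] + 1))
fits_in_vocab = ids[-1] < VOCAB_SIZE

# Map the default EOS IDs from IQuestPLTCoderConfig back to token strings.
id_to_token = {token_id: token for token, token_id in ADDED_TOKENS.items()}
default_eos_ids = [2, 75864, 75869]
eos_names = [id_to_token.get(i, "(not in added_tokens.json)") for i in default_eos_ids]

print(is_contiguous, fits_in_vocab, eos_names)
```

The lookup shows that two of the three default EOS IDs are the chat end-of-turn token and the end-of-text token, while ID 2 is defined outside this file.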
chat_template.jinja ADDED
@@ -0,0 +1,143 @@
1
+ {% macro render_extra_keys(json_dict, handled_keys) %}
2
+ {%- if json_dict is mapping %}
3
+ {%- for json_key in json_dict if json_key not in handled_keys %}
4
+ {%- if json_dict[json_key] is mapping or (json_dict[json_key] is sequence and json_dict[json_key] is not string) %}
5
+ {{- '\n<' ~ json_key ~ '>' ~ (json_dict[json_key] | tojson | safe) ~ '</' ~ json_key ~ '>' }}
6
+ {%- else %}
7
+ {{-'\n<' ~ json_key ~ '>' ~ (json_dict[json_key] | string) ~ '</' ~ json_key ~ '>' }}
8
+ {%- endif %}
9
+ {%- endfor %}
10
+ {%- endif %}
11
+ {% endmacro %}
12
+
13
+ {%- if messages[0]["role"] == "system" %}
14
+ {%- set system_message = messages[0]["content"] %}
15
+ {%- set loop_messages = messages[1:] %}
16
+ {%- else %}
17
+ {%- set loop_messages = messages %}
18
+ {%- endif %}
19
+
20
+ {%- if not tools is defined %}
21
+ {%- set tools = [] %}
22
+ {%- endif %}
23
+
24
+ {%- if system_message is defined %}
25
+ {{- "<|im_start|>system\n" + system_message }}
26
+ {%- else %}
27
+ {%- if tools is iterable and tools | length > 0 %}
28
+ {{- "<|im_start|>system\nYou are a helpful AI assistant." }}
29
+ {%- endif %}
30
+ {%- endif %}
31
+ {%- if tools is iterable and tools | length > 0 %}
32
+ {{- "\n\n# Tools\n\nYou have access to the following functions:\n\n" }}
33
+ {{- "<tools>" }}
34
+ {%- for tool in tools %}
35
+ {%- if tool.function is defined %}
36
+ {%- set tool = tool.function %}
37
+ {%- endif %}
38
+ {{- "\n<function>\n<name>" ~ tool.name ~ "</name>" }}
39
+ {%- if tool.description is defined %}
40
+ {{- '\n<description>' ~ (tool.description | trim) ~ '</description>' }}
41
+ {%- endif %}
42
+ {{- '\n<parameters>' }}
43
+ {%- if tool.parameters is defined and tool.parameters is mapping and tool.parameters.properties is defined and tool.parameters.properties is mapping %}
44
+ {%- for param_name, param_fields in tool.parameters.properties|items %}
45
+ {{- '\n<parameter>' }}
46
+ {{- '\n<name>' ~ param_name ~ '</name>' }}
47
+ {%- if param_fields.type is defined %}
48
+ {{- '\n<type>' ~ (param_fields.type | string) ~ '</type>' }}
49
+ {%- endif %}
50
+ {%- if param_fields.description is defined %}
51
+ {{- '\n<description>' ~ (param_fields.description | trim) ~ '</description>' }}
52
+ {%- endif %}
53
+ {%- set handled_keys = ['name', 'type', 'description'] %}
54
+ {{- render_extra_keys(param_fields, handled_keys) }}
55
+ {{- '\n</parameter>' }}
56
+ {%- endfor %}
57
+ {%- endif %}
58
+ {% set handled_keys = ['type', 'properties'] %}
59
+ {{- render_extra_keys(tool.parameters, handled_keys) }}
60
+ {{- '\n</parameters>' }}
61
+ {%- set handled_keys = ['type', 'name', 'description', 'parameters'] %}
62
+ {{- render_extra_keys(tool, handled_keys) }}
63
+ {{- '\n</function>' }}
64
+ {%- endfor %}
65
+ {{- "\n</tools>" }}
66
+ {{- '\n\nIf you choose to call a function ONLY reply in the following format with NO suffix:\n\n<tool_call>\n<function=example_function_name>\n<parameter=example_parameter_1>\nvalue_1\n</parameter>\n<parameter=example_parameter_2>\nThis is the value for the second parameter\nthat can span\nmultiple lines\n</parameter>\n</function>\n</tool_call>\n\n<IMPORTANT>\nReminder:\n- Function calls MUST follow the specified format: an inner <function=...></function> block must be nested within <tool_call></tool_call> XML tags\n- Required parameters MUST be specified\n- You may provide optional reasoning for your function call in natural language BEFORE the function call, but NOT after\n- If there is no function call available, answer the question like normal with your current knowledge and do not tell the user about function calls\n</IMPORTANT>' }}
67
+ {%- endif %}
68
+ {%- if system_message is defined %}
69
+ {{- '<|im_end|>\n' }}
70
+ {%- else %}
71
+ {%- if tools is iterable and tools | length > 0 %}
72
+ {{- '<|im_end|>\n' }}
73
+ {%- endif %}
74
+ {%- endif %}
75
+ {%- for message in loop_messages %}
76
+ {%- if message.role == "assistant" and message.tool_calls is defined and message.tool_calls is iterable and message.tool_calls | length > 0 %}
77
+ {{- '<|im_start|>' + message.role }}
78
+ {%- if (message.content_mask is not defined) or not message.content_mask %}{%- generation %}
79
+ {%- if message.content is defined and message.content is string and message.content | trim | length > 0 %}
80
+ {{- '\n' + message.content | trim + '\n' }}
81
+ {%- endif %}
82
+ {%- for tool_call in message.tool_calls %}
83
+ {%- if tool_call.function is defined %}
84
+ {%- set tool_call = tool_call.function %}
85
+ {%- endif %}
86
+ {{- '\n<tool_call>\n<function=' + tool_call.name + '>\n' }}
87
+ {%- if tool_call.arguments is defined %}
88
+ {%- for args_name, args_value in tool_call.arguments|items %}
89
+ {{- '<parameter=' + args_name + '>\n' }}
90
+ {%- set args_value = args_value | tojson | safe if args_value is mapping or (args_value is sequence and args_value is not string) else args_value | string %}
91
+ {{- args_value }}
92
+ {{- '\n</parameter>\n' }}
93
+ {%- endfor %}
94
+ {%- endif %}
95
+ {{- '</function>\n</tool_call>' }}
96
+ {%- endfor %}
97
+ {{- '<|im_end|>\n' }}
98
+ {%- endgeneration %}{%- else %}
99
+ {%- if message.content is defined and message.content is string and message.content | trim | length > 0 %}
100
+ {{- '\n' + message.content | trim + '\n' }}
101
+ {%- endif %}
102
+ {%- for tool_call in message.tool_calls %}
103
+ {%- if tool_call.function is defined %}
104
+ {%- set tool_call = tool_call.function %}
105
+ {%- endif %}
106
+ {{- '\n<tool_call>\n<function=' + tool_call.name + '>\n' }}
107
+ {%- if tool_call.arguments is defined %}
108
+ {%- for args_name, args_value in tool_call.arguments|items %}
109
+ {{- '<parameter=' + args_name + '>\n' }}
110
+ {%- set args_value = args_value | tojson | safe if args_value is mapping or (args_value is sequence and args_value is not string) else args_value | string %}
111
+ {{- args_value }}
112
+ {{- '\n</parameter>\n' }}
113
+ {%- endfor %}
114
+ {%- endif %}
115
+ {{- '</function>\n</tool_call>' }}
116
+ {%- endfor %}
117
+ {{- '<|im_end|>\n' }}
118
+ {%- endif %}
119
+ {%- elif message.role == "user" or message.role == "system" %}
120
+ {{- '<|im_start|>' + message.role + '\n' + message.content + '<|im_end|>' + '\n' }}
121
+ {%- elif message.role == "assistant" %}
122
+ {{- '<|im_start|>' + message.role + '\n' }}
123
+ {%- if (message.content_mask is not defined) or not message.content_mask %}{%- generation %}{{- message.content + '<|im_end|>' }}{%- endgeneration %}{%- else %}{{- message.content + '<|im_end|>' }}{%- endif %}
124
+ {{- '\n' }}
125
+ {%- elif message.role == "tool" %}
126
+ {%- if loop.previtem and loop.previtem.role != "tool" %}
127
+ {{- '<|im_start|>user\n' }}
128
+ {%- endif %}
129
+ {{- '<tool_response>\n' }}
130
+ {{- message.content }}
131
+ {{- '\n</tool_response>\n' }}
132
+ {%- if not loop.last and loop.nextitem.role != "tool" %}
133
+ {{- '<|im_end|>\n' }}
134
+ {%- elif loop.last %}
135
+ {{- '<|im_end|>\n' }}
136
+ {%- endif %}
137
+ {%- else %}
138
+ {{- '<|im_start|>' + message.role + '\n' + message.content + '<|im_end|>\n' }}
139
+ {%- endif %}
140
+ {%- endfor %}
141
+ {%- if add_generation_prompt %}
142
+ {{- '<|im_start|>assistant\n' }}
143
+ {%- endif %}
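For orientation, the two formats this template defines can be sketched in plain Python: the ChatML-style turn layout for conversations without tools, and the `<tool_call>` block the model is asked to emit. This is a simplified illustration, not a replacement for `tokenizer.apply_chat_template`; `render_chatml` and `parse_tool_calls` are hypothetical helpers, the renderer ignores tools, tool responses, and `content_mask`, and the parser returns every parameter value as a string.

```python
import re


def render_chatml(messages, add_generation_prompt=True):
    """Render system/user/assistant turns the way the template does when no tools are used."""
    parts = [
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages
    ]
    if add_generation_prompt:
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


# One <tool_call> block wraps exactly one <function=NAME>...</function> block.
TOOL_CALL_RE = re.compile(
    r"<tool_call>\s*<function=([^>\n]+)>\n(.*?)</function>\s*</tool_call>",
    re.DOTALL,
)
# Each parameter value sits on its own lines between the opening and closing tags.
PARAM_RE = re.compile(r"<parameter=([^>\n]+)>\n(.*?)\n</parameter>", re.DOTALL)


def parse_tool_calls(text):
    """Extract tool calls in the <tool_call><function=...> format as a list of dicts."""
    calls = []
    for name, body in TOOL_CALL_RE.findall(text):
        arguments = dict(PARAM_RE.findall(body))
        calls.append({"name": name, "arguments": arguments})
    return calls


prompt = render_chatml(
    [
        {"role": "system", "content": "You are a helpful AI assistant."},
        {"role": "user", "content": "What is the weather in Paris?"},
    ]
)

# "get_weather" and "city" are made-up names used only for this example.
reply = (
    "I will look that up.\n"
    "<tool_call>\n<function=get_weather>\n"
    "<parameter=city>\nParis\n</parameter>\n"
    "</function>\n</tool_call>"
)
calls = parse_tool_calls(reply)
```

Multi-line parameter values are kept intact because the parameter pattern matches lazily up to the newline before `</parameter>`.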
config.json ADDED
@@ -0,0 +1,40 @@
1
+ {
2
+ "architectures": [
3
+ "IQuestPLTCoderForCausalLM"
4
+ ],
5
+ "attention_bias": false,
6
+ "attention_dropout": 0.0,
7
+ "auto_map": {
8
+ "AutoConfig": "configuration_iquestpltcoder.IQuestPLTCoderConfig"
9
+ },
10
+ "bos_token_id": 1,
11
+ "dtype": "bfloat16",
12
+ "eos_token_id": 2,
13
+ "head_dim": 128,
14
+ "hidden_act": "silu",
15
+ "hidden_size": 5120,
16
+ "initializer_range": 0.02,
17
+ "intermediate_size": 27648,
18
+ "max_position_embeddings": 131072,
19
+ "mlp_bias": false,
20
+ "model_type": "iquestpltcoder",
21
+ "num_attention_heads": 40,
22
+ "num_hidden_layers": 14,
23
+ "num_key_value_heads": 8,
24
+ "plt_emb_scale": 0.707,
25
+ "plt_gate_use_hidden_states": true,
26
+ "plt_hidden_scale": 0.053,
27
+ "plt_normalize_per_loop": true,
28
+ "plt_num_loops": 2,
29
+ "plt_window_size": [
30
+ 64,
31
+ 0
32
+ ],
33
+ "rms_norm_eps": 1e-05,
34
+ "rope_scaling": null,
35
+ "rope_theta": 500000,
36
+ "tie_word_embeddings": false,
37
+ "transformers_version": "4.57.1",
38
+ "use_cache": true,
39
+ "vocab_size": 76800
40
+ }
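The values in `config.json` allow a back-of-the-envelope parameter estimate. The sketch below assumes Llama-style projections (`q_proj`, `k_proj`, `v_proj`, `o_proj`, `gate_proj`, `up_proj`, `down_proj`, as named in the tensor-parallel plan of `configuration_iquestcoder.py`), no biases, and untied input and output embeddings. It ignores normalization weights and any PLT-specific gate parameters, so it is an approximation rather than an exact count.

```python
config = {
    "hidden_size": 5120,
    "intermediate_size": 27648,
    "num_hidden_layers": 14,
    "num_attention_heads": 40,
    "num_key_value_heads": 8,
    "head_dim": 128,
    "vocab_size": 76800,
    "plt_num_loops": 2,
}

hidden = config["hidden_size"]
q_dim = config["num_attention_heads"] * config["head_dim"]    # query/output width
kv_dim = config["num_key_value_heads"] * config["head_dim"]   # key/value width (GQA)

# Attention: q_proj and o_proj are hidden x q_dim; k_proj and v_proj are hidden x kv_dim.
attention_params = 2 * hidden * q_dim + 2 * hidden * kv_dim
# SwiGLU MLP: gate_proj, up_proj, and down_proj each hold hidden x intermediate weights.
mlp_params = 3 * hidden * config["intermediate_size"]

per_layer = attention_params + mlp_params
layer_params = per_layer * config["num_hidden_layers"]
# tie_word_embeddings is false, so input and output embeddings are counted separately.
embedding_params = 2 * config["vocab_size"] * hidden
total_params = layer_params + embedding_params

# Weight sharing: parameters stay fixed while layer applications scale with loops.
layer_applications = config["num_hidden_layers"] * config["plt_num_loops"]
queries_per_kv_head = config["num_attention_heads"] // config["num_key_value_heads"]

print(f"estimated parameters: {total_params / 1e9:.2f}B")
print(f"layer applications per forward pass: {layer_applications}")
```

Under these assumptions the estimate lands in the same range as the "approximately 7B" figure in the README, while each forward pass applies the 14 shared layers twice.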
configuration_iquestcoder.py ADDED
@@ -0,0 +1,199 @@
1
+ """IQuestCoder model configuration."""
2
+
3
+ from transformers.configuration_utils import PretrainedConfig
4
+ from transformers.utils import logging
5
+
6
+
7
+ logger = logging.get_logger(__name__)
8
+
9
+
10
+ class IQuestCoderConfig(PretrainedConfig):
11
+ r"""
12
+ This is the configuration class to store the configuration of an [`IQuestCoderModel`]. It is used to instantiate
13
+ an IQuestCoder model according to the specified arguments, defining the model architecture.
14
+
15
+ Configuration objects inherit from [`PretrainedConfig`] and can be used to control the model outputs. Read the
16
+ documentation from [`PretrainedConfig`] for more information.
17
+
18
+ Args:
19
+ vocab_size (`int`, *optional*, defaults to 76800):
20
+ Vocabulary size of the IQuestCoder model. Defines the number of different tokens that can be represented
21
+ by the `input_ids` passed when calling [`IQuestCoderModel`].
22
+ hidden_size (`int`, *optional*, defaults to 5120):
23
+ Dimension of the hidden representations.
24
+ intermediate_size (`int`, *optional*, defaults to 27648):
25
+ Dimension of the MLP representations.
26
+ num_hidden_layers (`int`, *optional*, defaults to 80):
27
+ Number of hidden layers in the Transformer decoder.
28
+ num_attention_heads (`int`, *optional*, defaults to 40):
29
+ Number of attention heads for each attention layer in the Transformer decoder.
30
+ num_key_value_heads (`int`, *optional*, defaults to 8):
31
+ This is the number of key_value heads that should be used to implement Grouped Query Attention (GQA).
32
+ If `num_key_value_heads=num_attention_heads`, the model will use Multi Head Attention (MHA).
33
+ If `num_key_value_heads=1`, the model will use Multi Query Attention (MQA).
34
+ head_dim (`int`, *optional*, defaults to 128):
35
+ The dimension of each attention head.
36
+ hidden_act (`str` or `function`, *optional*, defaults to `"silu"`):
37
+ The non-linear activation function (function or string) in the decoder.
38
+ max_position_embeddings (`int`, *optional*, defaults to 16384):
39
+ The maximum sequence length that this model might ever be used with.
40
+ initializer_range (`float`, *optional*, defaults to 0.02):
41
+ The standard deviation of the truncated_normal_initializer for initializing all weight matrices.
42
+ rms_norm_eps (`float`, *optional*, defaults to 1e-05):
43
+ The epsilon used by the rms normalization layers.
44
+ use_cache (`bool`, *optional*, defaults to `True`):
45
+ Whether or not the model should return the last key/values attentions (not used by all models).
46
+ pad_token_id (`int`, *optional*):
47
+ Padding token id.
48
+ bos_token_id (`int`, *optional*, defaults to 1):
49
+ Beginning of stream token id.
50
+ eos_token_id (`int`, *optional*, defaults to 2):
51
+ End of stream token id.
52
+ tie_word_embeddings (`bool`, *optional*, defaults to `False`):
53
+ Whether to tie weight embeddings.
54
+ rope_theta (`float`, *optional*, defaults to 500000.0):
55
+ The base period of the RoPE embeddings.
56
+ rope_scaling (`Dict`, *optional*):
57
+ Dictionary containing the scaling configuration for the RoPE embeddings. Supports various RoPE scaling
58
+ types including "linear", "dynamic", "yarn", "longrope", and "llama3".
59
+ attention_bias (`bool`, *optional*, defaults to `False`):
60
+ Whether to use a bias in the query, key, value and output projection layers during self-attention.
61
+ attention_dropout (`float`, *optional*, defaults to 0.0):
62
+ The dropout ratio for the attention probabilities.
63
+ mlp_bias (`bool`, *optional*, defaults to `False`):
64
+ Whether to use a bias in up_proj, down_proj and gate_proj layers in the MLP layers.
65
+ clip_qkv (`float`, *optional*):
66
+ If set, clip the query, key, and value tensors to this value. Borrowed from OLMo for training stability.
67
+ use_sliding_window (`bool`, *optional*, defaults to `False`):
68
+ Whether to use sliding window attention. Borrowed from Qwen2.
69
+ sliding_window (`int`, *optional*):
70
+ The sliding window size. Only effective when `use_sliding_window=True`.
71
+ max_window_layers (`int`, *optional*, defaults to 0):
72
+ The number of layers that don't use sliding window attention. Borrowed from Qwen2.
73
+
74
+ Example:
75
+ ```python
76
+ >>> from configuration_iquestcoder import IQuestCoderConfig
77
+ >>> from modeling_iquestcoder import IQuestCoderModel
78
+
79
+ >>> # Initializing a IQuestCoder configuration
80
+ >>> configuration = IQuestCoderConfig()
81
+
82
+ >>> # Initializing a model from the configuration
83
+ >>> model = IQuestCoderModel(configuration)
84
+
85
+ >>> # Accessing the model configuration
86
+ >>> configuration = model.config
87
+ ```
88
+ """
89
+
90
+ model_type = "iquestcoder"
91
+ keys_to_ignore_at_inference = ["past_key_values"]
92
+
93
+ # Tensor / pipeline parallel plans for vLLM transformers backend.
94
+ # Same shape as LlamaConfig — IQuestCoder is structurally Llama (RMSNorm + GQA + RoPE + SwiGLU).
95
+ base_model_tp_plan = {
96
+ "layers.*.self_attn.q_proj": "colwise",
97
+ "layers.*.self_attn.k_proj": "colwise",
98
+ "layers.*.self_attn.v_proj": "colwise",
99
+ "layers.*.self_attn.o_proj": "rowwise",
100
+ "layers.*.mlp.gate_proj": "colwise",
101
+ "layers.*.mlp.up_proj": "colwise",
102
+ "layers.*.mlp.down_proj": "rowwise",
103
+ }
104
+ base_model_pp_plan = {
105
+ "embed_tokens": (["input_ids"], ["inputs_embeds"]),
106
+ "layers": (["hidden_states", "attention_mask"], ["hidden_states"]),
107
+ "norm": (["hidden_states"], ["hidden_states"]),
108
+ }
109
+
110
+ def __init__(
111
+ self,
112
+ vocab_size=76800,
113
+ hidden_size=5120,
114
+ intermediate_size=27648,
115
+ num_hidden_layers=80,
116
+ num_attention_heads=40,
117
+ num_key_value_heads=8,
118
+ head_dim=128,
119
+ hidden_act="silu",
120
+ max_position_embeddings=16384,
121
+ initializer_range=0.02,
122
+ rms_norm_eps=1e-5,
123
+ use_cache=True,
124
+ pad_token_id=None,
125
+ bos_token_id=1,
126
+ eos_token_id=2,
127
+ tie_word_embeddings=False,
128
+ rope_theta=500000.0,
129
+ rope_scaling=None,
130
+ attention_bias=False,
131
+ attention_dropout=0.0,
132
+ mlp_bias=False,
133
+ # IQuestCoder specific (borrowed from OLMo)
134
+ clip_qkv=None,
135
+ # IQuestCoder specific (borrowed from Qwen2)
136
+ use_sliding_window=False,
137
+ sliding_window=None,
138
+ max_window_layers=0,
139
+ **kwargs,
140
+ ):
141
+ self.vocab_size = vocab_size
142
+ self.max_position_embeddings = max_position_embeddings
143
+ self.hidden_size = hidden_size
144
+ self.intermediate_size = intermediate_size
145
+ self.num_hidden_layers = num_hidden_layers
146
+ self.num_attention_heads = num_attention_heads
147
+ self.num_key_value_heads = num_key_value_heads
148
+ self.head_dim = head_dim
149
+ self.hidden_act = hidden_act
150
+ self.initializer_range = initializer_range
151
+ self.rms_norm_eps = rms_norm_eps
152
+ self.use_cache = use_cache
153
+ self.rope_theta = rope_theta
154
+ self.rope_scaling = rope_scaling
155
+ self.attention_bias = attention_bias
156
+ self.attention_dropout = attention_dropout
157
+ self.mlp_bias = mlp_bias
158
+ # IQuestCoder specific
159
+ self.clip_qkv = clip_qkv
160
+ self.use_sliding_window = use_sliding_window
161
+ self.sliding_window = sliding_window
162
+ self.max_window_layers = max_window_layers
163
+
164
+ # Validate rope_scaling configuration
165
+ self._rope_scaling_validation()
166
+
167
+ super().__init__(
168
+ pad_token_id=pad_token_id,
169
+ bos_token_id=bos_token_id,
170
+ eos_token_id=eos_token_id,
171
+ tie_word_embeddings=tie_word_embeddings,
172
+ **kwargs,
173
+ )
174
+
175
+ def _rope_scaling_validation(self):
176
+ """Validate the `rope_scaling` configuration."""
177
+ if self.rope_scaling is None:
178
+ return
179
+
180
+ if not isinstance(self.rope_scaling, dict) or len(self.rope_scaling) < 1:
181
+ raise ValueError(
182
+ "`rope_scaling` must be a dictionary with a minimum of one field, `type` or `rope_type`."
183
+ )
184
+
185
+ rope_scaling_type = self.rope_scaling.get("type", None) or self.rope_scaling.get("rope_type", None)
186
+ if rope_scaling_type is None:
187
+ raise ValueError(
188
+ "`rope_scaling` must have a `type` or `rope_type` field."
189
+ )
190
+
191
+ valid_rope_types = ["linear", "dynamic", "yarn", "longrope", "llama3"]
192
+ if rope_scaling_type not in valid_rope_types:
193
+ raise ValueError(
194
+ f"`rope_scaling`'s type field must be one of {valid_rope_types}, got {rope_scaling_type}"
195
+ )
196
+
197
+
198
+ __all__ = ["IQuestCoderConfig"]
199
+
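The behavior of `_rope_scaling_validation` can be tried out without loading the configuration class. The function below is a standalone sketch that mirrors the checks in the file above; `validate_rope_scaling` and `is_valid` are hypothetical names, and the `factor` key in the examples is only illustrative, since the validation shown here inspects nothing beyond `type` or `rope_type`.

```python
VALID_ROPE_TYPES = ["linear", "dynamic", "yarn", "longrope", "llama3"]


def validate_rope_scaling(rope_scaling):
    """Raise ValueError for the same cases as `_rope_scaling_validation` above."""
    if rope_scaling is None:
        return  # no scaling configured, nothing to check

    if not isinstance(rope_scaling, dict) or len(rope_scaling) < 1:
        raise ValueError(
            "`rope_scaling` must be a dictionary with a minimum of one field, "
            "`type` or `rope_type`."
        )

    # The legacy key `type` takes precedence over `rope_type`.
    rope_type = rope_scaling.get("type") or rope_scaling.get("rope_type")
    if rope_type is None:
        raise ValueError("`rope_scaling` must have a `type` or `rope_type` field.")

    if rope_type not in VALID_ROPE_TYPES:
        raise ValueError(
            f"`rope_scaling`'s type field must be one of {VALID_ROPE_TYPES}, got {rope_type}"
        )


def is_valid(rope_scaling):
    """Convenience wrapper: True if validation passes, False if it raises."""
    try:
        validate_rope_scaling(rope_scaling)
    except ValueError:
        return False
    return True


print(is_valid(None))
print(is_valid({"rope_type": "yarn", "factor": 4.0}))
print(is_valid({"factor": 2.0}))
print(is_valid({"type": "ntk"}))
```

Note that only the presence and name of the scaling type are checked; other fields pass through unvalidated.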
configuration_iquestpltcoder.py ADDED
@@ -0,0 +1,209 @@
1
+ """IQuestPLTCoder model configuration.
2
+
3
+ Extends the IQuestCoder configuration with PLT (Parallel Loop Transformer)
4
+ specific parameters. PLT reuses the same physical transformer layers across
5
+ multiple loops, with cross-loop processing (CLP) and mixed attention (global
6
+ full-attention + local sliding-window attention gated per head) in loop 1+.
7
+
8
+ Reference: https://arxiv.org/abs/2510.24824
9
+ """
10
+
11
+ from typing import Dict, List, Optional, Union
12
+
13
+ from transformers.configuration_utils import PretrainedConfig
14
+ from transformers.utils import logging
15
+
16
+
17
+ logger = logging.get_logger(__name__)
18
+
19
+
20
+ class IQuestPLTCoderConfig(PretrainedConfig):
21
+ r"""
22
+ Configuration class for [`IQuestPLTCoderModel`].
23
+
24
+ This is a PLT (Parallel Loop Transformer) variant of IQuestCoder. The model
25
+ has `num_hidden_layers` physical transformer layers that are executed
26
+ `plt_num_loops` times. Weights are shared across loops; each loop adds
27
+ cross-loop processing and mixed attention via a learned per-head gate.
28
+
29
+ Args:
30
+ vocab_size (`int`, *optional*, defaults to 75904):
31
+ Vocabulary size of the model (padded to be divisible by 128).
32
+ hidden_size (`int`, *optional*, defaults to 5120):
33
+ Dimension of the hidden representations.
34
+ intermediate_size (`int`, *optional*, defaults to 27648):
35
+ Dimension of the MLP representations.
36
+ num_hidden_layers (`int`, *optional*, defaults to 14):
37
+ Number of physical transformer layers (shared across all loops).
38
+ num_attention_heads (`int`, *optional*, defaults to 40):
39
+ Number of attention heads for each attention layer.
40
+ num_key_value_heads (`int`, *optional*, defaults to 8):
41
+ Number of key_value heads for Grouped Query Attention (GQA).
42
+ head_dim (`int`, *optional*, defaults to 128):
43
+ The dimension of each attention head.
44
+ hidden_act (`str` or `function`, *optional*, defaults to `"silu"`):
45
+ The non-linear activation function in the decoder (SwiGLU uses SiLU).
46
+ max_position_embeddings (`int`, *optional*, defaults to 131072):
47
+ The maximum sequence length that this model might ever be used with.
48
+ initializer_range (`float`, *optional*, defaults to 0.02):
49
+ The standard deviation of the truncated_normal_initializer for
50
+ initializing all weight matrices.
51
+ rms_norm_eps (`float`, *optional*, defaults to 1e-05):
52
+ The epsilon used by the RMS normalization layers.
53
+ use_cache (`bool`, *optional*, defaults to `True`):
54
+ Whether the model should return the last key/values attentions.
55
+ pad_token_id (`int`, *optional*):
56
+ Padding token id.
57
+ bos_token_id (`int`, *optional*, defaults to 1):
58
+ Beginning of stream token id.
59
+ eos_token_id (`int` or `list`, *optional*, defaults to `[2, 75864, 75869]`):
60
+ End of stream token id(s).
61
+ tie_word_embeddings (`bool`, *optional*, defaults to `False`):
62
+ Whether to tie input embedding and output projection weights.
63
+ rope_theta (`float`, *optional*, defaults to 500000.0):
64
+ The base period of the RoPE embeddings.
65
+ rope_scaling (`Dict`, *optional*):
66
+ Dictionary containing the scaling configuration for the RoPE
67
+ embeddings. Supports "linear", "dynamic", "yarn", "longrope", "llama3".
68
+ attention_bias (`bool`, *optional*, defaults to `False`):
69
+ Whether to use a bias in the Q, K, V and output projection layers.
70
+ attention_dropout (`float`, *optional*, defaults to 0.0):
71
+ The dropout ratio for the attention probabilities.
72
+ mlp_bias (`bool`, *optional*, defaults to `False`):
73
+ Whether to use a bias in the MLP gate/up/down projection layers.
74
+ plt_num_loops (`int`, *optional*, defaults to 2):
75
+ Number of times the physical transformer layers are executed.
76
+ Loop 0 runs standard causal attention and stores KV caches.
77
+ Loops 1+ run mixed attention with cross-loop processing.
78
+ plt_window_size (`list` of `int`, *optional*, defaults to `[64, 0]`):
79
+ Sliding window size `[left, right]` for the local attention in
80
+ loop 1+. `[64, 0]` means a left-context window of 64 tokens with
81
+ causal masking (right=0).
82
+ plt_normalize_per_loop (`bool`, *optional*, defaults to `True`):
83
+ When True, apply final_layernorm (shared weights) to hidden states
84
+ at the end of each non-last loop before cross-loop processing.
85
+ plt_emb_scale (`float`, *optional*, defaults to `None`):
86
+ Scaling factor for the embedding in CLP: `a * E + b * shift(H)`.
87
+ `None` means 1.0 (no scaling).
88
+ plt_hidden_scale (`float`, *optional*, defaults to `None`):
89
+ Scaling factor for the shifted hidden state in CLP:
90
+ `a * E + b * shift(H)`. `None` means 1.0 (no scaling).
91
+ plt_gate_use_hidden_states (`bool`, *optional*, defaults to `False`):
92
+ Gate input mode. When `False`, the gate is computed as
93
+ `sigmoid(einsum(Q, W_gate) + b_gate)` per head on the post-RoPE
94
+ query tensor. When `True`, gate uses
95
+ `sigmoid(Linear(RMSNorm(hidden_states)))` (OLMo-style) instead.
96
+
97
+ Example:
98
+ ```python
99
+ >>> from configuration_iquestpltcoder import IQuestPLTCoderConfig
100
+ >>> from modeling_iquestpltcoder import IQuestPLTCoderModel
101
+
102
+ >>> configuration = IQuestPLTCoderConfig()
103
+ >>> model = IQuestPLTCoderModel(configuration)
104
+ >>> configuration = model.config
105
+ ```
106
+ """
107
+
108
+ model_type = "iquestpltcoder"
109
+ keys_to_ignore_at_inference = ["past_key_values"]
110
+
111
+ def __init__(
112
+ self,
113
+ vocab_size=75904,
114
+ hidden_size=5120,
115
+ intermediate_size=27648,
116
+ num_hidden_layers=14,
117
+ num_attention_heads=40,
118
+ num_key_value_heads=8,
119
+ head_dim=128,
120
+ hidden_act="silu",
121
+ max_position_embeddings=131072,
122
+ initializer_range=0.02,
123
+ rms_norm_eps=1e-5,
124
+ use_cache=True,
125
+ pad_token_id=None,
126
+ bos_token_id=1,
127
+ eos_token_id=None,
128
+ tie_word_embeddings=False,
129
+ rope_theta=500000.0,
130
+ rope_scaling=None,
131
+ attention_bias=False,
132
+ attention_dropout=0.0,
133
+ mlp_bias=False,
134
+ # PLT specific
135
+ plt_num_loops=2,
136
+ plt_window_size=None,
137
+ plt_normalize_per_loop=True,
138
+ plt_emb_scale=None,
139
+ plt_hidden_scale=None,
140
+ plt_gate_use_hidden_states=False,
141
+ **kwargs,
142
+ ):
143
+ if eos_token_id is None:
144
+ eos_token_id = [2, 75864, 75869]
145
+ if plt_window_size is None:
146
+ plt_window_size = [64, 0]
147
+
148
+ self.vocab_size = vocab_size
149
+ self.max_position_embeddings = max_position_embeddings
150
+ self.hidden_size = hidden_size
151
+ self.intermediate_size = intermediate_size
152
+ self.num_hidden_layers = num_hidden_layers
153
+ self.num_attention_heads = num_attention_heads
154
+ self.num_key_value_heads = num_key_value_heads
155
+ self.head_dim = head_dim
156
+ self.hidden_act = hidden_act
157
+ self.initializer_range = initializer_range
158
+ self.rms_norm_eps = rms_norm_eps
159
+ self.use_cache = use_cache
160
+ self.rope_theta = rope_theta
161
+ self.rope_scaling = rope_scaling
162
+ self.attention_bias = attention_bias
163
+ self.attention_dropout = attention_dropout
164
+ self.mlp_bias = mlp_bias
165
+
166
+ # PLT specific
167
+ self.plt_num_loops = plt_num_loops
168
+ self.plt_window_size = plt_window_size
169
+ self.plt_normalize_per_loop = plt_normalize_per_loop
170
+ self.plt_emb_scale = plt_emb_scale
171
+ self.plt_hidden_scale = plt_hidden_scale
172
+ self.plt_gate_use_hidden_states = plt_gate_use_hidden_states
173
+
174
+ self._rope_scaling_validation()
175
+
176
+ super().__init__(
177
+ pad_token_id=pad_token_id,
178
+ bos_token_id=bos_token_id,
179
+ eos_token_id=eos_token_id,
180
+ tie_word_embeddings=tie_word_embeddings,
181
+ **kwargs,
182
+ )
183
+
184
+ def _rope_scaling_validation(self):
185
+ """Validate the `rope_scaling` configuration."""
186
+ if self.rope_scaling is None:
187
+ return
188
+
189
+ if not isinstance(self.rope_scaling, dict) or len(self.rope_scaling) < 1:
190
+ raise ValueError(
191
+ "`rope_scaling` must be a dictionary with a minimum of one field, "
192
+ "`type` or `rope_type`."
193
+ )
194
+
195
+ rope_scaling_type = self.rope_scaling.get("type", None) or self.rope_scaling.get(
196
+ "rope_type", None
197
+ )
198
+ if rope_scaling_type is None:
199
+ raise ValueError("`rope_scaling` must have a `type` or `rope_type` field.")
200
+
201
+ valid_rope_types = ["linear", "dynamic", "yarn", "longrope", "llama3"]
202
+ if rope_scaling_type not in valid_rope_types:
203
+ raise ValueError(
204
+ f"`rope_scaling`'s type field must be one of {valid_rope_types}, "
205
+ f"got {rope_scaling_type}"
206
+ )
207
+
208
+
209
+ __all__ = ["IQuestPLTCoderConfig"]
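The `_rope_scaling_validation` method above can be exercised without loading the model. The following is a minimal standalone sketch that mirrors its checks; the function name `check_rope_scaling` and the shortened error messages are assumptions made for illustration and are not part of the committed file.

```python
from typing import Optional

# Same list of accepted types as in the committed validation method.
VALID_ROPE_TYPES = ["linear", "dynamic", "yarn", "longrope", "llama3"]


def check_rope_scaling(rope_scaling: Optional[dict]) -> Optional[str]:
    """Return the resolved rope type, or None when no scaling is configured.

    Hypothetical helper: mirrors the order of checks in
    `IQuestPLTCoderConfig._rope_scaling_validation`.
    """
    if rope_scaling is None:
        return None
    if not isinstance(rope_scaling, dict) or len(rope_scaling) < 1:
        raise ValueError("`rope_scaling` must be a non-empty dictionary.")
    # `type` takes precedence; `rope_type` is the fallback key.
    rope_type = rope_scaling.get("type") or rope_scaling.get("rope_type")
    if rope_type is None:
        raise ValueError("`rope_scaling` must have a `type` or `rope_type` field.")
    if rope_type not in VALID_ROPE_TYPES:
        raise ValueError(f"unsupported rope type: {rope_type}")
    return rope_type


print(check_rope_scaling({"rope_type": "yarn", "factor": 4.0}))
```

Note that only the type field is validated; other keys such as `factor` pass through unchecked, both here and in the committed method.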
generation_config.json ADDED
@@ -0,0 +1,13 @@
1
+ {
2
+ "bos_token_id": 1,
3
+ "do_sample": true,
4
+ "eos_token_id": [
5
+ 2
6
+ ],
7
+ "pad_token_id": 2,
8
+ "temperature": 0.6,
9
+ "top_k": 20,
10
+ "top_p": 0.95,
11
+ "transformers_version": "4.57.1",
12
+ "trust_remote_code": true
13
+ }
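To make the sampling defaults above concrete, here is a rough pure-Python sketch of how `temperature`, `top_k`, and `top_p` could combine to shape a next-token distribution. It assumes the common ordering (temperature, then top-k, then top-p); the function `filter_probs` is hypothetical and is not the library's implementation.

```python
import math


def filter_probs(logits, temperature=0.6, top_k=20, top_p=0.95):
    """Return {token_index: probability} after temperature, top-k and top-p.

    Illustrative only: the defaults are taken from generation_config.json,
    but the filtering logic is a simplified assumption.
    """
    scaled = [x / temperature for x in logits]
    # Top-k: keep the k highest-scoring indices, best first.
    order = sorted(range(len(scaled)), key=lambda i: scaled[i], reverse=True)[:top_k]
    # Softmax over the surviving indices (shifted by the max for stability).
    peak = max(scaled[i] for i in order)
    weights = {i: math.exp(scaled[i] - peak) for i in order}
    total = sum(weights.values())
    # Top-p: keep the smallest prefix whose cumulative mass reaches top_p.
    kept = {}
    cumulative = 0.0
    for i in order:
        prob = weights[i] / total
        kept[i] = prob
        cumulative += prob
        if cumulative >= top_p:
            break
    # Renormalize so the kept probabilities sum to 1.
    norm = sum(kept.values())
    return {i: p / norm for i, p in kept.items()}


print(filter_probs([2.0, 1.0, 0.5, -1.0]))
```

A temperature below 1 sharpens the distribution before filtering, so with these defaults the nucleus is often much smaller than 20 tokens.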
model-00001-of-00003.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:e59b0f4976e5cfd528cf131c0b91438ab76650c39ed84a970b5446792f8e7eec
3
+ size 5349504488
model-00002-of-00003.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:f5ae74ba4eabc60dda9b4977508459876c4651a089d5be537781b564a53cde14
3
+ size 5287053272
model-00003-of-00003.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:59d6feb64f8cdff68b6da8a4cbcea16855a0f671ac7f7b97082c945d2ba5e040
3
+ size 4594961384
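As a quick sanity check, the three shard sizes above can be compared with the `total_size` recorded in `model.safetensors.index.json` in this commit. The sketch below assumes 2 bytes per parameter (a 16-bit dtype, which this diff does not state) and uses a hypothetical helper name, `size_summary`.

```python
# Sizes in bytes, copied from the three Git LFS pointers in this commit.
SHARD_FILE_SIZES = [5349504488, 5287053272, 4594961384]
# `metadata.total_size` from model.safetensors.index.json in this commit.
INDEX_TOTAL_SIZE = 15231499360


def size_summary(file_sizes, tensor_bytes, bytes_per_param=2):
    """Compare on-disk shard sizes with the tensor bytes listed in the index.

    `bytes_per_param=2` is an assumption (16-bit weights), not a fact
    taken from the commit.
    """
    on_disk = sum(file_sizes)
    return {
        "on_disk_bytes": on_disk,
        "non_tensor_bytes": on_disk - tensor_bytes,
        "approx_params": tensor_bytes // bytes_per_param,
    }


summary = size_summary(SHARD_FILE_SIZES, INDEX_TOTAL_SIZE)
print(summary)
```

The small difference between the on-disk total and `total_size` is presumably per-file header overhead, and under the 16-bit assumption the tensor bytes correspond to roughly 7.6 billion parameters.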
model.safetensors.index.json ADDED
@@ -0,0 +1,178 @@
1
+ {
2
+ "metadata": {
3
+ "total_size": 15231499360
4
+ },
5
+ "weight_map": {
6
+ "lm_head.weight": "model-00001-of-00003.safetensors",
7
+ "model.embed_tokens.weight": "model-00001-of-00003.safetensors",
8
+ "model.layers.0.input_layernorm.weight": "model-00001-of-00003.safetensors",
9
+ "model.layers.0.mlp.down_proj.weight": "model-00001-of-00003.safetensors",
10
+ "model.layers.0.mlp.gate_proj.weight": "model-00001-of-00003.safetensors",
11
+ "model.layers.0.mlp.up_proj.weight": "model-00001-of-00003.safetensors",
12
+ "model.layers.0.post_attention_layernorm.weight": "model-00001-of-00003.safetensors",
13
+ "model.layers.0.self_attn.k_proj.weight": "model-00001-of-00003.safetensors",
14
+ "model.layers.0.self_attn.o_proj.weight": "model-00001-of-00003.safetensors",
15
+ "model.layers.0.self_attn.plt_gate.bias": "model-00001-of-00003.safetensors",
16
+ "model.layers.0.self_attn.plt_gate.gate_norm.weight": "model-00001-of-00003.safetensors",
17
+ "model.layers.0.self_attn.plt_gate.weight": "model-00001-of-00003.safetensors",
18
+ "model.layers.0.self_attn.q_proj.weight": "model-00001-of-00003.safetensors",
19
+ "model.layers.0.self_attn.v_proj.weight": "model-00001-of-00003.safetensors",
20
+ "model.layers.1.input_layernorm.weight": "model-00001-of-00003.safetensors",
21
+ "model.layers.1.mlp.down_proj.weight": "model-00001-of-00003.safetensors",
22
+ "model.layers.1.mlp.gate_proj.weight": "model-00001-of-00003.safetensors",
23
+ "model.layers.1.mlp.up_proj.weight": "model-00001-of-00003.safetensors",
24
+ "model.layers.1.post_attention_layernorm.weight": "model-00001-of-00003.safetensors",
25
+ "model.layers.1.self_attn.k_proj.weight": "model-00001-of-00003.safetensors",
26
+ "model.layers.1.self_attn.o_proj.weight": "model-00001-of-00003.safetensors",
27
+ "model.layers.1.self_attn.plt_gate.bias": "model-00001-of-00003.safetensors",
28
+ "model.layers.1.self_attn.plt_gate.gate_norm.weight": "model-00001-of-00003.safetensors",
29
+ "model.layers.1.self_attn.plt_gate.weight": "model-00001-of-00003.safetensors",
30
+ "model.layers.1.self_attn.q_proj.weight": "model-00001-of-00003.safetensors",
31
+ "model.layers.1.self_attn.v_proj.weight": "model-00001-of-00003.safetensors",
32
+ "model.layers.10.input_layernorm.weight": "model-00003-of-00003.safetensors",
33
+ "model.layers.10.mlp.down_proj.weight": "model-00003-of-00003.safetensors",
34
+ "model.layers.10.mlp.gate_proj.weight": "model-00003-of-00003.safetensors",
35
+ "model.layers.10.mlp.up_proj.weight": "model-00003-of-00003.safetensors",
36
+ "model.layers.10.post_attention_layernorm.weight": "model-00003-of-00003.safetensors",
37
+ "model.layers.10.self_attn.k_proj.weight": "model-00003-of-00003.safetensors",
38
+ "model.layers.10.self_attn.o_proj.weight": "model-00003-of-00003.safetensors",
39
+ "model.layers.10.self_attn.plt_gate.bias": "model-00003-of-00003.safetensors",
40
+ "model.layers.10.self_attn.plt_gate.gate_norm.weight": "model-00003-of-00003.safetensors",
41
+ "model.layers.10.self_attn.plt_gate.weight": "model-00003-of-00003.safetensors",
42
+ "model.layers.10.self_attn.q_proj.weight": "model-00003-of-00003.safetensors",
43
+ "model.layers.10.self_attn.v_proj.weight": "model-00003-of-00003.safetensors",
44
+ "model.layers.11.input_layernorm.weight": "model-00003-of-00003.safetensors",
45
+ "model.layers.11.mlp.down_proj.weight": "model-00003-of-00003.safetensors",
46
+ "model.layers.11.mlp.gate_proj.weight": "model-00003-of-00003.safetensors",
47
+ "model.layers.11.mlp.up_proj.weight": "model-00003-of-00003.safetensors",
48
+ "model.layers.11.post_attention_layernorm.weight": "model-00003-of-00003.safetensors",
49
+ "model.layers.11.self_attn.k_proj.weight": "model-00003-of-00003.safetensors",
50
+ "model.layers.11.self_attn.o_proj.weight": "model-00003-of-00003.safetensors",
51
+ "model.layers.11.self_attn.plt_gate.bias": "model-00003-of-00003.safetensors",
52
+ "model.layers.11.self_attn.plt_gate.gate_norm.weight": "model-00003-of-00003.safetensors",
53
+ "model.layers.11.self_attn.plt_gate.weight": "model-00003-of-00003.safetensors",
54
+ "model.layers.11.self_attn.q_proj.weight": "model-00003-of-00003.safetensors",
55
+ "model.layers.11.self_attn.v_proj.weight": "model-00003-of-00003.safetensors",
56
+ "model.layers.12.input_layernorm.weight": "model-00003-of-00003.safetensors",
57
+ "model.layers.12.mlp.down_proj.weight": "model-00003-of-00003.safetensors",
58
+ "model.layers.12.mlp.gate_proj.weight": "model-00003-of-00003.safetensors",
59
+ "model.layers.12.mlp.up_proj.weight": "model-00003-of-00003.safetensors",
60
+ "model.layers.12.post_attention_layernorm.weight": "model-00003-of-00003.safetensors",
61
+ "model.layers.12.self_attn.k_proj.weight": "model-00003-of-00003.safetensors",
62
+ "model.layers.12.self_attn.o_proj.weight": "model-00003-of-00003.safetensors",
63
+ "model.layers.12.self_attn.plt_gate.bias": "model-00003-of-00003.safetensors",
64
+ "model.layers.12.self_attn.plt_gate.gate_norm.weight": "model-00003-of-00003.safetensors",
65
+ "model.layers.12.self_attn.plt_gate.weight": "model-00003-of-00003.safetensors",
66
+ "model.layers.12.self_attn.q_proj.weight": "model-00003-of-00003.safetensors",
67
+ "model.layers.12.self_attn.v_proj.weight": "model-00003-of-00003.safetensors",
68
+ "model.layers.13.input_layernorm.weight": "model-00003-of-00003.safetensors",
69
+ "model.layers.13.mlp.down_proj.weight": "model-00003-of-00003.safetensors",
70
+ "model.layers.13.mlp.gate_proj.weight": "model-00003-of-00003.safetensors",
71
+ "model.layers.13.mlp.up_proj.weight": "model-00003-of-00003.safetensors",
72
+ "model.layers.13.post_attention_layernorm.weight": "model-00003-of-00003.safetensors",
73
+ "model.layers.13.self_attn.k_proj.weight": "model-00003-of-00003.safetensors",
74
+ "model.layers.13.self_attn.o_proj.weight": "model-00003-of-00003.safetensors",
75
+ "model.layers.13.self_attn.plt_gate.bias": "model-00003-of-00003.safetensors",
76
+ "model.layers.13.self_attn.plt_gate.gate_norm.weight": "model-00003-of-00003.safetensors",
77
+ "model.layers.13.self_attn.plt_gate.weight": "model-00003-of-00003.safetensors",
78
+ "model.layers.13.self_attn.q_proj.weight": "model-00003-of-00003.safetensors",
79
+ "model.layers.13.self_attn.v_proj.weight": "model-00003-of-00003.safetensors",
80
+ "model.layers.2.input_layernorm.weight": "model-00001-of-00003.safetensors",
81
+ "model.layers.2.mlp.down_proj.weight": "model-00001-of-00003.safetensors",
82
+ "model.layers.2.mlp.gate_proj.weight": "model-00001-of-00003.safetensors",
83
+ "model.layers.2.mlp.up_proj.weight": "model-00001-of-00003.safetensors",
84
+ "model.layers.2.post_attention_layernorm.weight": "model-00001-of-00003.safetensors",
85
+ "model.layers.2.self_attn.k_proj.weight": "model-00001-of-00003.safetensors",
86
+ "model.layers.2.self_attn.o_proj.weight": "model-00001-of-00003.safetensors",
87
+ "model.layers.2.self_attn.plt_gate.bias": "model-00001-of-00003.safetensors",
88
+ "model.layers.2.self_attn.plt_gate.gate_norm.weight": "model-00001-of-00003.safetensors",
89
+ "model.layers.2.self_attn.plt_gate.weight": "model-00001-of-00003.safetensors",
90
+ "model.layers.2.self_attn.q_proj.weight": "model-00001-of-00003.safetensors",
91
+ "model.layers.2.self_attn.v_proj.weight": "model-00001-of-00003.safetensors",
92
+ "model.layers.3.input_layernorm.weight": "model-00002-of-00003.safetensors",
93
+ "model.layers.3.mlp.down_proj.weight": "model-00001-of-00003.safetensors",
94
+ "model.layers.3.mlp.gate_proj.weight": "model-00001-of-00003.safetensors",
95
+ "model.layers.3.mlp.up_proj.weight": "model-00001-of-00003.safetensors",
96
+ "model.layers.3.post_attention_layernorm.weight": "model-00001-of-00003.safetensors",
97
+ "model.layers.3.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
98
+ "model.layers.3.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
99
+ "model.layers.3.self_attn.plt_gate.bias": "model-00001-of-00003.safetensors",
100
+ "model.layers.3.self_attn.plt_gate.gate_norm.weight": "model-00001-of-00003.safetensors",
101
+ "model.layers.3.self_attn.plt_gate.weight": "model-00001-of-00003.safetensors",
102
+ "model.layers.3.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
103
+ "model.layers.3.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
104
+ "model.layers.4.input_layernorm.weight": "model-00002-of-00003.safetensors",
105
+ "model.layers.4.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
106
+ "model.layers.4.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
107
+ "model.layers.4.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
108
+ "model.layers.4.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
109
+ "model.layers.4.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
110
+ "model.layers.4.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
111
+ "model.layers.4.self_attn.plt_gate.bias": "model-00002-of-00003.safetensors",
112
+ "model.layers.4.self_attn.plt_gate.gate_norm.weight": "model-00002-of-00003.safetensors",
113
+ "model.layers.4.self_attn.plt_gate.weight": "model-00002-of-00003.safetensors",
114
+ "model.layers.4.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
115
+ "model.layers.4.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
116
+ "model.layers.5.input_layernorm.weight": "model-00002-of-00003.safetensors",
117
+ "model.layers.5.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
118
+ "model.layers.5.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
119
+ "model.layers.5.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
120
+ "model.layers.5.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
121
+ "model.layers.5.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
122
+ "model.layers.5.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
123
+ "model.layers.5.self_attn.plt_gate.bias": "model-00002-of-00003.safetensors",
124
+ "model.layers.5.self_attn.plt_gate.gate_norm.weight": "model-00002-of-00003.safetensors",
125
+ "model.layers.5.self_attn.plt_gate.weight": "model-00002-of-00003.safetensors",
126
+ "model.layers.5.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
127
+ "model.layers.5.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
128
+ "model.layers.6.input_layernorm.weight": "model-00002-of-00003.safetensors",
129
+ "model.layers.6.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
130
+ "model.layers.6.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
131
+ "model.layers.6.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
132
+ "model.layers.6.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
133
+ "model.layers.6.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
134
+ "model.layers.6.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
135
+ "model.layers.6.self_attn.plt_gate.bias": "model-00002-of-00003.safetensors",
136
+ "model.layers.6.self_attn.plt_gate.gate_norm.weight": "model-00002-of-00003.safetensors",
137
+ "model.layers.6.self_attn.plt_gate.weight": "model-00002-of-00003.safetensors",
138
+ "model.layers.6.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
139
+ "model.layers.6.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
140
+ "model.layers.7.input_layernorm.weight": "model-00002-of-00003.safetensors",
141
+ "model.layers.7.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
142
+ "model.layers.7.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
143
+ "model.layers.7.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
144
+ "model.layers.7.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
145
+ "model.layers.7.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
146
+ "model.layers.7.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
147
+ "model.layers.7.self_attn.plt_gate.bias": "model-00002-of-00003.safetensors",
148
+ "model.layers.7.self_attn.plt_gate.gate_norm.weight": "model-00002-of-00003.safetensors",
149
+ "model.layers.7.self_attn.plt_gate.weight": "model-00002-of-00003.safetensors",
150
+ "model.layers.7.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
151
+ "model.layers.7.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
152
+ "model.layers.8.input_layernorm.weight": "model-00002-of-00003.safetensors",
153
+ "model.layers.8.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
154
+ "model.layers.8.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
155
+ "model.layers.8.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
156
+ "model.layers.8.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
157
+ "model.layers.8.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
158
+ "model.layers.8.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
159
+ "model.layers.8.self_attn.plt_gate.bias": "model-00002-of-00003.safetensors",
160
+ "model.layers.8.self_attn.plt_gate.gate_norm.weight": "model-00002-of-00003.safetensors",
161
+ "model.layers.8.self_attn.plt_gate.weight": "model-00002-of-00003.safetensors",
162
+ "model.layers.8.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
163
+ "model.layers.8.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
164
+ "model.layers.9.input_layernorm.weight": "model-00003-of-00003.safetensors",
165
+ "model.layers.9.mlp.down_proj.weight": "model-00003-of-00003.safetensors",
166
+ "model.layers.9.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
167
+ "model.layers.9.mlp.up_proj.weight": "model-00003-of-00003.safetensors",
168
+ "model.layers.9.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
169
+ "model.layers.9.self_attn.k_proj.weight": "model-00003-of-00003.safetensors",
170
+ "model.layers.9.self_attn.o_proj.weight": "model-00003-of-00003.safetensors",
171
+ "model.layers.9.self_attn.plt_gate.bias": "model-00003-of-00003.safetensors",
172
+ "model.layers.9.self_attn.plt_gate.gate_norm.weight": "model-00003-of-00003.safetensors",
173
+ "model.layers.9.self_attn.plt_gate.weight": "model-00003-of-00003.safetensors",
174
+ "model.layers.9.self_attn.q_proj.weight": "model-00003-of-00003.safetensors",
175
+ "model.layers.9.self_attn.v_proj.weight": "model-00003-of-00003.safetensors",
176
+ "model.norm.weight": "model-00001-of-00003.safetensors"
177
+ }
178
+ }
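In the weight map above, most layers live in a single shard, but a few straddle a shard boundary. A small sketch for spotting those layers from an index file follows; the helper name `layers_split_across_shards` is hypothetical, and the sample holds only a handful of entries copied from the index, not the full map.

```python
import json
import re
from collections import defaultdict


def layers_split_across_shards(index_text):
    """Return sorted layer numbers whose tensors are stored in more than one shard."""
    weight_map = json.loads(index_text)["weight_map"]
    shards_by_layer = defaultdict(set)
    for name, shard in weight_map.items():
        # Only per-layer tensors match; embeddings, lm_head and the final norm are skipped.
        match = re.match(r"model\.layers\.(\d+)\.", name)
        if match:
            shards_by_layer[int(match.group(1))].add(shard)
    return sorted(layer for layer, shards in shards_by_layer.items() if len(shards) > 1)


# A few entries copied from the index above, enough to show the two boundaries.
SAMPLE_INDEX = json.dumps({
    "weight_map": {
        "model.layers.2.self_attn.q_proj.weight": "model-00001-of-00003.safetensors",
        "model.layers.3.mlp.up_proj.weight": "model-00001-of-00003.safetensors",
        "model.layers.3.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
        "model.layers.9.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
        "model.layers.9.mlp.up_proj.weight": "model-00003-of-00003.safetensors",
        "model.norm.weight": "model-00001-of-00003.safetensors",
    }
})

print(layers_split_across_shards(SAMPLE_INDEX))  # [3, 9]
```

Knowing which layers are split matters mainly when loading shards selectively, since those layers need two files open at once.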
special_tokens_map.json ADDED
@@ -0,0 +1,46 @@
1
+ {
2
+ "additional_special_tokens": [
3
+ "<|CLS|>",
4
+ "<|SEP|>",
5
+ "<|EOD|>",
6
+ "<|MASK|>",
7
+ "<|PAD|>",
8
+ "<|fim_prefix|>",
9
+ "<|fim_middle|>",
10
+ "<|fim_suffix|>",
11
+ "<|im_start|>",
12
+ "<|im_end|>",
13
+ "<|fim_pad|>",
14
+ "<|endoftext|>",
15
+ "<|repo_name|>",
16
+ "<|file_sep|>"
17
+ ],
18
+ "bos_token": {
19
+ "content": "<s>",
20
+ "lstrip": false,
21
+ "normalized": true,
22
+ "rstrip": false,
23
+ "single_word": false
24
+ },
25
+ "eos_token": {
26
+ "content": "<|im_end|>",
27
+ "lstrip": false,
28
+ "normalized": false,
29
+ "rstrip": false,
30
+ "single_word": false
31
+ },
32
+ "pad_token": {
33
+ "content": "<|endoftext|>",
34
+ "lstrip": false,
35
+ "normalized": false,
36
+ "rstrip": false,
37
+ "single_word": false
38
+ },
39
+ "unk_token": {
40
+ "content": "<unk>",
41
+ "lstrip": false,
42
+ "normalized": true,
43
+ "rstrip": false,
44
+ "single_word": true
45
+ }
46
+ }
tokenization_iquestcoder.py ADDED
@@ -0,0 +1,570 @@
1
+ """Tokenization classes for IQuestCoder."""
2
+
3
+ import os
4
+ from shutil import copyfile
5
+ from typing import Any, Dict, List, Optional, Tuple, Union
6
+
7
+ import sentencepiece as spm
8
+
9
+ from transformers.tokenization_utils import AddedToken, PreTrainedTokenizer
10
+ from transformers.utils import logging
11
+
12
+
13
+ logger = logging.get_logger(__name__)
14
+
15
+ VOCAB_FILES_NAMES = {"vocab_file": "tokenizer.model"}
16
+
17
+ PRETRAINED_VOCAB_FILES_MAP = {
18
+ "vocab_file": {},
19
+ "tokenizer_file": {},
20
+ }
21
+ PRETRAINED_POSITIONAL_EMBEDDINGS_SIZES = {}
22
+
23
+
24
+
25
+ class IQuestCoderTokenizer(PreTrainedTokenizer):
26
+
27
+ vocab_files_names = VOCAB_FILES_NAMES
28
+ pretrained_vocab_files_map = PRETRAINED_VOCAB_FILES_MAP
29
+ max_model_input_sizes = PRETRAINED_POSITIONAL_EMBEDDINGS_SIZES
30
+ model_input_names = ["input_ids", "attention_mask"]
31
+
32
+ def __init__(
33
+ self,
34
+ vocab_file,
35
+ unk_token="<unk>",
36
+ bos_token="<s>",
37
+ eos_token="</s>",
38
+ pad_token=None,
39
+ sp_model_kwargs: Optional[Dict[str, Any]] = None,
40
+ add_bos_token=True,
41
+ add_eos_token=False,
42
+ clean_up_tokenization_spaces=False,
43
+ add_prefix_space=False,
44
+ legacy=None,
45
+ use_default_system_prompt=False,
46
+ chat_template=None,
47
+ **kwargs,
48
+ ):
49
+ self.sp_model_kwargs = {} if sp_model_kwargs is None else sp_model_kwargs
50
+ bos_token = AddedToken(bos_token, lstrip=False, rstrip=False) if isinstance(bos_token, str) else bos_token
51
+ eos_token = AddedToken(eos_token, lstrip=False, rstrip=False) if isinstance(eos_token, str) else eos_token
52
+ unk_token = AddedToken(unk_token, lstrip=False, rstrip=False) if isinstance(unk_token, str) else unk_token
53
+ pad_token = AddedToken(pad_token, lstrip=False, rstrip=False) if isinstance(pad_token, str) else pad_token
54
+
55
+ # Legacy behavior handling
56
+ if legacy is None:
57
+ logger.warning_once(
58
+ f"You are using the default legacy behaviour of the {self.__class__.__name__}. This is"
59
+ " expected, and simply means that the `legacy` (previous) behavior will be used so nothing changes for you."
60
+ " If you want to use the new behaviour, set `legacy=False`. This should only be set if you understand what it"
61
+ " means, and thoroughly read the reason why this was added as explained in"
62
+ " https://github.com/huggingface/transformers/pull/24565"
63
+ )
64
+ legacy = True
65
+
66
+ self.legacy = legacy
67
+ self.vocab_file = vocab_file
68
+ self.add_bos_token = add_bos_token
69
+ self.add_eos_token = add_eos_token
70
+ self.add_prefix_space = add_prefix_space
71
+ self.use_default_system_prompt = use_default_system_prompt
72
+ self.sp_model = spm.SentencePieceProcessor(**self.sp_model_kwargs)
73
+ self.sp_model.Load(vocab_file)
74
+
75
+
76
+
77
+ super().__init__(
78
+ bos_token=bos_token,
79
+ eos_token=eos_token,
80
+ unk_token=unk_token,
81
+ pad_token=pad_token,
82
+ add_bos_token=add_bos_token,
83
+ add_eos_token=add_eos_token,
84
+ sp_model_kwargs=self.sp_model_kwargs,
85
+ clean_up_tokenization_spaces=clean_up_tokenization_spaces,
86
+ add_prefix_space=add_prefix_space,
87
+ legacy=legacy,
88
+ use_default_system_prompt=use_default_system_prompt,
89
+ chat_template=chat_template,
90
+ **kwargs,
91
+ )
92
+
93
+ def __getstate__(self):
94
+ state = self.__dict__.copy()
95
+ state["sp_model"] = None
96
+ return state
97
+
98
+ def __setstate__(self, d):
99
+ self.__dict__ = d
100
+ self.sp_model = spm.SentencePieceProcessor(**self.sp_model_kwargs)
101
+ self.sp_model.Load(self.vocab_file)
102
+
103
+ @property
104
+ def vocab_size(self) -> int:
105
+ """Returns the vocabulary size."""
106
+ return self.sp_model.get_piece_size()
107
+
108
+ def get_vocab(self) -> Dict[str, int]:
109
+ """Returns the vocabulary as a dictionary of token to index."""
110
+ vocab = {self.convert_ids_to_tokens(i): i for i in range(self.vocab_size)}
111
+ vocab.update(self.added_tokens_encoder)
112
+ return vocab
113
+
114
+ def _tokenize(self, text: str) -> List[str]:
115
+ """
116
+ Tokenize a string.
117
+
118
+ Args:
119
+ text (`str`): The text to tokenize.
120
+
121
+ Returns:
122
+ `List[str]`: The list of tokens.
123
+ """
124
+ if self.add_prefix_space:
125
+ text = " " + text
126
+
127
+ if self.legacy:
128
+ return self.sp_model.encode(text, out_type=str)
129
+
130
+ # Non-legacy path: currently identical to the legacy branch (plain SentencePiece encoding).
131
+ return self.sp_model.encode(text, out_type=str)
132
+
133
+ def _convert_token_to_id(self, token: str) -> int:
134
+ """Converts a token (str) to an id using the vocab."""
135
+ return self.sp_model.piece_to_id(token)
136
+
137
+ def _convert_id_to_token(self, index: int) -> str:
138
+ """Converts an index (integer) to a token (str) using the vocab."""
139
+ token = self.sp_model.IdToPiece(index)
140
+ return token
141
+
142
+ def convert_tokens_to_string(self, tokens: List[str]) -> str:
143
+ """
144
+ Converts a sequence of tokens (strings) to a single string.
145
+
146
+ This method handles special tokens separately to ensure they are not
147
+ decoded using the SentencePiece model.
148
+
149
+ Args:
150
+ tokens (`List[str]`): The list of tokens to convert.
151
+
152
+ Returns:
153
+ `str`: The decoded string.
154
+ """
155
+ current_sub_tokens = []
156
+ out_string = ""
157
+ prev_is_special = False
158
+ for i, token in enumerate(tokens):
159
+ # make sure that special tokens are not decoded using sentencepiece model
160
+ if token in self.all_special_tokens:
161
+ if not prev_is_special and i != 0:
162
+ out_string += " "
163
+ out_string += self.sp_model.decode(current_sub_tokens) + token
164
+ prev_is_special = True
165
+ current_sub_tokens = []
166
+ else:
167
+ current_sub_tokens.append(token)
168
+ prev_is_special = False
169
+ out_string += self.sp_model.decode(current_sub_tokens)
170
+ return out_string
171
+
172
+ def save_vocabulary(self, save_directory: str, filename_prefix: Optional[str] = None) -> Tuple[str]:
173
+ """
174
+ Save the vocabulary and special tokens file to a directory.
175
+
176
+ Args:
177
+ save_directory (`str`):
178
+ The directory in which to save the vocabulary.
179
+ filename_prefix (`str`, *optional*):
180
+ An optional prefix to add to the names of the saved files.
181
+
182
+ Returns:
183
+ `Tuple(str)`: Paths to the files saved.
184
+ """
185
+ if not os.path.isdir(save_directory):
186
+ logger.error(f"Vocabulary path ({save_directory}) should be a directory")
187
+ return
188
+ out_vocab_file = os.path.join(
189
+ save_directory, (filename_prefix + "-" if filename_prefix else "") + VOCAB_FILES_NAMES["vocab_file"]
190
+ )
191
+
192
+ if os.path.abspath(self.vocab_file) != os.path.abspath(out_vocab_file) and os.path.isfile(self.vocab_file):
193
+ copyfile(self.vocab_file, out_vocab_file)
194
+ elif not os.path.isfile(self.vocab_file):
195
+ with open(out_vocab_file, "wb") as fi:
196
+ content_spiece_model = self.sp_model.serialized_model_proto()
197
+ fi.write(content_spiece_model)
198
+
199
+ return (out_vocab_file,)
200
+
201
+ def build_inputs_with_special_tokens(
202
+ self,
203
+ token_ids_0: List[int],
204
+ token_ids_1: Optional[List[int]] = None
205
+ ) -> List[int]:
206
+ """
207
+ Build model inputs from a sequence or a pair of sequences for sequence classification tasks by concatenating
208
+ and adding special tokens.
209
+
210
+ An IQuestCoder sequence has the following format:
211
+
212
+ - single sequence: `<s> X </s>` (if add_eos_token is True) or `<s> X` (default)
213
+ - pair of sequences: `<s> A </s> <s> B </s>` (if add_eos_token is True) or `<s> A <s> B` (default)
214
+
215
+ Args:
216
+ token_ids_0 (`List[int]`):
217
+ List of IDs to which the special tokens will be added.
218
+ token_ids_1 (`List[int]`, *optional*):
219
+ Optional second list of IDs for sequence pairs.
220
+
221
+ Returns:
222
+ `List[int]`: List of input IDs with the appropriate special tokens.
223
+ """
224
+ bos_token_id = [self.bos_token_id] if self.add_bos_token else []
225
+ eos_token_id = [self.eos_token_id] if self.add_eos_token else []
226
+
227
+ output = bos_token_id + token_ids_0 + eos_token_id
228
+
229
+ if token_ids_1 is not None:
230
+ output = output + bos_token_id + token_ids_1 + eos_token_id
231
+
232
+ return output
233
+
234
+ def get_special_tokens_mask(
235
+ self,
236
+ token_ids_0: List[int],
237
+ token_ids_1: Optional[List[int]] = None,
238
+ already_has_special_tokens: bool = False
239
+ ) -> List[int]:
240
+ """
241
+ Build a special-tokens mask for a token list that has no special tokens added yet. This method is called when adding
242
+ special tokens using the tokenizer `prepare_for_model` method.
243
+
244
+ Args:
245
+ token_ids_0 (`List[int]`):
246
+ List of IDs.
247
+ token_ids_1 (`List[int]`, *optional*):
248
+ Optional second list of IDs for sequence pairs.
249
+ already_has_special_tokens (`bool`, *optional*, defaults to `False`):
250
+ Whether or not the token list is already formatted with special tokens for the model.
251
+
252
+ Returns:
253
+ `List[int]`: A list of integers in the range [0, 1]: 1 for a special token, 0 for a sequence token.
254
+ """
255
+ if already_has_special_tokens:
256
+ return super().get_special_tokens_mask(
257
+ token_ids_0=token_ids_0, token_ids_1=token_ids_1, already_has_special_tokens=True
258
+ )
259
+
260
+ bos_token_id = [1] if self.add_bos_token else []
261
+ eos_token_id = [1] if self.add_eos_token else []
262
+
263
+ if token_ids_1 is None:
264
+ return bos_token_id + ([0] * len(token_ids_0)) + eos_token_id
265
+ return (
266
+ bos_token_id
267
+ + ([0] * len(token_ids_0))
268
+ + eos_token_id
269
+ + bos_token_id
270
+ + ([0] * len(token_ids_1))
271
+ + eos_token_id
272
+ )
273
+
274
+ def create_token_type_ids_from_sequences(
275
+ self,
276
+ token_ids_0: List[int],
277
+ token_ids_1: Optional[List[int]] = None
278
+ ) -> List[int]:
279
+ """
280
+ Create a mask from the two sequences passed to be used in a sequence-pair classification task.
281
+
282
+ An IQuestCoder sequence pair mask has the following format:
283
+
284
+ ```
285
+ 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1
286
+ | first sequence | second sequence |
287
+ ```
288
+
289
+ If `token_ids_1` is `None`, this method only returns the first portion of the mask (0s).
290
+
291
+ Args:
292
+ token_ids_0 (`List[int]`):
293
+ List of IDs.
294
+ token_ids_1 (`List[int]`, *optional*):
295
+ Optional second list of IDs for sequence pairs.
296
+
297
+ Returns:
298
+ `List[int]`: List of token type IDs according to the given sequence(s).
299
+ """
300
+ bos_token_id = [self.bos_token_id] if self.add_bos_token else []
301
+ eos_token_id = [self.eos_token_id] if self.add_eos_token else []
302
+
303
+ output = [0] * len(bos_token_id + token_ids_0 + eos_token_id)
304
+
305
+ if token_ids_1 is not None:
306
+ output += [1] * len(bos_token_id + token_ids_1 + eos_token_id)
307
+
308
+ return output
309
+
310
+ @property
311
+ def default_chat_template(self) -> str:
312
+ """
313
+ Returns the default chat template for IQuestCoder.
314
+
315
+ This template formats conversations with system, user, and assistant roles.
316
+ """
317
+ return DEFAULT_CHAT_TEMPLATE
318
+
319
+ def apply_chat_template(
320
+ self,
321
+ conversation: Union[List[Dict[str, str]], "Conversation"],
322
+ chat_template: Optional[str] = None,
323
+ add_generation_prompt: bool = False,
324
+ tokenize: bool = True,
325
+ padding: bool = False,
326
+ truncation: bool = False,
327
+ max_length: Optional[int] = None,
328
+ return_tensors: Optional[str] = None,
329
+ return_dict: bool = False,
330
+ **tokenizer_kwargs,
331
+ ):
332
+ """
333
+ Apply a chat template to format a conversation.
334
+
335
+ Args:
336
+ conversation (`List[Dict[str, str]]` or `Conversation`):
337
+ A list of dicts with "role" and "content" keys, representing the conversation history.
338
+ chat_template (`str`, *optional*):
339
+ A Jinja template to use for formatting. If not provided, the tokenizer's default will be used.
340
+ add_generation_prompt (`bool`, *optional*, defaults to `False`):
341
+ Whether to add a generation prompt at the end for the assistant to continue.
342
+ tokenize (`bool`, *optional*, defaults to `True`):
343
+ Whether to tokenize the output. If `False`, returns a string.
344
+ padding (`bool`, *optional*, defaults to `False`):
345
+ Whether to pad sequences.
346
+ truncation (`bool`, *optional*, defaults to `False`):
347
+ Whether to truncate sequences.
348
+ max_length (`int`, *optional*):
349
+ Maximum length of the output.
350
+ return_tensors (`str`, *optional*):
351
+ The type of tensors to return ("pt", "tf", "np", or None).
352
+ return_dict (`bool`, *optional*, defaults to `False`):
353
+ Whether to return a dictionary with additional information.
354
+ **tokenizer_kwargs:
355
+ Additional keyword arguments passed to the tokenizer.
356
+
357
+ Returns:
358
+ `Union[str, List[int], BatchEncoding]`: The formatted (and optionally tokenized) conversation.
359
+
360
+ Example:
361
+ ```python
362
+ >>> tokenizer = IQuestCoderTokenizer.from_pretrained("path/to/model")
363
+ >>> conversation = [
364
+ ... {"role": "system", "content": "You are a helpful assistant."},
365
+ ... {"role": "user", "content": "Hello!"},
366
+ ... {"role": "assistant", "content": "Hi there! How can I help you today?"},
367
+ ... {"role": "user", "content": "What's the weather like?"},
368
+ ... ]
369
+ >>> tokenizer.apply_chat_template(conversation, add_generation_prompt=True, tokenize=False)
370
+ '<|system|>\\nYou are a helpful assistant.\\n</|system|><|user|>\\nHello!\\n</|user|>...'
371
+ ```
372
+ """
373
+ # Delegate to the parent implementation; when `chat_template` is None it falls back to the tokenizer's configured template
374
+ return super().apply_chat_template(
375
+ conversation,
376
+ chat_template=chat_template,
377
+ add_generation_prompt=add_generation_prompt,
378
+ tokenize=tokenize,
379
+ padding=padding,
380
+ truncation=truncation,
381
+ max_length=max_length,
382
+ return_tensors=return_tensors,
383
+ return_dict=return_dict,
384
+ **tokenizer_kwargs,
385
+ )
386
+
387
+
388
+ # Try to import and create Fast tokenizer version
389
+ try:
390
+ from transformers import PreTrainedTokenizerFast
391
+ from tokenizers import Tokenizer, decoders, models, normalizers, pre_tokenizers, processors
392
+
393
+ class IQuestCoderTokenizerFast(PreTrainedTokenizerFast):
394
+ """
395
+ Construct a "fast" IQuestCoder tokenizer (backed by HuggingFace's *tokenizers* library).
396
+
397
+ This is a fast implementation of [`IQuestCoderTokenizer`] using the 🤗 Tokenizers library.
398
+
399
+ Args:
400
+ vocab_file (`str`, *optional*):
401
+ Path to the vocabulary file (SentencePiece model).
402
+ tokenizer_file (`str`, *optional*):
403
+ Path to a tokenizer JSON file.
404
+ unk_token (`str`, *optional*, defaults to `"<unk>"`):
405
+ The unknown token.
406
+ bos_token (`str`, *optional*, defaults to `"<s>"`):
407
+ The beginning of sequence token.
408
+ eos_token (`str`, *optional*, defaults to `"</s>"`):
409
+ The end of sequence token.
410
+ pad_token (`str`, *optional*):
411
+ The token used for padding.
412
+ add_bos_token (`bool`, *optional*, defaults to `True`):
413
+ Whether to add a BOS token at the start of sequences.
414
+ add_eos_token (`bool`, *optional*, defaults to `False`):
415
+ Whether to add an EOS token at the end of sequences.
416
+ add_prefix_space (`bool`, *optional*, defaults to `False`):
417
+ Whether to add an initial space to the input.
418
+ use_default_system_prompt (`bool`, *optional*, defaults to `False`):
419
+ Whether to use the default system prompt.
420
+ chat_template (`str`, *optional*):
421
+ A Jinja template for formatting conversations.
422
+
423
+ Example:
424
+ ```python
425
+ >>> from tokenization_iquestcoder import IQuestCoderTokenizerFast
426
+
427
+ >>> tokenizer = IQuestCoderTokenizerFast.from_pretrained("path/to/model")
428
+ >>> tokenizer.encode("Hello, world!")
429
+ [1, 15043, 29892, 3186, 29991]
430
+ ```
431
+ """
432
+
433
+ vocab_files_names = VOCAB_FILES_NAMES
434
+ pretrained_vocab_files_map = PRETRAINED_VOCAB_FILES_MAP
435
+ max_model_input_sizes = PRETRAINED_POSITIONAL_EMBEDDINGS_SIZES
436
+ model_input_names = ["input_ids", "attention_mask"]
437
+ slow_tokenizer_class = IQuestCoderTokenizer
438
+
439
+ def __init__(
440
+ self,
441
+ vocab_file=None,
442
+ tokenizer_file=None,
443
+ unk_token="<unk>",
444
+ bos_token="<s>",
445
+ eos_token="</s>",
446
+ pad_token=None,
447
+ add_bos_token=True,
448
+ add_eos_token=False,
449
+ add_prefix_space=False,
450
+ use_default_system_prompt=False,
451
+ chat_template=None,
452
+ **kwargs,
453
+ ):
454
+ self.add_bos_token = add_bos_token
455
+ self.add_eos_token = add_eos_token
456
+ self.add_prefix_space = add_prefix_space
457
+ self.use_default_system_prompt = use_default_system_prompt
458
+ self.vocab_file = vocab_file
459
+
460
+ if chat_template is None:
461
+ chat_template = DEFAULT_CHAT_TEMPLATE
462
+
463
+ super().__init__(
464
+ vocab_file=vocab_file,
465
+ tokenizer_file=tokenizer_file,
466
+ unk_token=unk_token,
467
+ bos_token=bos_token,
468
+ eos_token=eos_token,
469
+ pad_token=pad_token,
470
+ add_bos_token=add_bos_token,
471
+ add_eos_token=add_eos_token,
472
+ add_prefix_space=add_prefix_space,
473
+ use_default_system_prompt=use_default_system_prompt,
474
+ chat_template=chat_template,
475
+ **kwargs,
476
+ )
477
+
478
+ @property
479
+ def can_save_slow_tokenizer(self) -> bool:
480
+ vocab_file = getattr(self, "vocab_file", None)
481
+ return bool(vocab_file) and os.path.isfile(vocab_file)
482
+
483
+ def save_vocabulary(
484
+ self, save_directory: str, filename_prefix: Optional[str] = None
485
+ ) -> Tuple[str, ...]:
486
+ if not self.can_save_slow_tokenizer:
487
+ return ()
488
+ if not os.path.isdir(save_directory):
489
+ logger.error(f"Vocabulary path ({save_directory}) should be a directory")
490
+ return ()
491
+ out_vocab_file = os.path.join(
492
+ save_directory,
493
+ (filename_prefix + "-" if filename_prefix else "") + VOCAB_FILES_NAMES["vocab_file"],
494
+ )
495
+ if os.path.abspath(self.vocab_file) != os.path.abspath(out_vocab_file):
496
+ copyfile(self.vocab_file, out_vocab_file)
497
+ return (out_vocab_file,)
498
+
499
+ @property
500
+ def default_chat_template(self) -> str:
501
+ """Returns the default chat template."""
502
+ return DEFAULT_CHAT_TEMPLATE
503
+
504
+ def build_inputs_with_special_tokens(
505
+ self,
506
+ token_ids_0: List[int],
507
+ token_ids_1: Optional[List[int]] = None
508
+ ) -> List[int]:
509
+ """Build model inputs by wrapping each sequence in BOS/EOS tokens, as enabled by `add_bos_token` and `add_eos_token`."""
510
+ bos_token_id = [self.bos_token_id] if self.add_bos_token else []
511
+ eos_token_id = [self.eos_token_id] if self.add_eos_token else []
512
+
513
+ output = bos_token_id + token_ids_0 + eos_token_id
514
+
515
+ if token_ids_1 is not None:
516
+ output = output + bos_token_id + token_ids_1 + eos_token_id
517
+
518
+ return output
519
+
520
+ def get_special_tokens_mask(
521
+ self,
522
+ token_ids_0: List[int],
523
+ token_ids_1: Optional[List[int]] = None,
524
+ already_has_special_tokens: bool = False
525
+ ) -> List[int]:
526
+ """Return a mask with 1 at BOS/EOS positions and 0 at sequence token positions."""
527
+ if already_has_special_tokens:
528
+ return super().get_special_tokens_mask(
529
+ token_ids_0=token_ids_0, token_ids_1=token_ids_1, already_has_special_tokens=True
530
+ )
531
+
532
+ bos_token_id = [1] if self.add_bos_token else []
533
+ eos_token_id = [1] if self.add_eos_token else []
534
+
535
+ if token_ids_1 is None:
536
+ return bos_token_id + ([0] * len(token_ids_0)) + eos_token_id
537
+ return (
538
+ bos_token_id
539
+ + ([0] * len(token_ids_0))
540
+ + eos_token_id
541
+ + bos_token_id
542
+ + ([0] * len(token_ids_1))
543
+ + eos_token_id
544
+ )
545
+
546
+ def create_token_type_ids_from_sequences(
547
+ self,
548
+ token_ids_0: List[int],
549
+ token_ids_1: Optional[List[int]] = None
550
+ ) -> List[int]:
551
+ """Return token type IDs: 0 for the first sequence and its special tokens, 1 for the second."""
552
+ bos_token_id = [self.bos_token_id] if self.add_bos_token else []
553
+ eos_token_id = [self.eos_token_id] if self.add_eos_token else []
554
+
555
+ output = [0] * len(bos_token_id + token_ids_0 + eos_token_id)
556
+
557
+ if token_ids_1 is not None:
558
+ output += [1] * len(bos_token_id + token_ids_1 + eos_token_id)
559
+
560
+ return output
561
+
562
+ except ImportError:
563
+ # tokenizers library not available, Fast tokenizer not supported
564
+ IQuestCoderTokenizerFast = None
565
+ logger.info(
566
+ "The `tokenizers` library is not installed. "
567
+ "IQuestCoderTokenizerFast will not be available. "
568
+ "Install it with `pip install tokenizers`."
569
+ )
570
+
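The three pair-handling methods in the fast tokenizer above share one layout rule, which is easier to see on a toy input. The following is a minimal standalone sketch, not the tokenizer itself: `BOS_ID` and `EOS_ID` are illustrative placeholder IDs, and the function names are hypothetical stand-ins for `build_inputs_with_special_tokens`, `get_special_tokens_mask`, and `create_token_type_ids_from_sequences`.

```python
from typing import List, Optional

# Placeholder IDs for illustration only; the real values come from the tokenizer.
BOS_ID = 1
EOS_ID = 2


def build_inputs(ids_0: List[int], ids_1: Optional[List[int]] = None,
                 add_bos: bool = True, add_eos: bool = False) -> List[int]:
    """Wrap each sequence in optional BOS/EOS tokens and concatenate."""
    bos = [BOS_ID] if add_bos else []
    eos = [EOS_ID] if add_eos else []
    out = bos + ids_0 + eos
    if ids_1 is not None:
        out = out + bos + ids_1 + eos
    return out


def special_tokens_mask(ids_0: List[int], ids_1: Optional[List[int]] = None,
                        add_bos: bool = True, add_eos: bool = False) -> List[int]:
    """1 marks a BOS/EOS position, 0 marks a sequence token."""
    bos = [1] if add_bos else []
    eos = [1] if add_eos else []
    mask = bos + [0] * len(ids_0) + eos
    if ids_1 is not None:
        mask = mask + bos + [0] * len(ids_1) + eos
    return mask


def token_type_ids(ids_0: List[int], ids_1: Optional[List[int]] = None,
                   add_bos: bool = True, add_eos: bool = False) -> List[int]:
    """0 covers the first sequence and its special tokens, 1 covers the second."""
    extra = int(add_bos) + int(add_eos)
    types = [0] * (len(ids_0) + extra)
    if ids_1 is not None:
        types += [1] * (len(ids_1) + extra)
    return types


pair_inputs = build_inputs([10, 11], [20])
pair_mask = special_tokens_mask([10, 11], [20])
pair_types = token_type_ids([10, 11], [20])
```

All three lists have the same length for the same arguments, which is the invariant the methods above rely on. Note that the shipped `tokenizer_config.json` sets both `add_bos_token` and `add_eos_token` to `false`, in which case no special tokens are inserted at all.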
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer.model ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:7d3be68e090a927f31e0e378d7599b15c206dd47e4a73933775a746cc9c1cd91
3
+ size 1345108
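The three lines above are a Git LFS pointer, not the SentencePiece model itself: they record the spec version, the SHA-256 of the real file, and its size in bytes. A minimal sketch of checking a downloaded file against such a pointer follows; the helper names `parse_lfs_pointer` and `matches_pointer` are hypothetical, and only the Python standard library is used.

```python
import hashlib


def parse_lfs_pointer(text: str) -> dict:
    """Parse the 'key value' lines of a Git LFS pointer file."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(data: bytes, pointer: dict) -> bool:
    """Return True when the bytes have the size and SHA-256 the pointer records."""
    if pointer["algo"] != "sha256":
        raise ValueError(f"unsupported hash algorithm: {pointer['algo']}")
    return (
        len(data) == pointer["size"]
        and hashlib.sha256(data).hexdigest() == pointer["digest"]
    )


POINTER_TEXT = """\
version https://git-lfs.github.com/spec/v1
oid sha256:7d3be68e090a927f31e0e378d7599b15c206dd47e4a73933775a746cc9c1cd91
size 1345108
"""

pointer = parse_lfs_pointer(POINTER_TEXT)
```

With the real file on disk, `matches_pointer(open("tokenizer.model", "rb").read(), pointer)` would confirm the download is complete and uncorrupted.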
tokenizer_config.json ADDED
@@ -0,0 +1,282 @@
1
+ {
2
+ "add_bos_token": false,
3
+ "add_eos_token": false,
4
+ "add_prefix_space": false,
5
+ "added_tokens_decoder": {
6
+ "0": {
7
+ "content": "<unk>",
8
+ "lstrip": false,
9
+ "normalized": true,
10
+ "rstrip": false,
11
+ "single_word": true,
12
+ "special": true
13
+ },
14
+ "1": {
15
+ "content": "<s>",
16
+ "lstrip": false,
17
+ "normalized": true,
18
+ "rstrip": false,
19
+ "single_word": false,
20
+ "special": true
21
+ },
22
+ "2": {
23
+ "content": "</s>",
24
+ "lstrip": false,
25
+ "normalized": true,
26
+ "rstrip": false,
27
+ "single_word": true,
28
+ "special": true
29
+ },
30
+ "75858": {
31
+ "content": "<CLS>",
32
+ "lstrip": false,
33
+ "normalized": false,
34
+ "rstrip": false,
35
+ "single_word": false,
36
+ "special": true
37
+ },
38
+ "75859": {
39
+ "content": "<SEP>",
40
+ "lstrip": false,
41
+ "normalized": false,
42
+ "rstrip": false,
43
+ "single_word": false,
44
+ "special": true
45
+ },
46
+ "75860": {
47
+ "content": "<EOD>",
48
+ "lstrip": false,
49
+ "normalized": false,
50
+ "rstrip": false,
51
+ "single_word": false,
52
+ "special": true
53
+ },
54
+ "75861": {
55
+ "content": "<MASK>",
56
+ "lstrip": false,
57
+ "normalized": false,
58
+ "rstrip": false,
59
+ "single_word": false,
60
+ "special": true
61
+ },
62
+ "75862": {
63
+ "content": "<PAD>",
64
+ "lstrip": false,
65
+ "normalized": false,
66
+ "rstrip": false,
67
+ "single_word": false,
68
+ "special": true
69
+ },
70
+ "75863": {
71
+ "content": "<|im_start|>",
72
+ "lstrip": false,
73
+ "normalized": false,
74
+ "rstrip": false,
75
+ "single_word": false,
76
+ "special": true
77
+ },
78
+ "75864": {
79
+ "content": "<|im_end|>",
80
+ "lstrip": false,
81
+ "normalized": false,
82
+ "rstrip": false,
83
+ "single_word": false,
84
+ "special": true
85
+ },
86
+ "75865": {
87
+ "content": "<|fim_prefix|>",
88
+ "lstrip": false,
89
+ "normalized": false,
90
+ "rstrip": false,
91
+ "single_word": false,
92
+ "special": true
93
+ },
94
+ "75866": {
95
+ "content": "<|fim_middle|>",
96
+ "lstrip": false,
97
+ "normalized": false,
98
+ "rstrip": false,
99
+ "single_word": false,
100
+ "special": true
101
+ },
102
+ "75867": {
103
+ "content": "<|fim_suffix|>",
104
+ "lstrip": false,
105
+ "normalized": false,
106
+ "rstrip": false,
107
+ "single_word": false,
108
+ "special": true
109
+ },
110
+ "75868": {
111
+ "content": "<|fim_pad|>",
112
+ "lstrip": false,
113
+ "normalized": false,
114
+ "rstrip": false,
115
+ "single_word": false,
116
+ "special": true
117
+ },
118
+ "75869": {
119
+ "content": "<|endoftext|>",
120
+ "lstrip": false,
121
+ "normalized": false,
122
+ "rstrip": false,
123
+ "single_word": false,
124
+ "special": true
125
+ },
126
+ "75870": {
127
+ "content": "<|repo_name|>",
128
+ "lstrip": false,
129
+ "normalized": false,
130
+ "rstrip": false,
131
+ "single_word": false,
132
+ "special": true
133
+ },
134
+ "75871": {
135
+ "content": "<|file_sep|>",
136
+ "lstrip": false,
137
+ "normalized": false,
138
+ "rstrip": false,
139
+ "single_word": false,
140
+ "special": true
141
+ },
142
+ "75872": {
143
+ "content": "<think>",
144
+ "lstrip": false,
145
+ "normalized": false,
146
+ "rstrip": false,
147
+ "single_word": false,
148
+ "special": false
149
+ },
150
+ "75873": {
151
+ "content": "</think>",
152
+ "lstrip": false,
153
+ "normalized": false,
154
+ "rstrip": false,
155
+ "single_word": false,
156
+ "special": false
157
+ },
158
+ "75874": {
159
+ "content": "<tools>",
160
+ "lstrip": false,
161
+ "normalized": false,
162
+ "rstrip": false,
163
+ "single_word": false,
164
+ "special": false
165
+ },
166
+ "75875": {
167
+ "content": "</tools>",
168
+ "lstrip": false,
169
+ "normalized": false,
170
+ "rstrip": false,
171
+ "single_word": false,
172
+ "special": false
173
+ },
174
+ "75876": {
175
+ "content": "<tool_call>",
176
+ "lstrip": false,
177
+ "normalized": false,
178
+ "rstrip": false,
179
+ "single_word": false,
180
+ "special": false
181
+ },
182
+ "75877": {
183
+ "content": "</tool_call>",
184
+ "lstrip": false,
185
+ "normalized": false,
186
+ "rstrip": false,
187
+ "single_word": false,
188
+ "special": false
189
+ },
190
+ "75878": {
191
+ "content": "<tool_response>",
192
+ "lstrip": false,
193
+ "normalized": false,
194
+ "rstrip": false,
195
+ "single_word": false,
196
+ "special": false
197
+ },
198
+ "75879": {
199
+ "content": "</tool_response>",
200
+ "lstrip": false,
201
+ "normalized": false,
202
+ "rstrip": false,
203
+ "single_word": false,
204
+ "special": false
205
+ },
206
+ "75880": {
207
+ "content": "<|CLS|>",
208
+ "lstrip": false,
209
+ "normalized": false,
210
+ "rstrip": false,
211
+ "single_word": false,
212
+ "special": true
213
+ },
214
+ "75881": {
215
+ "content": "<|SEP|>",
216
+ "lstrip": false,
217
+ "normalized": false,
218
+ "rstrip": false,
219
+ "single_word": false,
220
+ "special": true
221
+ },
222
+ "75882": {
223
+ "content": "<|EOD|>",
224
+ "lstrip": false,
225
+ "normalized": false,
226
+ "rstrip": false,
227
+ "single_word": false,
228
+ "special": true
229
+ },
230
+ "75883": {
231
+ "content": "<|MASK|>",
232
+ "lstrip": false,
233
+ "normalized": false,
234
+ "rstrip": false,
235
+ "single_word": false,
236
+ "special": true
237
+ },
238
+ "75884": {
239
+ "content": "<|PAD|>",
240
+ "lstrip": false,
241
+ "normalized": false,
242
+ "rstrip": false,
243
+ "single_word": false,
244
+ "special": true
245
+ }
246
+ },
247
+ "additional_special_tokens": [
248
+ "<|CLS|>",
249
+ "<|SEP|>",
250
+ "<|EOD|>",
251
+ "<|MASK|>",
252
+ "<|PAD|>",
253
+ "<|fim_prefix|>",
254
+ "<|fim_middle|>",
255
+ "<|fim_suffix|>",
256
+ "<|im_start|>",
257
+ "<|im_end|>",
258
+ "<|fim_pad|>",
259
+ "<|endoftext|>",
260
+ "<|repo_name|>",
261
+ "<|file_sep|>"
262
+ ],
263
+ "auto_map": {
264
+ "AutoTokenizer": [
265
+ "tokenization_iquestcoder.IQuestCoderTokenizer",
266
+ "tokenization_iquestcoder.IQuestCoderTokenizerFast"
267
+ ]
268
+ },
269
+ "bos_token": "<s>",
270
+ "clean_up_tokenization_spaces": false,
271
+ "eos_token": "<|im_end|>",
272
+ "extra_special_tokens": {},
273
+ "model_max_length": 131072,
274
+ "pad_token": "<|endoftext|>",
275
+ "padding_side": "right",
276
+ "sp_model_kwargs": {},
277
+ "split_special_tokens": false,
278
+ "tokenizer_class": "IQuestCoderTokenizer",
279
+ "unk_token": "<unk>",
280
+ "use_default_system_prompt": false,
281
+ "use_fast": true
282
+ }
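One detail of this config that is easy to miss: the reasoning and tool tags (`<think>`, `<tool_call>`, and so on) are registered with `"special": false`, while the chat and fill-in-the-middle markers are `"special": true`. The sketch below, which assumes only the standard `json` module and uses a hand-copied excerpt of the config rather than the full file, shows one way to list both groups and to resolve the configured `eos_token` and `pad_token` to their IDs. The helper names are hypothetical.

```python
import json

# Hand-copied excerpt of tokenizer_config.json; the full file has more entries.
CONFIG_EXCERPT = """
{
  "added_tokens_decoder": {
    "75863": {"content": "<|im_start|>", "special": true},
    "75864": {"content": "<|im_end|>", "special": true},
    "75869": {"content": "<|endoftext|>", "special": true},
    "75872": {"content": "<think>", "special": false},
    "75876": {"content": "<tool_call>", "special": false}
  },
  "eos_token": "<|im_end|>",
  "pad_token": "<|endoftext|>"
}
"""


def token_ids_by_content(config: dict) -> dict:
    """Map each added token's text to its integer ID."""
    return {
        entry["content"]: int(token_id)
        for token_id, entry in config["added_tokens_decoder"].items()
    }


def non_special_tokens(config: dict) -> list:
    """Added tokens that are not flagged as special, sorted by text."""
    return sorted(
        entry["content"]
        for entry in config["added_tokens_decoder"].values()
        if not entry["special"]
    )


config = json.loads(CONFIG_EXCERPT)
ids = token_ids_by_content(config)
eos_id = ids[config["eos_token"]]
pad_id = ids[config["pad_token"]]
plain_added = non_special_tokens(config)
```

In practice this distinction likely matters when decoding: tokens not flagged as special are generally kept in the output even when special tokens are skipped, so `<think>` and tool-call tags would remain visible in decoded text.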