black-yt committed on
Commit d4ad2d7 · 1 Parent(s): 1411198

Add LabHorizon Qwen LoRA adapter

.gitattributes CHANGED
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
33
  *.zip filter=lfs diff=lfs merge=lfs -text
34
  *.zst filter=lfs diff=lfs merge=lfs -text
35
  *tfevents* filter=lfs diff=lfs merge=lfs -text
36
+ tokenizer.json filter=lfs diff=lfs merge=lfs -text
README.md CHANGED
@@ -1,3 +1,97 @@
1
  ---
2
  license: mit
3
+ base_model: Qwen/Qwen3.6-35B-A3B
4
+ library_name: peft
5
+ pipeline_tag: image-text-to-text
6
+ tags:
7
+ - laboratory
8
+ - protocol-conditioned-action-prediction
9
+ - lora
10
+ - qwen
11
+ - long-horizon-planning
12
  ---
13
+
14
+ <div align="center">
15
+ <div style="font-size: 2em; font-weight: bold;">
16
+ LabHorizon Model
17
+ </div>
18
+ </div>
19
+
20
+ <div align="center">
21
+
22
+ [![Website](https://img.shields.io/badge/%F0%9F%9A%80%20Website-LabHorizon-00c2a8)](https://conglab-research.github.io/LabHorizon/)&nbsp;
23
+ ![arXiv](https://img.shields.io/badge/arXiv-coming%20soon-b31b1b?logo=arxiv&logoColor=white)&nbsp;
24
+ [![Code](https://img.shields.io/badge/Code-LabHorizon-000000?logo=github&logoColor=white)](https://github.com/CongLab-Research/LabHorizon)&nbsp;
25
+ [![Data L1](https://img.shields.io/badge/%F0%9F%A4%97%20Data-L1-blue)](https://huggingface.co/datasets/CongLab-Research/LabHorizon-3D-Asset-Perception)&nbsp;
26
+ [![Data L2](https://img.shields.io/badge/%F0%9F%A4%97%20Data-L2-purple)](https://huggingface.co/datasets/CongLab-Research/LabHorizon-Protocol-Conditioned-Planning)&nbsp;
27
+ [![Model](https://img.shields.io/badge/%F0%9F%A4%97%20Model-LoRA-orange)](https://huggingface.co/CongLab-Research/LabHorizon-Model)
28
+
29
+ **Qwen3.6-35B-A3B LoRA for protocol-conditioned laboratory action prediction**
30
+
31
+ </div>
32
+
33
+ ---
34
+
35
+ ## 🔎 Overview
36
+
37
+ This repository releases the LabHorizon LoRA adapter trained from `Qwen/Qwen3.6-35B-A3B` on the 6,000-sample LabHorizon training split. The model is optimized for **Protocol-Conditioned Action Prediction**:
38
+
39
+ - **Level 1:** connect multi-view laboratory assets and historical actions to the gold next action.
40
+ - **Level 2:** produce a structured long-horizon experimental action sequence from context, constraints, available inputs, and an action pool.
41
+
42
+ The released weights are an adapter, not the base model. Load them with the corresponding Qwen3.6-35B-A3B base model.
43
+
44
+ ## 📦 Files
45
+
+ | File | Meaning |
+ |:---|:---|
+ | `adapter_model.safetensors` | LoRA adapter weights. |
+ | `adapter_config.json` | PEFT adapter configuration. |
+ | `tokenizer.json`, `tokenizer_config.json`, `chat_template.jinja` | Tokenizer and chat template files used for training/evaluation. |
+ | `processor_config.json` | Processor configuration. |
+ | `train_results.json`, `eval_results.json`, `all_results.json` | Training and evaluation summaries from the LoRA run. |
+ | `trainer_state.json`, `trainer_log.jsonl`, `training_args.bin` | Training state and arguments for reproducibility. |
+ | `training_loss.png`, `training_eval_loss.png` | Loss curves. |
55
+
56
+ ## 🧠 Training Result
57
+
58
+ The table compares direct-prompting SOTA/baseline systems, the base Qwen model, this trained LoRA adapter, and the trained+agents system evaluated on the same LabHorizon test splits.
59
+
+ | System | Level 1 Next Action Accuracy | Level 2 Action Sequence Similarity | Level 2 Parameter Accuracy | Level 2 Final Score |
+ |:---|---:|---:|---:|---:|
+ | Grok 4.3 | 0.555 | 0.3339 | 0.3148 | 0.3244 |
+ | Gemini 3.1 Pro Preview | 0.465 | 0.3195 | 0.3331 | 0.3263 |
+ | GPT-5.5 | 0.535 | 0.2092 | 0.2459 | 0.2276 |
+ | Kimi K2.6 | 0.550 | 0.2845 | 0.3456 | 0.3150 |
+ | Qwen3.6-35B-A3B | 0.475 | 0.2585 | 0.2483 | 0.2534 |
+ | Qwen3.6-35B-A3B(trained) | 0.635 | 0.4030 | 0.4170 | 0.4100 |
+ | Qwen3.6-35B-A3B(trained+agents*) | **0.665** | **0.4485** | **0.4580** | **0.4532** |
69
+
70
+ `*` uses `Qwen3.6-35B-A3B(trained)` as Actor and Gemini 3.1 Pro Preview as Simulator/Selector. The Simulator/Selector choice is the current setting and has not been exhaustively ablated.
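The README does not say how the Level 2 Final Score is computed. In the table above, every Final Score agrees with the unweighted mean of Action Sequence Similarity and Parameter Accuracy to within rounding. The sketch below checks that reading and also computes the relative gain of the trained adapter over the base model. The averaging rule is an inference from the published numbers, not a documented definition, and the helper names are hypothetical.

```python
# Level 2 columns copied from the table: (similarity, parameter accuracy, final score).
ROWS = {
    "Grok 4.3": (0.3339, 0.3148, 0.3244),
    "Gemini 3.1 Pro Preview": (0.3195, 0.3331, 0.3263),
    "GPT-5.5": (0.2092, 0.2459, 0.2276),
    "Kimi K2.6": (0.2845, 0.3456, 0.3150),
    "Qwen3.6-35B-A3B": (0.2585, 0.2483, 0.2534),
    "Qwen3.6-35B-A3B(trained)": (0.4030, 0.4170, 0.4100),
    "Qwen3.6-35B-A3B(trained+agents*)": (0.4485, 0.4580, 0.4532),
}


def mean_gap(similarity, parameter_accuracy, final_score):
    """Absolute gap between the reported final score and the plain mean (assumed rule)."""
    return abs((similarity + parameter_accuracy) / 2 - final_score)


def max_mean_gap(rows=ROWS):
    """Largest gap across all systems; a value below 1e-4 is explained by rounding."""
    return max(mean_gap(*values) for values in rows.values())


def relative_gain(rows=ROWS, base="Qwen3.6-35B-A3B", tuned="Qwen3.6-35B-A3B(trained)"):
    """Relative improvement of the tuned final score over the base final score."""
    return rows[tuned][2] / rows[base][2] - 1.0
```

Under this reading, the trained adapter improves the base model's Level 2 Final Score by roughly 62% in relative terms.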
71
+
72
+ ## ⚙️ Loading
73
+
+ ```python
+ from transformers import AutoModelForCausalLM, AutoProcessor
+ from peft import PeftModel
+
+ base_id = "Qwen/Qwen3.6-35B-A3B"
+ adapter_id = "CongLab-Research/LabHorizon-Model"
+
+ processor = AutoProcessor.from_pretrained(adapter_id, trust_remote_code=True)
+ base = AutoModelForCausalLM.from_pretrained(
+     base_id,
+     device_map="auto",
+     torch_dtype="auto",
+     trust_remote_code=True,
+ )
+ model = PeftModel.from_pretrained(base, adapter_id)
+ ```
90
+
91
+ ## ⚠️ Intended Use
92
+
93
+ This adapter is intended for academic research on laboratory action prediction, experimental planning, and AI scientist systems. It should not be used as an autonomous wet-lab controller or for safety-critical experimental decisions without expert review.
94
+
95
+ ## 📜 Citation
96
+
97
+ Coming soon...
adapter_config.json ADDED
@@ -0,0 +1,56 @@
1
+ {
2
+ "alora_invocation_tokens": null,
3
+ "alpha_pattern": {},
4
+ "arrow_config": null,
5
+ "auto_mapping": null,
6
+ "base_model_name_or_path": "Qwen/Qwen3.6-35B-A3B",
7
+ "bias": "none",
8
+ "corda_config": null,
9
+ "ensure_weight_tying": false,
10
+ "eva_config": null,
11
+ "exclude_modules": null,
12
+ "fan_in_fan_out": false,
13
+ "inference_mode": true,
14
+ "init_lora_weights": true,
15
+ "layer_replication": null,
16
+ "layers_pattern": null,
17
+ "layers_to_transform": null,
18
+ "loftq_config": {},
19
+ "lora_alpha": 64,
20
+ "lora_bias": false,
21
+ "lora_dropout": 0.1,
22
+ "megatron_config": null,
23
+ "megatron_core": "megatron.core",
24
+ "modules_to_save": null,
25
+ "peft_type": "LORA",
26
+ "peft_version": "0.18.1",
27
+ "qalora_group_size": 16,
28
+ "r": 32,
29
+ "rank_pattern": {},
30
+ "revision": null,
31
+ "target_modules": [
32
+ "down_proj",
33
+ "shared_expert_gate",
34
+ "v_proj",
35
+ "k_proj",
36
+ "in_proj_b",
37
+ "linear_fc1",
38
+ "up_proj",
39
+ "in_proj_qkv",
40
+ "attn.proj",
41
+ "in_proj_z",
42
+ "gate_proj",
43
+ "out_proj",
44
+ "q_proj",
45
+ "qkv",
46
+ "linear_fc2",
47
+ "o_proj",
48
+ "in_proj_a"
49
+ ],
50
+ "target_parameters": null,
51
+ "task_type": "CAUSAL_LM",
52
+ "trainable_token_indices": null,
53
+ "use_dora": false,
54
+ "use_qalora": false,
55
+ "use_rslora": false
56
+ }
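As a reading aid for the config above: with `r` = 32, `lora_alpha` = 64 and `use_rslora` = false, the low-rank update is scaled by `lora_alpha / r`. The sketch below computes that factor from a subset of the fields. The helper name `lora_scaling` is hypothetical, and the rank-stabilized branch is shown only for contrast; this adapter does not use it.

```python
import json
import math

# Subset of adapter_config.json from this commit.
CONFIG = json.loads(
    '{"lora_alpha": 64, "r": 32, "use_rslora": false, "lora_dropout": 0.1}'
)


def lora_scaling(config):
    """Scale applied to the low-rank update B @ A before it is added to the frozen weight."""
    if config.get("use_rslora", False):
        # Rank-stabilized LoRA divides by sqrt(r) instead of r.
        return config["lora_alpha"] / math.sqrt(config["r"])
    return config["lora_alpha"] / config["r"]


scaling = lora_scaling(CONFIG)  # 64 / 32
```

A scaling of 2.0 means the learned update is doubled before it is applied. Merging the adapter into the base weights would bake in the same factor.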
adapter_model.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:1a706ab043f27c19ad89d5ae4acf81c304b603b3ac9926105f34d12dc3173cbf
3
+ size 243590944
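The three lines above are a Git LFS pointer, not the weights themselves: the real file is fetched by its SHA-256 object ID. A minimal sketch of reading such a pointer follows; `parse_lfs_pointer` is a hypothetical helper, and the pointer text is copied from this diff.

```python
POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:1a706ab043f27c19ad89d5ae4acf81c304b603b3ac9926105f34d12dc3173cbf
size 243590944
"""


def parse_lfs_pointer(text):
    """Split a Git LFS pointer file into its version, hash algorithm, digest, and size."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "hash_algo": algo,
        "digest": digest,
        "size_bytes": int(fields["size"]),
    }


pointer = parse_lfs_pointer(POINTER)
size_mib = pointer["size_bytes"] / 2**20  # adapter size in MiB
```

The declared size is about 232 MiB. That fits an adapter-only checkpoint, not a full 35B-parameter model.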
all_results.json ADDED
@@ -0,0 +1,12 @@
1
+ {
2
+ "epoch": 10.0,
3
+ "eval_loss": 0.44259119033813477,
4
+ "eval_runtime": 27.1503,
5
+ "eval_samples_per_second": 14.733,
6
+ "eval_steps_per_second": 2.468,
7
+ "total_flos": 3.634151342457697e+19,
8
+ "train_loss": 0.2690703985452652,
9
+ "train_runtime": 10014.7733,
10
+ "train_samples_per_second": 5.991,
11
+ "train_steps_per_second": 0.25
12
+ }
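The summary above can be cross-checked against the README's 6,000-sample claim. The sketch below multiplies the reported throughputs by the runtimes to recover approximate sample and step counts, and converts the evaluation loss to a perplexity. The assumptions are that `eval_loss` is a mean per-token cross-entropy in nats, as is usual for the Hugging Face Trainer, and that the throughput figures are rounded, so the results are approximate.

```python
import math

# Values copied from all_results.json.
RESULTS = {
    "epoch": 10.0,
    "eval_loss": 0.44259119033813477,
    "eval_runtime": 27.1503,
    "eval_samples_per_second": 14.733,
    "train_runtime": 10014.7733,
    "train_samples_per_second": 5.991,
    "train_steps_per_second": 0.25,
}


def derived(results):
    """Back out approximate counts from the rounded throughput figures."""
    train_samples = results["train_samples_per_second"] * results["train_runtime"]
    train_steps = results["train_steps_per_second"] * results["train_runtime"]
    return {
        "samples_per_epoch": train_samples / results["epoch"],
        "approx_optimizer_steps": train_steps,
        "approx_samples_per_step": train_samples / train_steps,
        "eval_samples": results["eval_samples_per_second"] * results["eval_runtime"],
        "eval_perplexity": math.exp(results["eval_loss"]),
    }


stats = derived(RESULTS)
```

The recovered figures are about 6,000 training samples per epoch, about 24 samples per optimizer step, and about 400 evaluation samples. These agree with the 2,500 total steps reported in `trainer_log.jsonl`.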
chat_template.jinja ADDED
@@ -0,0 +1,154 @@
1
+ {%- set image_count = namespace(value=0) %}
2
+ {%- set video_count = namespace(value=0) %}
3
+ {%- macro render_content(content, do_vision_count, is_system_content=false) %}
4
+ {%- if content is string %}
5
+ {{- content }}
6
+ {%- elif content is iterable and content is not mapping %}
7
+ {%- for item in content %}
8
+ {%- if 'image' in item or 'image_url' in item or item.type == 'image' %}
9
+ {%- if is_system_content %}
10
+ {{- raise_exception('System message cannot contain images.') }}
11
+ {%- endif %}
12
+ {%- if do_vision_count %}
13
+ {%- set image_count.value = image_count.value + 1 %}
14
+ {%- endif %}
15
+ {%- if add_vision_id %}
16
+ {{- 'Picture ' ~ image_count.value ~ ': ' }}
17
+ {%- endif %}
18
+ {{- '<|vision_start|><|image_pad|><|vision_end|>' }}
19
+ {%- elif 'video' in item or item.type == 'video' %}
20
+ {%- if is_system_content %}
21
+ {{- raise_exception('System message cannot contain videos.') }}
22
+ {%- endif %}
23
+ {%- if do_vision_count %}
24
+ {%- set video_count.value = video_count.value + 1 %}
25
+ {%- endif %}
26
+ {%- if add_vision_id %}
27
+ {{- 'Video ' ~ video_count.value ~ ': ' }}
28
+ {%- endif %}
29
+ {{- '<|vision_start|><|video_pad|><|vision_end|>' }}
30
+ {%- elif 'text' in item %}
31
+ {{- item.text }}
32
+ {%- else %}
33
+ {{- raise_exception('Unexpected item type in content.') }}
34
+ {%- endif %}
35
+ {%- endfor %}
36
+ {%- elif content is none or content is undefined %}
37
+ {{- '' }}
38
+ {%- else %}
39
+ {{- raise_exception('Unexpected content type.') }}
40
+ {%- endif %}
41
+ {%- endmacro %}
42
+ {%- if not messages %}
43
+ {{- raise_exception('No messages provided.') }}
44
+ {%- endif %}
45
+ {%- if tools and tools is iterable and tools is not mapping %}
46
+ {{- '<|im_start|>system\n' }}
47
+ {{- "# Tools\n\nYou have access to the following functions:\n\n<tools>" }}
48
+ {%- for tool in tools %}
49
+ {{- "\n" }}
50
+ {{- tool | tojson }}
51
+ {%- endfor %}
52
+ {{- "\n</tools>" }}
53
+ {{- '\n\nIf you choose to call a function ONLY reply in the following format with NO suffix:\n\n<tool_call>\n<function=example_function_name>\n<parameter=example_parameter_1>\nvalue_1\n</parameter>\n<parameter=example_parameter_2>\nThis is the value for the second parameter\nthat can span\nmultiple lines\n</parameter>\n</function>\n</tool_call>\n\n<IMPORTANT>\nReminder:\n- Function calls MUST follow the specified format: an inner <function=...></function> block must be nested within <tool_call></tool_call> XML tags\n- Required parameters MUST be specified\n- You may provide optional reasoning for your function call in natural language BEFORE the function call, but NOT after\n- If there is no function call available, answer the question like normal with your current knowledge and do not tell the user about function calls\n</IMPORTANT>' }}
54
+ {%- if messages[0].role == 'system' %}
55
+ {%- set content = render_content(messages[0].content, false, true)|trim %}
56
+ {%- if content %}
57
+ {{- '\n\n' + content }}
58
+ {%- endif %}
59
+ {%- endif %}
60
+ {{- '<|im_end|>\n' }}
61
+ {%- else %}
62
+ {%- if messages[0].role == 'system' %}
63
+ {%- set content = render_content(messages[0].content, false, true)|trim %}
64
+ {{- '<|im_start|>system\n' + content + '<|im_end|>\n' }}
65
+ {%- endif %}
66
+ {%- endif %}
67
+ {%- set ns = namespace(multi_step_tool=true, last_query_index=messages|length - 1) %}
68
+ {%- for message in messages[::-1] %}
69
+ {%- set index = (messages|length - 1) - loop.index0 %}
70
+ {%- if ns.multi_step_tool and message.role == "user" %}
71
+ {%- set content = render_content(message.content, false)|trim %}
72
+ {%- if not(content.startswith('<tool_response>') and content.endswith('</tool_response>')) %}
73
+ {%- set ns.multi_step_tool = false %}
74
+ {%- set ns.last_query_index = index %}
75
+ {%- endif %}
76
+ {%- endif %}
77
+ {%- endfor %}
78
+ {%- if ns.multi_step_tool %}
79
+ {{- raise_exception('No user query found in messages.') }}
80
+ {%- endif %}
81
+ {%- for message in messages %}
82
+ {%- set content = render_content(message.content, true)|trim %}
83
+ {%- if message.role == "system" %}
84
+ {%- if not loop.first %}
85
+ {{- raise_exception('System message must be at the beginning.') }}
86
+ {%- endif %}
87
+ {%- elif message.role == "user" %}
88
+ {{- '<|im_start|>' + message.role + '\n' + content + '<|im_end|>' + '\n' }}
89
+ {%- elif message.role == "assistant" %}
90
+ {%- set reasoning_content = '' %}
91
+ {%- if message.reasoning_content is string %}
92
+ {%- set reasoning_content = message.reasoning_content %}
93
+ {%- else %}
94
+ {%- if '</think>' in content %}
95
+ {%- set reasoning_content = content.split('</think>')[0].rstrip('\n').split('<think>')[-1].lstrip('\n') %}
96
+ {%- set content = content.split('</think>')[-1].lstrip('\n') %}
97
+ {%- endif %}
98
+ {%- endif %}
99
+ {%- set reasoning_content = reasoning_content|trim %}
100
+ {%- if (preserve_thinking is defined and preserve_thinking is true) or (loop.index0 > ns.last_query_index) %}
101
+ {{- '<|im_start|>' + message.role + '\n<think>\n' + reasoning_content + '\n</think>\n\n' + content }}
102
+ {%- else %}
103
+ {{- '<|im_start|>' + message.role + '\n' + content }}
104
+ {%- endif %}
105
+ {%- if message.tool_calls and message.tool_calls is iterable and message.tool_calls is not mapping %}
106
+ {%- for tool_call in message.tool_calls %}
107
+ {%- if tool_call.function is defined %}
108
+ {%- set tool_call = tool_call.function %}
109
+ {%- endif %}
110
+ {%- if loop.first %}
111
+ {%- if content|trim %}
112
+ {{- '\n\n<tool_call>\n<function=' + tool_call.name + '>\n' }}
113
+ {%- else %}
114
+ {{- '<tool_call>\n<function=' + tool_call.name + '>\n' }}
115
+ {%- endif %}
116
+ {%- else %}
117
+ {{- '\n<tool_call>\n<function=' + tool_call.name + '>\n' }}
118
+ {%- endif %}
119
+ {%- if tool_call.arguments is defined %}
120
+ {%- for args_name, args_value in tool_call.arguments|items %}
121
+ {{- '<parameter=' + args_name + '>\n' }}
122
+ {%- set args_value = args_value | string if args_value is string else args_value | tojson | safe %}
123
+ {{- args_value }}
124
+ {{- '\n</parameter>\n' }}
125
+ {%- endfor %}
126
+ {%- endif %}
127
+ {{- '</function>\n</tool_call>' }}
128
+ {%- endfor %}
129
+ {%- endif %}
130
+ {{- '<|im_end|>\n' }}
131
+ {%- elif message.role == "tool" %}
132
+ {%- if loop.previtem and loop.previtem.role != "tool" %}
133
+ {{- '<|im_start|>user' }}
134
+ {%- endif %}
135
+ {{- '\n<tool_response>\n' }}
136
+ {{- content }}
137
+ {{- '\n</tool_response>' }}
138
+ {%- if not loop.last and loop.nextitem.role != "tool" %}
139
+ {{- '<|im_end|>\n' }}
140
+ {%- elif loop.last %}
141
+ {{- '<|im_end|>\n' }}
142
+ {%- endif %}
143
+ {%- else %}
144
+ {{- raise_exception('Unexpected message role.') }}
145
+ {%- endif %}
146
+ {%- endfor %}
147
+ {%- if add_generation_prompt %}
148
+ {{- '<|im_start|>assistant\n' }}
149
+ {%- if enable_thinking is defined and enable_thinking is false %}
150
+ {{- '<think>\n\n</think>\n\n' }}
151
+ {%- else %}
152
+ {{- '<think>\n' }}
153
+ {%- endif %}
154
+ {%- endif %}
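Two behaviors of the template above are easy to miss: how the generation prompt changes with `enable_thinking`, and how reasoning is split out of an assistant message that has no explicit `reasoning_content`. The plain-Python sketch below mirrors only those two branches. The function names are hypothetical, and this is a reading aid, not a replacement for rendering the template through the processor.

```python
def generation_prompt(enable_thinking=True):
    """Mirror of the template's final `add_generation_prompt` branch."""
    prompt = "<|im_start|>assistant\n"
    if enable_thinking is False:
        # Thinking disabled: the template pre-fills an empty, already closed think block.
        return prompt + "<think>\n\n</think>\n\n"
    # Default (flag undefined or true): leave the think block open for the model.
    return prompt + "<think>\n"


def split_reasoning(content):
    """Mirror of the fallback split used when `reasoning_content` is not a string."""
    if "</think>" not in content:
        return "", content
    reasoning = content.split("</think>")[0].rstrip("\n").split("<think>")[-1].lstrip("\n")
    answer = content.split("</think>")[-1].lstrip("\n")
    # The template trims the extracted reasoning before re-emitting it.
    return reasoning.strip(), answer


reasoning, answer = split_reasoning("<think>\nplan\n</think>\n\nanswer")
```

The template re-emits the extracted reasoning only for assistant turns after the last user query, or for all turns when `preserve_thinking` is true. Earlier turns are rendered without their think block.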
eval_results.json ADDED
@@ -0,0 +1,7 @@
1
+ {
2
+ "epoch": 10.0,
3
+ "eval_loss": 0.44259119033813477,
4
+ "eval_runtime": 27.1503,
5
+ "eval_samples_per_second": 14.733,
6
+ "eval_steps_per_second": 2.468
7
+ }
processor_config.json ADDED
@@ -0,0 +1,60 @@
1
+ {
2
+ "image_processor": {
3
+ "do_convert_rgb": true,
4
+ "do_normalize": true,
5
+ "do_rescale": true,
6
+ "do_resize": true,
7
+ "image_mean": [
8
+ 0.5,
9
+ 0.5,
10
+ 0.5
11
+ ],
12
+ "image_processor_type": "Qwen2VLImageProcessor",
13
+ "image_std": [
14
+ 0.5,
15
+ 0.5,
16
+ 0.5
17
+ ],
18
+ "merge_size": 2,
19
+ "patch_size": 16,
20
+ "resample": 3,
21
+ "rescale_factor": 0.00392156862745098,
22
+ "size": {
23
+ "longest_edge": 16777216,
24
+ "shortest_edge": 65536
25
+ },
26
+ "temporal_patch_size": 2
27
+ },
28
+ "processor_class": "Qwen3VLProcessor",
29
+ "video_processor": {
30
+ "do_convert_rgb": true,
31
+ "do_normalize": true,
32
+ "do_rescale": true,
33
+ "do_resize": true,
34
+ "do_sample_frames": true,
35
+ "fps": 2,
36
+ "image_mean": [
37
+ 0.5,
38
+ 0.5,
39
+ 0.5
40
+ ],
41
+ "image_std": [
42
+ 0.5,
43
+ 0.5,
44
+ 0.5
45
+ ],
46
+ "max_frames": 768,
47
+ "merge_size": 2,
48
+ "min_frames": 4,
49
+ "patch_size": 16,
50
+ "resample": 3,
51
+ "rescale_factor": 0.00392156862745098,
52
+ "return_metadata": false,
53
+ "size": {
54
+ "longest_edge": 25165824,
55
+ "shortest_edge": 4096
56
+ },
57
+ "temporal_patch_size": 2,
58
+ "video_processor_type": "Qwen3VLVideoProcessor"
59
+ }
60
+ }
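The image settings above are easier to read as a token budget. The sketch below rests on two assumptions that are not stated in this commit: that `size.shortest_edge` and `size.longest_edge` hold minimum and maximum total pixel counts (the convention of Qwen2-VL-style image processors), and that each visual token covers a `patch_size * merge_size` square of pixels. It also shows that the rescale and normalization constants map 8-bit pixel values into the range [-1, 1]. Helper names are hypothetical.

```python
# Subset of the "image_processor" block in processor_config.json.
IMAGE_PROCESSOR = {
    "patch_size": 16,
    "merge_size": 2,
    "rescale_factor": 0.00392156862745098,  # 1 / 255
    "image_mean": [0.5, 0.5, 0.5],
    "image_std": [0.5, 0.5, 0.5],
    "size": {"longest_edge": 16777216, "shortest_edge": 65536},
}


def token_budget(cfg):
    """(min, max) visual tokens per image, assuming the size fields are pixel counts."""
    pixels_per_token = (cfg["patch_size"] * cfg["merge_size"]) ** 2
    size = cfg["size"]
    return size["shortest_edge"] // pixels_per_token, size["longest_edge"] // pixels_per_token


def normalize(pixel_value, cfg, channel=0):
    """Apply rescale, then mean/std normalization, to one 8-bit pixel value."""
    scaled = pixel_value * cfg["rescale_factor"]
    return (scaled - cfg["image_mean"][channel]) / cfg["image_std"][channel]


min_tokens, max_tokens = token_budget(IMAGE_PROCESSOR)
```

If the pixel-count reading holds, the upper bound is large, so multi-view Level 1 inputs may need a tighter limit in practice to keep sequence lengths manageable.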
tokenizer.json ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:06b9509352d2af50381ab2247e083b80d32d5c0aba91c272ca9ff729b6a0e523
3
+ size 19989325
tokenizer_config.json ADDED
@@ -0,0 +1,34 @@
1
+ {
2
+ "add_prefix_space": false,
3
+ "audio_bos_token": "<|audio_start|>",
4
+ "audio_eos_token": "<|audio_end|>",
5
+ "audio_token": "<|audio_pad|>",
6
+ "backend": "tokenizers",
7
+ "bos_token": null,
8
+ "clean_up_tokenization_spaces": false,
9
+ "eos_token": "<|im_end|>",
10
+ "errors": "replace",
11
+ "image_token": "<|image_pad|>",
12
+ "is_local": false,
13
+ "local_files_only": false,
14
+ "model_max_length": 262144,
15
+ "model_specific_special_tokens": {
16
+ "audio_bos_token": "<|audio_start|>",
17
+ "audio_eos_token": "<|audio_end|>",
18
+ "audio_token": "<|audio_pad|>",
19
+ "image_token": "<|image_pad|>",
20
+ "video_token": "<|video_pad|>",
21
+ "vision_bos_token": "<|vision_start|>",
22
+ "vision_eos_token": "<|vision_end|>"
23
+ },
24
+ "pad_token": "<|endoftext|>",
25
+ "padding_side": "right",
26
+ "pretokenize_regex": "(?i:'s|'t|'re|'ve|'m|'ll|'d)|[^\\r\\n\\p{L}\\p{N}]?[\\p{L}\\p{M}]+|\\p{N}| ?[^\\s\\p{L}\\p{M}\\p{N}]+[\\r\\n]*|\\s*[\\r\\n]+|\\s+(?!\\S)|\\s+",
27
+ "processor_class": "Qwen3VLProcessor",
28
+ "split_special_tokens": false,
29
+ "tokenizer_class": "Qwen2Tokenizer",
30
+ "unk_token": null,
31
+ "video_token": "<|video_pad|>",
32
+ "vision_bos_token": "<|vision_start|>",
33
+ "vision_eos_token": "<|vision_end|>"
34
+ }
train_results.json ADDED
@@ -0,0 +1,8 @@
1
+ {
2
+ "epoch": 10.0,
3
+ "total_flos": 3.634151342457697e+19,
4
+ "train_loss": 0.2690703985452652,
5
+ "train_runtime": 10014.7733,
6
+ "train_samples_per_second": 5.991,
7
+ "train_steps_per_second": 0.25
8
+ }
trainer_log.jsonl ADDED
@@ -0,0 +1,276 @@
1
+ {"current_steps": 10, "total_steps": 2500, "loss": 1.1145790100097657, "lr": 3.6e-06, "epoch": 0.04, "percentage": 0.4, "elapsed_time": "0:01:53", "remaining_time": "7:49:28"}
2
+ {"current_steps": 20, "total_steps": 2500, "loss": 1.2167404174804688, "lr": 7.6e-06, "epoch": 0.08, "percentage": 0.8, "elapsed_time": "0:02:32", "remaining_time": "5:15:11"}
3
+ {"current_steps": 30, "total_steps": 2500, "loss": 1.0437713623046876, "lr": 1.16e-05, "epoch": 0.12, "percentage": 1.2, "elapsed_time": "0:03:15", "remaining_time": "4:27:50"}
4
+ {"current_steps": 40, "total_steps": 2500, "loss": 0.9282869338989258, "lr": 1.56e-05, "epoch": 0.16, "percentage": 1.6, "elapsed_time": "0:03:58", "remaining_time": "4:04:27"}
5
+ {"current_steps": 50, "total_steps": 2500, "loss": 0.8799624443054199, "lr": 1.9600000000000002e-05, "epoch": 0.2, "percentage": 2.0, "elapsed_time": "0:04:40", "remaining_time": "3:48:42"}
6
+ {"current_steps": 60, "total_steps": 2500, "loss": 0.7062759399414062, "lr": 2.36e-05, "epoch": 0.24, "percentage": 2.4, "elapsed_time": "0:05:21", "remaining_time": "3:37:36"}
7
+ {"current_steps": 70, "total_steps": 2500, "loss": 0.7228042602539062, "lr": 2.7600000000000003e-05, "epoch": 0.28, "percentage": 2.8, "elapsed_time": "0:06:01", "remaining_time": "3:29:20"}
8
+ {"current_steps": 80, "total_steps": 2500, "loss": 0.6257906913757324, "lr": 3.16e-05, "epoch": 0.32, "percentage": 3.2, "elapsed_time": "0:06:42", "remaining_time": "3:23:02"}
9
+ {"current_steps": 90, "total_steps": 2500, "loss": 0.5399329185485839, "lr": 3.56e-05, "epoch": 0.36, "percentage": 3.6, "elapsed_time": "0:07:23", "remaining_time": "3:17:44"}
10
+ {"current_steps": 100, "total_steps": 2500, "loss": 0.5184461116790772, "lr": 3.960000000000001e-05, "epoch": 0.4, "percentage": 4.0, "elapsed_time": "0:08:03", "remaining_time": "3:13:14"}
11
+ {"current_steps": 100, "total_steps": 2500, "eval_loss": 0.5476460456848145, "epoch": 0.4, "percentage": 4.0, "elapsed_time": "0:08:24", "remaining_time": "3:21:50"}
12
+ {"current_steps": 110, "total_steps": 2500, "loss": 0.5210700988769531, "lr": 4.36e-05, "epoch": 0.44, "percentage": 4.4, "elapsed_time": "0:09:05", "remaining_time": "3:17:37"}
13
+ {"current_steps": 120, "total_steps": 2500, "loss": 0.5155693531036377, "lr": 4.76e-05, "epoch": 0.48, "percentage": 4.8, "elapsed_time": "0:09:46", "remaining_time": "3:13:46"}
14
+ {"current_steps": 130, "total_steps": 2500, "loss": 0.45534143447875974, "lr": 5.16e-05, "epoch": 0.52, "percentage": 5.2, "elapsed_time": "0:10:26", "remaining_time": "3:10:12"}
15
+ {"current_steps": 140, "total_steps": 2500, "loss": 0.45524797439575193, "lr": 5.560000000000001e-05, "epoch": 0.56, "percentage": 5.6, "elapsed_time": "0:11:06", "remaining_time": "3:07:08"}
16
+ {"current_steps": 150, "total_steps": 2500, "loss": 0.47152209281921387, "lr": 5.96e-05, "epoch": 0.6, "percentage": 6.0, "elapsed_time": "0:11:45", "remaining_time": "3:04:10"}
17
+ {"current_steps": 160, "total_steps": 2500, "loss": 0.4532940864562988, "lr": 6.36e-05, "epoch": 0.64, "percentage": 6.4, "elapsed_time": "0:12:23", "remaining_time": "3:01:14"}
18
+ {"current_steps": 170, "total_steps": 2500, "loss": 0.48988704681396483, "lr": 6.76e-05, "epoch": 0.68, "percentage": 6.8, "elapsed_time": "0:12:59", "remaining_time": "2:58:08"}
19
+ {"current_steps": 180, "total_steps": 2500, "loss": 0.46865572929382326, "lr": 7.16e-05, "epoch": 0.72, "percentage": 7.2, "elapsed_time": "0:13:37", "remaining_time": "2:55:41"}
20
+ {"current_steps": 190, "total_steps": 2500, "loss": 0.45577139854431153, "lr": 7.560000000000001e-05, "epoch": 0.76, "percentage": 7.6, "elapsed_time": "0:14:15", "remaining_time": "2:53:22"}
21
+ {"current_steps": 200, "total_steps": 2500, "loss": 0.4559042453765869, "lr": 7.960000000000001e-05, "epoch": 0.8, "percentage": 8.0, "elapsed_time": "0:14:52", "remaining_time": "2:50:58"}
22
+ {"current_steps": 200, "total_steps": 2500, "eval_loss": 0.485470175743103, "epoch": 0.8, "percentage": 8.0, "elapsed_time": "0:15:09", "remaining_time": "2:54:18"}
23
+ {"current_steps": 210, "total_steps": 2500, "loss": 0.45926451683044434, "lr": 8.36e-05, "epoch": 0.84, "percentage": 8.4, "elapsed_time": "0:15:49", "remaining_time": "2:52:34"}
24
+ {"current_steps": 220, "total_steps": 2500, "loss": 0.4545453548431396, "lr": 8.76e-05, "epoch": 0.88, "percentage": 8.8, "elapsed_time": "0:16:33", "remaining_time": "2:51:32"}
25
+ {"current_steps": 230, "total_steps": 2500, "loss": 0.47637343406677246, "lr": 9.16e-05, "epoch": 0.92, "percentage": 9.2, "elapsed_time": "0:17:13", "remaining_time": "2:49:56"}
26
+ {"current_steps": 240, "total_steps": 2500, "loss": 0.43120541572570803, "lr": 9.56e-05, "epoch": 0.96, "percentage": 9.6, "elapsed_time": "0:17:50", "remaining_time": "2:48:02"}
27
+ {"current_steps": 250, "total_steps": 2500, "loss": 0.4153712272644043, "lr": 9.960000000000001e-05, "epoch": 1.0, "percentage": 10.0, "elapsed_time": "0:18:29", "remaining_time": "2:46:26"}
28
+ {"current_steps": 260, "total_steps": 2500, "loss": 0.44300012588500975, "lr": 9.999605221019081e-05, "epoch": 1.04, "percentage": 10.4, "elapsed_time": "0:19:08", "remaining_time": "2:44:53"}
29
+ {"current_steps": 270, "total_steps": 2500, "loss": 0.462084436416626, "lr": 9.998240632972073e-05, "epoch": 1.08, "percentage": 10.8, "elapsed_time": "0:19:48", "remaining_time": "2:43:34"}
30
+ {"current_steps": 280, "total_steps": 2500, "loss": 0.39808471202850343, "lr": 9.995901628010196e-05, "epoch": 1.12, "percentage": 11.2, "elapsed_time": "0:20:26", "remaining_time": "2:42:01"}
31
+ {"current_steps": 290, "total_steps": 2500, "loss": 0.423044490814209, "lr": 9.9925886621271e-05, "epoch": 1.16, "percentage": 11.6, "elapsed_time": "0:21:04", "remaining_time": "2:40:38"}
32
+ {"current_steps": 300, "total_steps": 2500, "loss": 0.41622562408447267, "lr": 9.98830238119205e-05, "epoch": 1.2, "percentage": 12.0, "elapsed_time": "0:21:43", "remaining_time": "2:39:22"}
33
+ {"current_steps": 300, "total_steps": 2500, "eval_loss": 0.4695434272289276, "epoch": 1.2, "percentage": 12.0, "elapsed_time": "0:22:03", "remaining_time": "2:41:43"}
34
+ {"current_steps": 310, "total_steps": 2500, "loss": 0.4166346549987793, "lr": 9.983043620824005e-05, "epoch": 1.24, "percentage": 12.4, "elapsed_time": "0:22:45", "remaining_time": "2:40:48"}
35
+ {"current_steps": 320, "total_steps": 2500, "loss": 0.43734130859375, "lr": 9.97681340622872e-05, "epoch": 1.28, "percentage": 12.8, "elapsed_time": "0:23:23", "remaining_time": "2:39:19"}
36
+ {"current_steps": 330, "total_steps": 2500, "loss": 0.3747305631637573, "lr": 9.969612951998874e-05, "epoch": 1.32, "percentage": 13.2, "elapsed_time": "0:23:57", "remaining_time": "2:37:32"}
37
+ {"current_steps": 340, "total_steps": 2500, "loss": 0.42578792572021484, "lr": 9.961443661877289e-05, "epoch": 1.3599999999999999, "percentage": 13.6, "elapsed_time": "0:24:31", "remaining_time": "2:35:46"}
38
+ {"current_steps": 350, "total_steps": 2500, "loss": 0.39537777900695803, "lr": 9.952307128483256e-05, "epoch": 1.4, "percentage": 14.0, "elapsed_time": "0:25:06", "remaining_time": "2:34:17"}
39
+ {"current_steps": 360, "total_steps": 2500, "loss": 0.4084367275238037, "lr": 9.942205133002068e-05, "epoch": 1.44, "percentage": 14.4, "elapsed_time": "0:25:44", "remaining_time": "2:33:03"}
40
+ {"current_steps": 370, "total_steps": 2500, "loss": 0.3781426906585693, "lr": 9.931139644837754e-05, "epoch": 1.48, "percentage": 14.8, "elapsed_time": "0:26:21", "remaining_time": "2:31:45"}
41
+ {"current_steps": 380, "total_steps": 2500, "loss": 0.3952002048492432, "lr": 9.919112821229163e-05, "epoch": 1.52, "percentage": 15.2, "elapsed_time": "0:27:00", "remaining_time": "2:30:37"}
42
+ {"current_steps": 390, "total_steps": 2500, "loss": 0.4087832927703857, "lr": 9.906127006829384e-05, "epoch": 1.56, "percentage": 15.6, "elapsed_time": "0:27:37", "remaining_time": "2:29:29"}
43
+ {"current_steps": 400, "total_steps": 2500, "loss": 0.3861570119857788, "lr": 9.892184733248666e-05, "epoch": 1.6, "percentage": 16.0, "elapsed_time": "0:28:16", "remaining_time": "2:28:27"}
44
+ {"current_steps": 400, "total_steps": 2500, "eval_loss": 0.45406103134155273, "epoch": 1.6, "percentage": 16.0, "elapsed_time": "0:28:36", "remaining_time": "2:30:10"}
45
+ {"current_steps": 410, "total_steps": 2500, "loss": 0.39033331871032717, "lr": 9.877288718560866e-05, "epoch": 1.6400000000000001, "percentage": 16.4, "elapsed_time": "0:29:17", "remaining_time": "2:29:19"}
46
+ {"current_steps": 420, "total_steps": 2500, "loss": 0.43663845062255857, "lr": 9.861441866773564e-05, "epoch": 1.6800000000000002, "percentage": 16.8, "elapsed_time": "0:29:55", "remaining_time": "2:28:12"}
47
+ {"current_steps": 430, "total_steps": 2500, "loss": 0.43364706039428713, "lr": 9.844647267261916e-05, "epoch": 1.72, "percentage": 17.2, "elapsed_time": "0:30:30", "remaining_time": "2:26:53"}
48
+ {"current_steps": 440, "total_steps": 2500, "loss": 0.409498929977417, "lr": 9.82690819416637e-05, "epoch": 1.76, "percentage": 17.6, "elapsed_time": "0:31:04", "remaining_time": "2:25:28"}
49
+ {"current_steps": 450, "total_steps": 2500, "loss": 0.4264820098876953, "lr": 9.808228105754376e-05, "epoch": 1.8, "percentage": 18.0, "elapsed_time": "0:31:38", "remaining_time": "2:24:06"}
50
+ {"current_steps": 460, "total_steps": 2500, "loss": 0.417040491104126, "lr": 9.788610643746184e-05, "epoch": 1.8399999999999999, "percentage": 18.4, "elapsed_time": "0:32:11", "remaining_time": "2:22:47"}
51
+ {"current_steps": 470, "total_steps": 2500, "loss": 0.3749807357788086, "lr": 9.76805963260488e-05, "epoch": 1.88, "percentage": 18.8, "elapsed_time": "0:32:48", "remaining_time": "2:21:43"}
52
+ {"current_steps": 480, "total_steps": 2500, "loss": 0.4022481918334961, "lr": 9.746579078790807e-05, "epoch": 1.92, "percentage": 19.2, "elapsed_time": "0:33:25", "remaining_time": "2:20:40"}
53
+ {"current_steps": 490, "total_steps": 2500, "loss": 0.38319835662841795, "lr": 9.724173169980491e-05, "epoch": 1.96, "percentage": 19.6, "elapsed_time": "0:34:02", "remaining_time": "2:19:40"}
54
+ {"current_steps": 500, "total_steps": 2500, "loss": 0.4122174263000488, "lr": 9.700846274250251e-05, "epoch": 2.0, "percentage": 20.0, "elapsed_time": "0:34:41", "remaining_time": "2:18:46"}
55
+ {"current_steps": 500, "total_steps": 2500, "eval_loss": 0.44415727257728577, "epoch": 2.0, "percentage": 20.0, "elapsed_time": "0:35:00", "remaining_time": "2:20:01"}
56
+ {"current_steps": 510, "total_steps": 2500, "loss": 0.3524669408798218, "lr": 9.676602939224629e-05, "epoch": 2.04, "percentage": 20.4, "elapsed_time": "0:35:41", "remaining_time": "2:19:16"}
57
+ {"current_steps": 520, "total_steps": 2500, "loss": 0.3717231273651123, "lr": 9.651447891189825e-05, "epoch": 2.08, "percentage": 20.8, "elapsed_time": "0:36:17", "remaining_time": "2:18:10"}
58
+ {"current_steps": 530, "total_steps": 2500, "loss": 0.40065832138061525, "lr": 9.62538603417229e-05, "epoch": 2.12, "percentage": 21.2, "elapsed_time": "0:36:49", "remaining_time": "2:16:53"}
59
+ {"current_steps": 540, "total_steps": 2500, "loss": 0.33635973930358887, "lr": 9.598422448982696e-05, "epoch": 2.16, "percentage": 21.6, "elapsed_time": "0:37:25", "remaining_time": "2:15:49"}
60
+ {"current_steps": 550, "total_steps": 2500, "loss": 0.3708656787872314, "lr": 9.570562392225396e-05, "epoch": 2.2, "percentage": 22.0, "elapsed_time": "0:38:03", "remaining_time": "2:14:55"}
61
+ {"current_steps": 560, "total_steps": 2500, "loss": 0.35284056663513186, "lr": 9.541811295273656e-05, "epoch": 2.24, "percentage": 22.4, "elapsed_time": "0:38:38", "remaining_time": "2:13:51"}
62
+ {"current_steps": 570, "total_steps": 2500, "loss": 0.3429510831832886, "lr": 9.512174763210797e-05, "epoch": 2.2800000000000002, "percentage": 22.8, "elapsed_time": "0:39:15", "remaining_time": "2:12:55"}
63
+ {"current_steps": 580, "total_steps": 2500, "loss": 0.36770102977752683, "lr": 9.481658573737465e-05, "epoch": 2.32, "percentage": 23.2, "elapsed_time": "0:39:52", "remaining_time": "2:12:01"}
64
+ {"current_steps": 590, "total_steps": 2500, "loss": 0.3684037208557129, "lr": 9.450268676045262e-05, "epoch": 2.36, "percentage": 23.6, "elapsed_time": "0:40:31", "remaining_time": "2:11:12"}
65
+ {"current_steps": 600, "total_steps": 2500, "loss": 0.3221792697906494, "lr": 9.418011189656941e-05, "epoch": 2.4, "percentage": 24.0, "elapsed_time": "0:41:11", "remaining_time": "2:10:25"}
66
+ {"current_steps": 600, "total_steps": 2500, "eval_loss": 0.44748273491859436, "epoch": 2.4, "percentage": 24.0, "elapsed_time": "0:41:30", "remaining_time": "2:11:25"}
67
+ {"current_steps": 610, "total_steps": 2500, "loss": 0.40174164772033694, "lr": 9.384892403233384e-05, "epoch": 2.44, "percentage": 24.4, "elapsed_time": "0:42:07", "remaining_time": "2:10:30"}
68
+ {"current_steps": 620, "total_steps": 2500, "loss": 0.3701002836227417, "lr": 9.35091877334763e-05, "epoch": 2.48, "percentage": 24.8, "elapsed_time": "0:42:41", "remaining_time": "2:09:28"}
+ {"current_steps": 630, "total_steps": 2500, "loss": 0.3759175777435303, "lr": 9.316096923226135e-05, "epoch": 2.52, "percentage": 25.2, "elapsed_time": "0:43:18", "remaining_time": "2:08:33"}
+ {"current_steps": 640, "total_steps": 2500, "loss": 0.3581662178039551, "lr": 9.28043364145758e-05, "epoch": 2.56, "percentage": 25.6, "elapsed_time": "0:43:53", "remaining_time": "2:07:33"}
+ {"current_steps": 650, "total_steps": 2500, "loss": 0.35065665245056155, "lr": 9.24393588066941e-05, "epoch": 2.6, "percentage": 26.0, "elapsed_time": "0:44:26", "remaining_time": "2:06:28"}
+ {"current_steps": 660, "total_steps": 2500, "loss": 0.36879355907440187, "lr": 9.206610756172402e-05, "epoch": 2.64, "percentage": 26.4, "elapsed_time": "0:45:03", "remaining_time": "2:05:37"}
+ {"current_steps": 670, "total_steps": 2500, "loss": 0.3592060565948486, "lr": 9.168465544573536e-05, "epoch": 2.68, "percentage": 26.8, "elapsed_time": "0:45:43", "remaining_time": "2:04:53"}
+ {"current_steps": 680, "total_steps": 2500, "loss": 0.36156315803527833, "lr": 9.129507682357394e-05, "epoch": 2.7199999999999998, "percentage": 27.2, "elapsed_time": "0:46:22", "remaining_time": "2:04:08"}
+ {"current_steps": 690, "total_steps": 2500, "loss": 0.34445748329162595, "lr": 9.089744764436403e-05, "epoch": 2.76, "percentage": 27.6, "elapsed_time": "0:47:02", "remaining_time": "2:03:23"}
+ {"current_steps": 700, "total_steps": 2500, "loss": 0.3526463985443115, "lr": 9.049184542670199e-05, "epoch": 2.8, "percentage": 28.0, "elapsed_time": "0:47:37", "remaining_time": "2:02:27"}
+ {"current_steps": 700, "total_steps": 2500, "eval_loss": 0.44259119033813477, "epoch": 2.8, "percentage": 28.0, "elapsed_time": "0:47:54", "remaining_time": "2:03:10"}
+ {"current_steps": 710, "total_steps": 2500, "loss": 0.3458081245422363, "lr": 9.007834924354383e-05, "epoch": 2.84, "percentage": 28.4, "elapsed_time": "0:48:32", "remaining_time": "2:02:23"}
+ {"current_steps": 720, "total_steps": 2500, "loss": 0.3651163101196289, "lr": 8.965703970678974e-05, "epoch": 2.88, "percentage": 28.8, "elapsed_time": "0:49:08", "remaining_time": "2:01:28"}
+ {"current_steps": 730, "total_steps": 2500, "loss": 0.3218229293823242, "lr": 8.922799895156867e-05, "epoch": 2.92, "percentage": 29.2, "elapsed_time": "0:49:42", "remaining_time": "2:00:31"}
+ {"current_steps": 740, "total_steps": 2500, "loss": 0.3561582088470459, "lr": 8.879131062022598e-05, "epoch": 2.96, "percentage": 29.6, "elapsed_time": "0:50:18", "remaining_time": "1:59:39"}
+ {"current_steps": 750, "total_steps": 2500, "loss": 0.36128854751586914, "lr": 8.834705984601708e-05, "epoch": 3.0, "percentage": 30.0, "elapsed_time": "0:50:55", "remaining_time": "1:58:48"}
+ {"current_steps": 760, "total_steps": 2500, "loss": 0.31422438621521, "lr": 8.789533323651066e-05, "epoch": 3.04, "percentage": 30.4, "elapsed_time": "0:51:32", "remaining_time": "1:57:59"}
+ {"current_steps": 770, "total_steps": 2500, "loss": 0.29355826377868655, "lr": 8.74362188567043e-05, "epoch": 3.08, "percentage": 30.8, "elapsed_time": "0:52:11", "remaining_time": "1:57:15"}
+ {"current_steps": 780, "total_steps": 2500, "loss": 0.3185117721557617, "lr": 8.696980621185602e-05, "epoch": 3.12, "percentage": 31.2, "elapsed_time": "0:52:51", "remaining_time": "1:56:32"}
+ {"current_steps": 790, "total_steps": 2500, "loss": 0.28971233367919924, "lr": 8.649618623003508e-05, "epoch": 3.16, "percentage": 31.6, "elapsed_time": "0:53:28", "remaining_time": "1:55:45"}
+ {"current_steps": 800, "total_steps": 2500, "loss": 0.3055370092391968, "lr": 8.601545124439535e-05, "epoch": 3.2, "percentage": 32.0, "elapsed_time": "0:54:06", "remaining_time": "1:54:58"}
+ {"current_steps": 800, "total_steps": 2500, "eval_loss": 0.4529191255569458, "epoch": 3.2, "percentage": 32.0, "elapsed_time": "0:54:24", "remaining_time": "1:55:37"}
+ {"current_steps": 810, "total_steps": 2500, "loss": 0.28035550117492675, "lr": 8.552769497517482e-05, "epoch": 3.24, "percentage": 32.4, "elapsed_time": "0:55:03", "remaining_time": "1:54:52"}
+ {"current_steps": 820, "total_steps": 2500, "loss": 0.3199602603912354, "lr": 8.503301251142459e-05, "epoch": 3.2800000000000002, "percentage": 32.8, "elapsed_time": "0:55:37", "remaining_time": "1:53:57"}
+ {"current_steps": 830, "total_steps": 2500, "loss": 0.29444499015808107, "lr": 8.453150029247114e-05, "epoch": 3.32, "percentage": 33.2, "elapsed_time": "0:56:12", "remaining_time": "1:53:04"}
+ {"current_steps": 840, "total_steps": 2500, "loss": 0.30467259883880615, "lr": 8.402325608911526e-05, "epoch": 3.36, "percentage": 33.6, "elapsed_time": "0:56:45", "remaining_time": "1:52:10"}
+ {"current_steps": 850, "total_steps": 2500, "loss": 0.3117033004760742, "lr": 8.350837898457143e-05, "epoch": 3.4, "percentage": 34.0, "elapsed_time": "0:57:19", "remaining_time": "1:51:15"}
+ {"current_steps": 860, "total_steps": 2500, "loss": 0.34261503219604494, "lr": 8.298696935515132e-05, "epoch": 3.44, "percentage": 34.4, "elapsed_time": "0:57:55", "remaining_time": "1:50:27"}
+ {"current_steps": 870, "total_steps": 2500, "loss": 0.3159458637237549, "lr": 8.245912885069531e-05, "epoch": 3.48, "percentage": 34.8, "elapsed_time": "0:58:33", "remaining_time": "1:49:42"}
+ {"current_steps": 880, "total_steps": 2500, "loss": 0.2982481002807617, "lr": 8.192496037475562e-05, "epoch": 3.52, "percentage": 35.2, "elapsed_time": "0:59:13", "remaining_time": "1:49:01"}
+ {"current_steps": 890, "total_steps": 2500, "loss": 0.3232215404510498, "lr": 8.138456806453503e-05, "epoch": 3.56, "percentage": 35.6, "elapsed_time": "0:59:53", "remaining_time": "1:48:20"}
+ {"current_steps": 900, "total_steps": 2500, "loss": 0.3305091381072998, "lr": 8.083805727058513e-05, "epoch": 3.6, "percentage": 36.0, "elapsed_time": "1:00:32", "remaining_time": "1:47:38"}
+ {"current_steps": 900, "total_steps": 2500, "eval_loss": 0.44760578870773315, "epoch": 3.6, "percentage": 36.0, "elapsed_time": "1:00:52", "remaining_time": "1:48:13"}
+ {"current_steps": 910, "total_steps": 2500, "loss": 0.35752732753753663, "lr": 8.028553453626808e-05, "epoch": 3.64, "percentage": 36.4, "elapsed_time": "1:01:35", "remaining_time": "1:47:37"}
+ {"current_steps": 920, "total_steps": 2500, "loss": 0.3292932271957397, "lr": 7.972710757698567e-05, "epoch": 3.68, "percentage": 36.8, "elapsed_time": "1:02:16", "remaining_time": "1:46:56"}
+ {"current_steps": 930, "total_steps": 2500, "loss": 0.28986682891845705, "lr": 7.916288525918007e-05, "epoch": 3.7199999999999998, "percentage": 37.2, "elapsed_time": "1:02:55", "remaining_time": "1:46:13"}
+ {"current_steps": 940, "total_steps": 2500, "loss": 0.3027395725250244, "lr": 7.859297757911013e-05, "epoch": 3.76, "percentage": 37.6, "elapsed_time": "1:03:33", "remaining_time": "1:45:28"}
+ {"current_steps": 950, "total_steps": 2500, "loss": 0.3238774061203003, "lr": 7.801749564140724e-05, "epoch": 3.8, "percentage": 38.0, "elapsed_time": "1:04:12", "remaining_time": "1:44:46"}
+ {"current_steps": 960, "total_steps": 2500, "loss": 0.34537086486816404, "lr": 7.743655163741543e-05, "epoch": 3.84, "percentage": 38.4, "elapsed_time": "1:04:51", "remaining_time": "1:44:02"}
+ {"current_steps": 970, "total_steps": 2500, "loss": 0.3292637825012207, "lr": 7.685025882331936e-05, "epoch": 3.88, "percentage": 38.8, "elapsed_time": "1:05:30", "remaining_time": "1:43:20"}
+ {"current_steps": 980, "total_steps": 2500, "loss": 0.32722015380859376, "lr": 7.62587314980648e-05, "epoch": 3.92, "percentage": 39.2, "elapsed_time": "1:06:11", "remaining_time": "1:42:39"}
+ {"current_steps": 990, "total_steps": 2500, "loss": 0.29880056381225584, "lr": 7.566208498107585e-05, "epoch": 3.96, "percentage": 39.6, "elapsed_time": "1:06:50", "remaining_time": "1:41:56"}
+ {"current_steps": 1000, "total_steps": 2500, "loss": 0.2978524684906006, "lr": 7.506043558977321e-05, "epoch": 4.0, "percentage": 40.0, "elapsed_time": "1:07:29", "remaining_time": "1:41:14"}
+ {"current_steps": 1000, "total_steps": 2500, "eval_loss": 0.44613513350486755, "epoch": 4.0, "percentage": 40.0, "elapsed_time": "1:07:49", "remaining_time": "1:41:43"}
+ {"current_steps": 1010, "total_steps": 2500, "loss": 0.27530927658081056, "lr": 7.445390061689782e-05, "epoch": 4.04, "percentage": 40.4, "elapsed_time": "1:08:31", "remaining_time": "1:41:05"}
+ {"current_steps": 1020, "total_steps": 2500, "loss": 0.2517704486846924, "lr": 7.38425983076444e-05, "epoch": 4.08, "percentage": 40.8, "elapsed_time": "1:09:11", "remaining_time": "1:40:23"}
+ {"current_steps": 1030, "total_steps": 2500, "loss": 0.28200175762176516, "lr": 7.32266478366094e-05, "epoch": 4.12, "percentage": 41.2, "elapsed_time": "1:09:50", "remaining_time": "1:39:40"}
+ {"current_steps": 1040, "total_steps": 2500, "loss": 0.2569046258926392, "lr": 7.260616928455754e-05, "epoch": 4.16, "percentage": 41.6, "elapsed_time": "1:10:30", "remaining_time": "1:38:58"}
+ {"current_steps": 1050, "total_steps": 2500, "loss": 0.2665576696395874, "lr": 7.1981283615012e-05, "epoch": 4.2, "percentage": 42.0, "elapsed_time": "1:11:09", "remaining_time": "1:38:15"}
+ {"current_steps": 1060, "total_steps": 2500, "loss": 0.2635650634765625, "lr": 7.135211265067216e-05, "epoch": 4.24, "percentage": 42.4, "elapsed_time": "1:11:47", "remaining_time": "1:37:31"}
+ {"current_steps": 1070, "total_steps": 2500, "loss": 0.26842334270477297, "lr": 7.071877904966423e-05, "epoch": 4.28, "percentage": 42.8, "elapsed_time": "1:12:27", "remaining_time": "1:36:49"}
+ {"current_steps": 1080, "total_steps": 2500, "loss": 0.2633937358856201, "lr": 7.00814062816285e-05, "epoch": 4.32, "percentage": 43.2, "elapsed_time": "1:13:07", "remaining_time": "1:36:08"}
+ {"current_steps": 1090, "total_steps": 2500, "loss": 0.2895397186279297, "lr": 6.944011860364905e-05, "epoch": 4.36, "percentage": 43.6, "elapsed_time": "1:13:45", "remaining_time": "1:35:24"}
+ {"current_steps": 1100, "total_steps": 2500, "loss": 0.27405414581298826, "lr": 6.879504103602935e-05, "epoch": 4.4, "percentage": 44.0, "elapsed_time": "1:14:25", "remaining_time": "1:34:43"}
+ {"current_steps": 1100, "total_steps": 2500, "eval_loss": 0.46795058250427246, "epoch": 4.4, "percentage": 44.0, "elapsed_time": "1:14:42", "remaining_time": "1:35:05"}
+ {"current_steps": 1110, "total_steps": 2500, "loss": 0.2581511974334717, "lr": 6.814629933791931e-05, "epoch": 4.44, "percentage": 44.4, "elapsed_time": "1:15:21", "remaining_time": "1:34:22"}
+ {"current_steps": 1120, "total_steps": 2500, "loss": 0.2689012050628662, "lr": 6.749401998279846e-05, "epoch": 4.48, "percentage": 44.8, "elapsed_time": "1:16:00", "remaining_time": "1:33:39"}
+ {"current_steps": 1130, "total_steps": 2500, "loss": 0.27230424880981446, "lr": 6.683833013381941e-05, "epoch": 4.52, "percentage": 45.2, "elapsed_time": "1:16:39", "remaining_time": "1:32:56"}
+ {"current_steps": 1140, "total_steps": 2500, "loss": 0.2903036594390869, "lr": 6.617935761901748e-05, "epoch": 4.5600000000000005, "percentage": 45.6, "elapsed_time": "1:17:18", "remaining_time": "1:32:13"}
+ {"current_steps": 1150, "total_steps": 2500, "loss": 0.2551115989685059, "lr": 6.551723090639007e-05, "epoch": 4.6, "percentage": 46.0, "elapsed_time": "1:17:57", "remaining_time": "1:31:31"}
+ {"current_steps": 1160, "total_steps": 2500, "loss": 0.2783109188079834, "lr": 6.485207907885175e-05, "epoch": 4.64, "percentage": 46.4, "elapsed_time": "1:18:37", "remaining_time": "1:30:49"}
+ {"current_steps": 1170, "total_steps": 2500, "loss": 0.29131503105163575, "lr": 6.418403180906922e-05, "epoch": 4.68, "percentage": 46.8, "elapsed_time": "1:19:17", "remaining_time": "1:30:08"}
+ {"current_steps": 1180, "total_steps": 2500, "loss": 0.2730400085449219, "lr": 6.351321933418139e-05, "epoch": 4.72, "percentage": 47.2, "elapsed_time": "1:19:57", "remaining_time": "1:29:26"}
+ {"current_steps": 1190, "total_steps": 2500, "loss": 0.2572148323059082, "lr": 6.283977243040939e-05, "epoch": 4.76, "percentage": 47.6, "elapsed_time": "1:20:37", "remaining_time": "1:28:44"}
+ {"current_steps": 1200, "total_steps": 2500, "loss": 0.27444655895233155, "lr": 6.216382238756146e-05, "epoch": 4.8, "percentage": 48.0, "elapsed_time": "1:21:16", "remaining_time": "1:28:03"}
+ {"current_steps": 1200, "total_steps": 2500, "eval_loss": 0.466619610786438, "epoch": 4.8, "percentage": 48.0, "elapsed_time": "1:21:36", "remaining_time": "1:28:24"}
+ {"current_steps": 1210, "total_steps": 2500, "loss": 0.27054529190063475, "lr": 6.148550098343778e-05, "epoch": 4.84, "percentage": 48.4, "elapsed_time": "1:22:17", "remaining_time": "1:27:43"}
+ {"current_steps": 1220, "total_steps": 2500, "loss": 0.26785056591033934, "lr": 6.080494045814011e-05, "epoch": 4.88, "percentage": 48.8, "elapsed_time": "1:22:55", "remaining_time": "1:26:59"}
+ {"current_steps": 1230, "total_steps": 2500, "loss": 0.26335647106170657, "lr": 6.0122273488291304e-05, "epoch": 4.92, "percentage": 49.2, "elapsed_time": "1:23:34", "remaining_time": "1:26:17"}
+ {"current_steps": 1240, "total_steps": 2500, "loss": 0.2614041090011597, "lr": 5.943763316116977e-05, "epoch": 4.96, "percentage": 49.6, "elapsed_time": "1:24:13", "remaining_time": "1:25:34"}
+ {"current_steps": 1250, "total_steps": 2500, "loss": 0.24768717288970948, "lr": 5.875115294876381e-05, "epoch": 5.0, "percentage": 50.0, "elapsed_time": "1:24:52", "remaining_time": "1:24:52"}
+ {"current_steps": 1260, "total_steps": 2500, "loss": 0.21707432270050048, "lr": 5.806296668175104e-05, "epoch": 5.04, "percentage": 50.4, "elapsed_time": "1:25:31", "remaining_time": "1:24:10"}
+ {"current_steps": 1270, "total_steps": 2500, "loss": 0.2139519214630127, "lr": 5.737320852340775e-05, "epoch": 5.08, "percentage": 50.8, "elapsed_time": "1:26:06", "remaining_time": "1:23:23"}
+ {"current_steps": 1280, "total_steps": 2500, "loss": 0.20998594760894776, "lr": 5.668201294345363e-05, "epoch": 5.12, "percentage": 51.2, "elapsed_time": "1:26:39", "remaining_time": "1:22:35"}
+ {"current_steps": 1290, "total_steps": 2500, "loss": 0.23306002616882324, "lr": 5.598951469183649e-05, "epoch": 5.16, "percentage": 51.6, "elapsed_time": "1:27:14", "remaining_time": "1:21:49"}
+ {"current_steps": 1300, "total_steps": 2500, "loss": 0.2262401580810547, "lr": 5.52958487724626e-05, "epoch": 5.2, "percentage": 52.0, "elapsed_time": "1:27:49", "remaining_time": "1:21:04"}
+ {"current_steps": 1300, "total_steps": 2500, "eval_loss": 0.49972543120384216, "epoch": 5.2, "percentage": 52.0, "elapsed_time": "1:28:08", "remaining_time": "1:21:22"}
+ {"current_steps": 1310, "total_steps": 2500, "loss": 0.21100988388061523, "lr": 5.4601150416877367e-05, "epoch": 5.24, "percentage": 52.4, "elapsed_time": "1:28:47", "remaining_time": "1:20:39"}
+ {"current_steps": 1320, "total_steps": 2500, "loss": 0.23542592525482178, "lr": 5.390555505790168e-05, "epoch": 5.28, "percentage": 52.8, "elapsed_time": "1:29:20", "remaining_time": "1:19:52"}
+ {"current_steps": 1330, "total_steps": 2500, "loss": 0.2095633029937744, "lr": 5.3209198303229027e-05, "epoch": 5.32, "percentage": 53.2, "elapsed_time": "1:29:54", "remaining_time": "1:19:05"}
+ {"current_steps": 1340, "total_steps": 2500, "loss": 0.21693904399871827, "lr": 5.2512215908988484e-05, "epoch": 5.36, "percentage": 53.6, "elapsed_time": "1:30:31", "remaining_time": "1:18:21"}
+ {"current_steps": 1350, "total_steps": 2500, "loss": 0.2076347827911377, "lr": 5.1814743753278795e-05, "epoch": 5.4, "percentage": 54.0, "elapsed_time": "1:31:11", "remaining_time": "1:17:40"}
+ {"current_steps": 1360, "total_steps": 2500, "loss": 0.22539749145507812, "lr": 5.111691780967869e-05, "epoch": 5.44, "percentage": 54.4, "elapsed_time": "1:31:50", "remaining_time": "1:16:58"}
+ {"current_steps": 1370, "total_steps": 2500, "loss": 0.2077547550201416, "lr": 5.041887412073854e-05, "epoch": 5.48, "percentage": 54.8, "elapsed_time": "1:32:27", "remaining_time": "1:16:15"}
+ {"current_steps": 1380, "total_steps": 2500, "loss": 0.21558783054351807, "lr": 4.97207487714586e-05, "epoch": 5.52, "percentage": 55.2, "elapsed_time": "1:33:04", "remaining_time": "1:15:32"}
+ {"current_steps": 1390, "total_steps": 2500, "loss": 0.21069679260253907, "lr": 4.9022677862758945e-05, "epoch": 5.5600000000000005, "percentage": 55.6, "elapsed_time": "1:33:39", "remaining_time": "1:14:47"}
+ {"current_steps": 1400, "total_steps": 2500, "loss": 0.21843309402465821, "lr": 4.832479748494643e-05, "epoch": 5.6, "percentage": 56.0, "elapsed_time": "1:34:13", "remaining_time": "1:14:02"}
+ {"current_steps": 1400, "total_steps": 2500, "eval_loss": 0.49576279520988464, "epoch": 5.6, "percentage": 56.0, "elapsed_time": "1:34:31", "remaining_time": "1:14:16"}
+ {"current_steps": 1410, "total_steps": 2500, "loss": 0.22310276031494142, "lr": 4.7627243691183453e-05, "epoch": 5.64, "percentage": 56.4, "elapsed_time": "1:35:09", "remaining_time": "1:13:33"}
+ {"current_steps": 1420, "total_steps": 2500, "loss": 0.22056117057800292, "lr": 4.693015247096423e-05, "epoch": 5.68, "percentage": 56.8, "elapsed_time": "1:35:47", "remaining_time": "1:12:51"}
+ {"current_steps": 1430, "total_steps": 2500, "loss": 0.2241537094116211, "lr": 4.623365972360337e-05, "epoch": 5.72, "percentage": 57.2, "elapsed_time": "1:36:25", "remaining_time": "1:12:09"}
+ {"current_steps": 1440, "total_steps": 2500, "loss": 0.21514451503753662, "lr": 4.553790123174197e-05, "epoch": 5.76, "percentage": 57.6, "elapsed_time": "1:37:04", "remaining_time": "1:11:27"}
+ {"current_steps": 1450, "total_steps": 2500, "loss": 0.21031346321105956, "lr": 4.484301263487665e-05, "epoch": 5.8, "percentage": 58.0, "elapsed_time": "1:37:42", "remaining_time": "1:10:45"}
+ {"current_steps": 1460, "total_steps": 2500, "loss": 0.2312474489212036, "lr": 4.414912940291613e-05, "epoch": 5.84, "percentage": 58.4, "elapsed_time": "1:38:21", "remaining_time": "1:10:03"}
+ {"current_steps": 1470, "total_steps": 2500, "loss": 0.22380952835083007, "lr": 4.345638680977139e-05, "epoch": 5.88, "percentage": 58.8, "elapsed_time": "1:39:00", "remaining_time": "1:09:22"}
+ {"current_steps": 1480, "total_steps": 2500, "loss": 0.22706894874572753, "lr": 4.276491990698355e-05, "epoch": 5.92, "percentage": 59.2, "elapsed_time": "1:39:39", "remaining_time": "1:08:40"}
+ {"current_steps": 1490, "total_steps": 2500, "loss": 0.2103546142578125, "lr": 4.2074863497395377e-05, "epoch": 5.96, "percentage": 59.6, "elapsed_time": "1:40:15", "remaining_time": "1:07:57"}
+ {"current_steps": 1500, "total_steps": 2500, "loss": 0.2276217222213745, "lr": 4.1386352108871174e-05, "epoch": 6.0, "percentage": 60.0, "elapsed_time": "1:40:50", "remaining_time": "1:07:13"}
+ {"current_steps": 1500, "total_steps": 2500, "eval_loss": 0.4966464042663574, "epoch": 6.0, "percentage": 60.0, "elapsed_time": "1:41:07", "remaining_time": "1:07:25"}
+ {"current_steps": 1510, "total_steps": 2500, "loss": 0.16540236473083497, "lr": 4.069951996807034e-05, "epoch": 6.04, "percentage": 60.4, "elapsed_time": "1:41:49", "remaining_time": "1:06:45"}
+ {"current_steps": 1520, "total_steps": 2500, "loss": 0.1638352394104004, "lr": 4.001450097427966e-05, "epoch": 6.08, "percentage": 60.8, "elapsed_time": "1:42:26", "remaining_time": "1:06:03"}
+ {"current_steps": 1530, "total_steps": 2500, "loss": 0.1719011664390564, "lr": 3.9331428673309204e-05, "epoch": 6.12, "percentage": 61.2, "elapsed_time": "1:43:05", "remaining_time": "1:05:21"}
+ {"current_steps": 1540, "total_steps": 2500, "loss": 0.1651092290878296, "lr": 3.865043623145751e-05, "epoch": 6.16, "percentage": 61.6, "elapsed_time": "1:43:41", "remaining_time": "1:04:38"}
+ {"current_steps": 1550, "total_steps": 2500, "loss": 0.1746900796890259, "lr": 3.797165640955041e-05, "epoch": 6.2, "percentage": 62.0, "elapsed_time": "1:44:21", "remaining_time": "1:03:57"}
+ {"current_steps": 1560, "total_steps": 2500, "loss": 0.16637682914733887, "lr": 3.729522153705916e-05, "epoch": 6.24, "percentage": 62.4, "elapsed_time": "1:44:59", "remaining_time": "1:03:15"}
+ {"current_steps": 1570, "total_steps": 2500, "loss": 0.1709848165512085, "lr": 3.662126348630237e-05, "epoch": 6.28, "percentage": 62.8, "elapsed_time": "1:45:38", "remaining_time": "1:02:34"}
+ {"current_steps": 1580, "total_steps": 2500, "loss": 0.18107957839965821, "lr": 3.594991364673745e-05, "epoch": 6.32, "percentage": 63.2, "elapsed_time": "1:46:17", "remaining_time": "1:01:53"}
+ {"current_steps": 1590, "total_steps": 2500, "loss": 0.16225044727325438, "lr": 3.528130289934583e-05, "epoch": 6.36, "percentage": 63.6, "elapsed_time": "1:46:54", "remaining_time": "1:01:11"}
+ {"current_steps": 1600, "total_steps": 2500, "loss": 0.17544152736663818, "lr": 3.461556159111748e-05, "epoch": 6.4, "percentage": 64.0, "elapsed_time": "1:47:32", "remaining_time": "1:00:29"}
+ {"current_steps": 1600, "total_steps": 2500, "eval_loss": 0.5342507362365723, "epoch": 6.4, "percentage": 64.0, "elapsed_time": "1:47:52", "remaining_time": "1:00:40"}
+ {"current_steps": 1610, "total_steps": 2500, "loss": 0.17091144323349, "lr": 3.3952819509639534e-05, "epoch": 6.44, "percentage": 64.4, "elapsed_time": "1:48:32", "remaining_time": "0:59:59"}
+ {"current_steps": 1620, "total_steps": 2500, "loss": 0.17765278816223146, "lr": 3.329320585779393e-05, "epoch": 6.48, "percentage": 64.8, "elapsed_time": "1:49:11", "remaining_time": "0:59:18"}
+ {"current_steps": 1630, "total_steps": 2500, "loss": 0.16475566625595092, "lr": 3.263684922856905e-05, "epoch": 6.52, "percentage": 65.2, "elapsed_time": "1:49:51", "remaining_time": "0:58:38"}
+ {"current_steps": 1640, "total_steps": 2500, "loss": 0.172060227394104, "lr": 3.1983877579990274e-05, "epoch": 6.5600000000000005, "percentage": 65.6, "elapsed_time": "1:50:30", "remaining_time": "0:57:57"}
+ {"current_steps": 1650, "total_steps": 2500, "loss": 0.16673840284347535, "lr": 3.1334418210174263e-05, "epoch": 6.6, "percentage": 66.0, "elapsed_time": "1:51:09", "remaining_time": "0:57:15"}
+ {"current_steps": 1660, "total_steps": 2500, "loss": 0.17414634227752684, "lr": 3.0688597732512e-05, "epoch": 6.64, "percentage": 66.4, "elapsed_time": "1:51:47", "remaining_time": "0:56:34"}
+ {"current_steps": 1670, "total_steps": 2500, "loss": 0.1620783567428589, "lr": 3.0046542050985237e-05, "epoch": 6.68, "percentage": 66.8, "elapsed_time": "1:52:28", "remaining_time": "0:55:53"}
+ {"current_steps": 1680, "total_steps": 2500, "loss": 0.17428462505340575, "lr": 2.940837633562127e-05, "epoch": 6.72, "percentage": 67.2, "elapsed_time": "1:53:08", "remaining_time": "0:55:13"}
+ {"current_steps": 1690, "total_steps": 2500, "loss": 0.19050977230072022, "lr": 2.877422499809072e-05, "epoch": 6.76, "percentage": 67.6, "elapsed_time": "1:53:46", "remaining_time": "0:54:31"}
+ {"current_steps": 1700, "total_steps": 2500, "loss": 0.16926174163818358, "lr": 2.8144211667453368e-05, "epoch": 6.8, "percentage": 68.0, "elapsed_time": "1:54:23", "remaining_time": "0:53:50"}
+ {"current_steps": 1700, "total_steps": 2500, "eval_loss": 0.5441356301307678, "epoch": 6.8, "percentage": 68.0, "elapsed_time": "1:54:41", "remaining_time": "0:53:58"}
+ {"current_steps": 1710, "total_steps": 2500, "loss": 0.1793771743774414, "lr": 2.75184591660563e-05, "epoch": 6.84, "percentage": 68.4, "elapsed_time": "1:55:20", "remaining_time": "0:53:17"}
+ {"current_steps": 1720, "total_steps": 2500, "loss": 0.1647491931915283, "lr": 2.6897089485589583e-05, "epoch": 6.88, "percentage": 68.8, "elapsed_time": "1:55:56", "remaining_time": "0:52:34"}
+ {"current_steps": 1730, "total_steps": 2500, "loss": 0.17397019863128663, "lr": 2.6280223763303546e-05, "epoch": 6.92, "percentage": 69.2, "elapsed_time": "1:56:35", "remaining_time": "0:51:53"}
+ {"current_steps": 1740, "total_steps": 2500, "loss": 0.17107686996459961, "lr": 2.5667982258393014e-05, "epoch": 6.96, "percentage": 69.6, "elapsed_time": "1:57:11", "remaining_time": "0:51:11"}
+ {"current_steps": 1750, "total_steps": 2500, "loss": 0.1730511426925659, "lr": 2.506048432855247e-05, "epoch": 7.0, "percentage": 70.0, "elapsed_time": "1:57:50", "remaining_time": "0:50:30"}
+ {"current_steps": 1760, "total_steps": 2500, "loss": 0.13950222730636597, "lr": 2.4457848406707013e-05, "epoch": 7.04, "percentage": 70.4, "elapsed_time": "1:58:29", "remaining_time": "0:49:49"}
+ {"current_steps": 1770, "total_steps": 2500, "loss": 0.1326605796813965, "lr": 2.3860191977923672e-05, "epoch": 7.08, "percentage": 70.8, "elapsed_time": "1:59:08", "remaining_time": "0:49:08"}
+ {"current_steps": 1780, "total_steps": 2500, "loss": 0.1265331983566284, "lr": 2.326763155650744e-05, "epoch": 7.12, "percentage": 71.2, "elapsed_time": "1:59:47", "remaining_time": "0:48:27"}
+ {"current_steps": 1790, "total_steps": 2500, "loss": 0.12731509208679198, "lr": 2.2680282663286552e-05, "epoch": 7.16, "percentage": 71.6, "elapsed_time": "2:00:26", "remaining_time": "0:47:46"}
+ {"current_steps": 1800, "total_steps": 2500, "loss": 0.13114826679229735, "lr": 2.209825980309151e-05, "epoch": 7.2, "percentage": 72.0, "elapsed_time": "2:01:03", "remaining_time": "0:47:04"}
+ {"current_steps": 1800, "total_steps": 2500, "eval_loss": 0.5847110748291016, "epoch": 7.2, "percentage": 72.0, "elapsed_time": "2:01:22", "remaining_time": "0:47:12"}
+ {"current_steps": 1810, "total_steps": 2500, "loss": 0.12906957864761354, "lr": 2.152167644243213e-05, "epoch": 7.24, "percentage": 72.4, "elapsed_time": "2:02:00", "remaining_time": "0:46:30"}
+ {"current_steps": 1820, "total_steps": 2500, "loss": 0.133590030670166, "lr": 2.095064498737701e-05, "epoch": 7.28, "percentage": 72.8, "elapsed_time": "2:02:38", "remaining_time": "0:45:49"}
+ {"current_steps": 1830, "total_steps": 2500, "loss": 0.13653848171234131, "lr": 2.0385276761639765e-05, "epoch": 7.32, "percentage": 73.2, "elapsed_time": "2:03:18", "remaining_time": "0:45:08"}
+ {"current_steps": 1840, "total_steps": 2500, "loss": 0.12472724914550781, "lr": 1.9825681984876172e-05, "epoch": 7.36, "percentage": 73.6, "elapsed_time": "2:03:56", "remaining_time": "0:44:27"}
+ {"current_steps": 1850, "total_steps": 2500, "loss": 0.13255125284194946, "lr": 1.9271969751196776e-05, "epoch": 7.4, "percentage": 74.0, "elapsed_time": "2:04:32", "remaining_time": "0:43:45"}
+ {"current_steps": 1860, "total_steps": 2500, "loss": 0.13693161010742189, "lr": 1.8724248007898647e-05, "epoch": 7.44, "percentage": 74.4, "elapsed_time": "2:05:09", "remaining_time": "0:43:03"}
+ {"current_steps": 1870, "total_steps": 2500, "loss": 0.13425672054290771, "lr": 1.8182623534420907e-05, "epoch": 7.48, "percentage": 74.8, "elapsed_time": "2:05:43", "remaining_time": "0:42:21"}
+ {"current_steps": 1880, "total_steps": 2500, "loss": 0.13668575286865234, "lr": 1.76472019215278e-05, "epoch": 7.52, "percentage": 75.2, "elapsed_time": "2:06:19", "remaining_time": "0:41:39"}
+ {"current_steps": 1890, "total_steps": 2500, "loss": 0.1317702889442444, "lr": 1.7118087550723633e-05, "epoch": 7.5600000000000005, "percentage": 75.6, "elapsed_time": "2:06:53", "remaining_time": "0:40:57"}
+ {"current_steps": 1900, "total_steps": 2500, "loss": 0.14458621740341188, "lr": 1.659538357390341e-05, "epoch": 7.6, "percentage": 76.0, "elapsed_time": "2:07:26", "remaining_time": "0:40:14"}
+ {"current_steps": 1900, "total_steps": 2500, "eval_loss": 0.5830516219139099, "epoch": 7.6, "percentage": 76.0, "elapsed_time": "2:07:44", "remaining_time": "0:40:20"}
+ {"current_steps": 1910, "total_steps": 2500, "loss": 0.13126691579818725, "lr": 1.60791918932431e-05, "epoch": 7.64, "percentage": 76.4, "elapsed_time": "2:08:25", "remaining_time": "0:39:40"}
+ {"current_steps": 1920, "total_steps": 2500, "loss": 0.12600460052490234, "lr": 1.556961314133359e-05, "epoch": 7.68, "percentage": 76.8, "elapsed_time": "2:09:04", "remaining_time": "0:38:59"}
+ {"current_steps": 1930, "total_steps": 2500, "loss": 0.12453792095184327, "lr": 1.5066746661562253e-05, "epoch": 7.72, "percentage": 77.2, "elapsed_time": "2:09:40", "remaining_time": "0:38:17"}
+ {"current_steps": 1940, "total_steps": 2500, "loss": 0.14839541912078857, "lr": 1.4570690488745687e-05, "epoch": 7.76, "percentage": 77.6, "elapsed_time": "2:10:17", "remaining_time": "0:37:36"}
+ {"current_steps": 1950, "total_steps": 2500, "loss": 0.1321096420288086, "lr": 1.4081541330017705e-05, "epoch": 7.8, "percentage": 78.0, "elapsed_time": "2:10:53", "remaining_time": "0:36:55"}
+ {"current_steps": 1960, "total_steps": 2500, "loss": 0.1317069411277771, "lr": 1.3599394545975951e-05, "epoch": 7.84, "percentage": 78.4, "elapsed_time": "2:11:29", "remaining_time": "0:36:13"}
+ {"current_steps": 1970, "total_steps": 2500, "loss": 0.13362932205200195, "lr": 1.312434413209131e-05, "epoch": 7.88, "percentage": 78.8, "elapsed_time": "2:12:06", "remaining_time": "0:35:32"}
+ {"current_steps": 1980, "total_steps": 2500, "loss": 0.12677763700485228, "lr": 1.2656482700383237e-05, "epoch": 7.92, "percentage": 79.2, "elapsed_time": "2:12:43", "remaining_time": "0:34:51"}
+ {"current_steps": 1990, "total_steps": 2500, "loss": 0.1382434129714966, "lr": 1.219590146136485e-05, "epoch": 7.96, "percentage": 79.6, "elapsed_time": "2:13:22", "remaining_time": "0:34:10"}
+ {"current_steps": 2000, "total_steps": 2500, "loss": 0.12519369125366211, "lr": 1.1742690206261292e-05, "epoch": 8.0, "percentage": 80.0, "elapsed_time": "2:13:59", "remaining_time": "0:33:29"}
+ {"current_steps": 2000, "total_steps": 2500, "eval_loss": 0.5840195417404175, "epoch": 8.0, "percentage": 80.0, "elapsed_time": "2:14:18", "remaining_time": "0:33:34"}
+ {"current_steps": 2010, "total_steps": 2500, "loss": 0.10409053564071655, "lr": 1.129693728950474e-05, "epoch": 8.04, "percentage": 80.4, "elapsed_time": "2:15:02", "remaining_time": "0:32:55"}
+ {"current_steps": 2020, "total_steps": 2500, "loss": 0.10310100317001343, "lr": 1.0858729611509516e-05, "epoch": 8.08, "percentage": 80.8, "elapsed_time": "2:15:43", "remaining_time": "0:32:15"}
+ {"current_steps": 2030, "total_steps": 2500, "loss": 0.09960774183273316, "lr": 1.0428152601730718e-05, "epoch": 8.12, "percentage": 81.2, "elapsed_time": "2:16:20", "remaining_time": "0:31:34"}
+ {"current_steps": 2040, "total_steps": 2500, "loss": 0.09982571601867676, "lr": 1.0005290202009531e-05, "epoch": 8.16, "percentage": 81.6, "elapsed_time": "2:16:58", "remaining_time": "0:30:53"}
+ {"current_steps": 2050, "total_steps": 2500, "loss": 0.11322143077850341, "lr": 9.590224850208646e-06, "epoch": 8.2, "percentage": 82.0, "elapsed_time": "2:17:36", "remaining_time": "0:30:12"}
+ {"current_steps": 2060, "total_steps": 2500, "loss": 0.10006082057952881, "lr": 9.183037464140804e-06, "epoch": 8.24, "percentage": 82.4, "elapsed_time": "2:18:13", "remaining_time": "0:29:31"}
+ {"current_steps": 2070, "total_steps": 2500, "loss": 0.11560235023498536, "lr": 8.783807425793721e-06, "epoch": 8.28, "percentage": 82.8, "elapsed_time": "2:18:46", "remaining_time": "0:28:49"}
+ {"current_steps": 2080, "total_steps": 2500, "loss": 0.10931503772735596, "lr": 8.392612565854375e-06, "epoch": 8.32, "percentage": 83.2, "elapsed_time": "2:19:21", "remaining_time": "0:28:08"}
+ {"current_steps": 2090, "total_steps": 2500, "loss": 0.10900030136108399, "lr": 8.009529148535855e-06, "epoch": 8.36, "percentage": 83.6, "elapsed_time": "2:19:58", "remaining_time": "0:27:27"}
+ {"current_steps": 2100, "total_steps": 2500, "loss": 0.1069128155708313, "lr": 7.63463185670939e-06, "epoch": 8.4, "percentage": 84.0, "elapsed_time": "2:20:34", "remaining_time": "0:26:46"}
+ {"current_steps": 2100, "total_steps": 2500, "eval_loss": 0.6247864961624146, "epoch": 8.4, "percentage": 84.0, "elapsed_time": "2:20:53", "remaining_time": "0:26:50"}
232
+ {"current_steps": 2110, "total_steps": 2500, "loss": 0.09856721758842468, "lr": 7.267993777344856e-06, "epoch": 8.44, "percentage": 84.4, "elapsed_time": "2:21:31", "remaining_time": "0:26:09"}
233
+ {"current_steps": 2120, "total_steps": 2500, "loss": 0.10609345436096192, "lr": 6.909686387262254e-06, "epoch": 8.48, "percentage": 84.8, "elapsed_time": "2:22:07", "remaining_time": "0:25:28"}
234
+ {"current_steps": 2130, "total_steps": 2500, "loss": 0.105103600025177, "lr": 6.559779539197231e-06, "epoch": 8.52, "percentage": 85.2, "elapsed_time": "2:22:41", "remaining_time": "0:24:47"}
235
+ {"current_steps": 2140, "total_steps": 2500, "loss": 0.10853493213653564, "lr": 6.21834144818314e-06, "epoch": 8.56, "percentage": 85.6, "elapsed_time": "2:23:18", "remaining_time": "0:24:06"}
236
+ {"current_steps": 2150, "total_steps": 2500, "loss": 0.11464111804962158, "lr": 5.885438678252342e-06, "epoch": 8.6, "percentage": 86.0, "elapsed_time": "2:23:56", "remaining_time": "0:23:25"}
237
+ {"current_steps": 2160, "total_steps": 2500, "loss": 0.10765299797058106, "lr": 5.5611361294594325e-06, "epoch": 8.64, "percentage": 86.4, "elapsed_time": "2:24:33", "remaining_time": "0:22:45"}
238
+ {"current_steps": 2170, "total_steps": 2500, "loss": 0.10699164867401123, "lr": 5.245497025228874e-06, "epoch": 8.68, "percentage": 86.8, "elapsed_time": "2:25:12", "remaining_time": "0:22:04"}
239
+ {"current_steps": 2180, "total_steps": 2500, "loss": 0.10728691816329956, "lr": 4.938582900029437e-06, "epoch": 8.72, "percentage": 87.2, "elapsed_time": "2:25:52", "remaining_time": "0:21:24"}
240
+ {"current_steps": 2190, "total_steps": 2500, "loss": 0.11177785396575927, "lr": 4.640453587377957e-06, "epoch": 8.76, "percentage": 87.6, "elapsed_time": "2:26:32", "remaining_time": "0:20:44"}
241
+ {"current_steps": 2200, "total_steps": 2500, "loss": 0.11041848659515381, "lr": 4.351167208174639e-06, "epoch": 8.8, "percentage": 88.0, "elapsed_time": "2:27:11", "remaining_time": "0:20:04"}
242
+ {"current_steps": 2200, "total_steps": 2500, "eval_loss": 0.6235533356666565, "epoch": 8.8, "percentage": 88.0, "elapsed_time": "2:27:30", "remaining_time": "0:20:06"}
243
+ {"current_steps": 2210, "total_steps": 2500, "loss": 0.1085782766342163, "lr": 4.0707801593723e-06, "epoch": 8.84, "percentage": 88.4, "elapsed_time": "2:28:10", "remaining_time": "0:19:26"}
244
+ {"current_steps": 2220, "total_steps": 2500, "loss": 0.11138873100280762, "lr": 3.799347102981665e-06, "epoch": 8.88, "percentage": 88.8, "elapsed_time": "2:28:48", "remaining_time": "0:18:46"}
245
+ {"current_steps": 2230, "total_steps": 2500, "loss": 0.10770895481109619, "lr": 3.536920955414885e-06, "epoch": 8.92, "percentage": 89.2, "elapsed_time": "2:29:26", "remaining_time": "0:18:05"}
246
+ {"current_steps": 2240, "total_steps": 2500, "loss": 0.11167995929718018, "lr": 3.2835528771693992e-06, "epoch": 8.96, "percentage": 89.6, "elapsed_time": "2:30:02", "remaining_time": "0:17:24"}
247
+ {"current_steps": 2250, "total_steps": 2500, "loss": 0.11738998889923095, "lr": 3.039292262854088e-06, "epoch": 9.0, "percentage": 90.0, "elapsed_time": "2:30:38", "remaining_time": "0:16:44"}
248
+ {"current_steps": 2260, "total_steps": 2500, "loss": 0.10072145462036133, "lr": 2.804186731559677e-06, "epoch": 9.04, "percentage": 90.4, "elapsed_time": "2:31:15", "remaining_time": "0:16:03"}
249
+ {"current_steps": 2270, "total_steps": 2500, "loss": 0.09228388667106628, "lr": 2.5782821175753422e-06, "epoch": 9.08, "percentage": 90.8, "elapsed_time": "2:31:51", "remaining_time": "0:15:23"}
250
+ {"current_steps": 2280, "total_steps": 2500, "loss": 0.09626876711845397, "lr": 2.361622461453178e-06, "epoch": 9.12, "percentage": 91.2, "elapsed_time": "2:32:28", "remaining_time": "0:14:42"}
251
+ {"current_steps": 2290, "total_steps": 2500, "loss": 0.0960278868675232, "lr": 2.154250001422431e-06, "epoch": 9.16, "percentage": 91.6, "elapsed_time": "2:33:07", "remaining_time": "0:14:02"}
252
+ {"current_steps": 2300, "total_steps": 2500, "loss": 0.0941778838634491, "lr": 1.956205165155078e-06, "epoch": 9.2, "percentage": 92.0, "elapsed_time": "2:33:45", "remaining_time": "0:13:22"}
253
+ {"current_steps": 2300, "total_steps": 2500, "eval_loss": 0.6419874429702759, "epoch": 9.2, "percentage": 92.0, "elapsed_time": "2:34:05", "remaining_time": "0:13:23"}
254
+ {"current_steps": 2310, "total_steps": 2500, "loss": 0.09725146293640137, "lr": 1.7675265618843362e-06, "epoch": 9.24, "percentage": 92.4, "elapsed_time": "2:34:46", "remaining_time": "0:12:43"}
255
+ {"current_steps": 2320, "total_steps": 2500, "loss": 0.09353782534599304, "lr": 1.5882509748777808e-06, "epoch": 9.28, "percentage": 92.8, "elapsed_time": "2:35:21", "remaining_time": "0:12:03"}
256
+ {"current_steps": 2330, "total_steps": 2500, "loss": 0.09848537445068359, "lr": 1.4184133542663014e-06, "epoch": 9.32, "percentage": 93.2, "elapsed_time": "2:35:54", "remaining_time": "0:11:22"}
257
+ {"current_steps": 2340, "total_steps": 2500, "loss": 0.10164464712142944, "lr": 1.258046810230562e-06, "epoch": 9.36, "percentage": 93.6, "elapsed_time": "2:36:32", "remaining_time": "0:10:42"}
258
+ {"current_steps": 2350, "total_steps": 2500, "loss": 0.0934177041053772, "lr": 1.1071826065460588e-06, "epoch": 9.4, "percentage": 94.0, "elapsed_time": "2:37:08", "remaining_time": "0:10:01"}
259
+ {"current_steps": 2360, "total_steps": 2500, "loss": 0.1012031078338623, "lr": 9.65850154488218e-07, "epoch": 9.44, "percentage": 94.4, "elapsed_time": "2:37:43", "remaining_time": "0:09:21"}
260
+ {"current_steps": 2370, "total_steps": 2500, "loss": 0.09371918439865112, "lr": 8.340770070986214e-07, "epoch": 9.48, "percentage": 94.8, "elapsed_time": "2:38:22", "remaining_time": "0:08:41"}
261
+ {"current_steps": 2380, "total_steps": 2500, "loss": 0.09450345039367676, "lr": 7.11888853813436e-07, "epoch": 9.52, "percentage": 95.2, "elapsed_time": "2:39:01", "remaining_time": "0:08:01"}
262
+ {"current_steps": 2390, "total_steps": 2500, "loss": 0.09499152898788452, "lr": 5.993095154552431e-07, "epoch": 9.56, "percentage": 95.6, "elapsed_time": "2:39:40", "remaining_time": "0:07:20"}
263
+ {"current_steps": 2400, "total_steps": 2500, "loss": 0.10716021060943604, "lr": 4.963609395891299e-07, "epoch": 9.6, "percentage": 96.0, "elapsed_time": "2:40:19", "remaining_time": "0:06:40"}
264
+ {"current_steps": 2400, "total_steps": 2500, "eval_loss": 0.6402375102043152, "epoch": 9.6, "percentage": 96.0, "elapsed_time": "2:40:38", "remaining_time": "0:06:41"}
265
+ {"current_steps": 2410, "total_steps": 2500, "loss": 0.09596163630485535, "lr": 4.030631962439302e-07, "epoch": 9.64, "percentage": 96.4, "elapsed_time": "2:41:17", "remaining_time": "0:06:01"}
266
+ {"current_steps": 2420, "total_steps": 2500, "loss": 0.09645589590072631, "lr": 3.1943447399958027e-07, "epoch": 9.68, "percentage": 96.8, "elapsed_time": "2:41:52", "remaining_time": "0:05:21"}
267
+ {"current_steps": 2430, "total_steps": 2500, "loss": 0.09415926933288574, "lr": 2.4549107644117885e-07, "epoch": 9.72, "percentage": 97.2, "elapsed_time": "2:42:26", "remaining_time": "0:04:40"}
268
+ {"current_steps": 2440, "total_steps": 2500, "loss": 0.10026730298995971, "lr": 1.8124741898058462e-07, "epoch": 9.76, "percentage": 97.6, "elapsed_time": "2:42:59", "remaining_time": "0:04:00"}
269
+ {"current_steps": 2450, "total_steps": 2500, "loss": 0.09711679220199584, "lr": 1.267160260461253e-07, "epoch": 9.8, "percentage": 98.0, "elapsed_time": "2:43:33", "remaining_time": "0:03:20"}
270
+ {"current_steps": 2460, "total_steps": 2500, "loss": 0.09345818758010864, "lr": 8.190752864088436e-08, "epoch": 9.84, "percentage": 98.4, "elapsed_time": "2:44:08", "remaining_time": "0:02:40"}
271
+ {"current_steps": 2470, "total_steps": 2500, "loss": 0.102751624584198, "lr": 4.683066227023081e-08, "epoch": 9.88, "percentage": 98.8, "elapsed_time": "2:44:45", "remaining_time": "0:02:00"}
272
+ {"current_steps": 2480, "total_steps": 2500, "loss": 0.0988599717617035, "lr": 2.1492265238748366e-08, "epoch": 9.92, "percentage": 99.2, "elapsed_time": "2:45:21", "remaining_time": "0:01:20"}
273
+ {"current_steps": 2490, "total_steps": 2500, "loss": 0.09828301668167114, "lr": 5.897277317157279e-09, "epoch": 9.96, "percentage": 99.6, "elapsed_time": "2:45:56", "remaining_time": "0:00:39"}
274
+ {"current_steps": 2500, "total_steps": 2500, "loss": 0.0937616467475891, "lr": 4.873877924582715e-11, "epoch": 10.0, "percentage": 100.0, "elapsed_time": "2:46:32", "remaining_time": "0:00:00"}
275
+ {"current_steps": 2500, "total_steps": 2500, "eval_loss": 0.6409608721733093, "epoch": 10.0, "percentage": 100.0, "elapsed_time": "2:46:50", "remaining_time": "0:00:00"}
276
+ {"current_steps": 2500, "total_steps": 2500, "epoch": 10.0, "percentage": 100.0, "elapsed_time": "2:46:51", "remaining_time": "0:00:00"}
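A minimal sketch of how the log above could be scanned for its lowest evaluation loss. It assumes the file is JSON Lines (one object per line) and that evaluation records are the ones carrying an `eval_loss` key, as in the entries shown; the function name `best_eval_step` is hypothetical and not part of any library.

```python
import json


def best_eval_step(lines):
    """Return (current_steps, eval_loss) for the lowest eval_loss, or None."""
    best = None
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        if "eval_loss" not in record:
            continue  # training-only records carry "loss", not "eval_loss"
        candidate = (record["current_steps"], record["eval_loss"])
        if best is None or candidate[1] < best[1]:
            best = candidate
    return best


# Three records copied (with fields trimmed) from the log above.
sample = [
    '{"current_steps": 2200, "total_steps": 2500, "eval_loss": 0.6235533356666565}',
    '{"current_steps": 2300, "total_steps": 2500, "eval_loss": 0.6419874429702759}',
    '{"current_steps": 2310, "total_steps": 2500, "loss": 0.09725146293640137}',
]
print(best_eval_step(sample))
```

To run it on the full log, the same function could be called on an open file handle, since iterating a text file yields one line at a time.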
trainer_state.json ADDED
@@ -0,0 +1,1993 @@
1
+ {
2
+ "best_global_step": 700,
3
+ "best_metric": 0.44259119033813477,
4
+ "best_model_checkpoint": "/data/taoyong/LabOS/QWEN-36/checkpoints/qwen3.6-35b-a3b-lora-lf/checkpoint-700",
5
+ "epoch": 10.0,
6
+ "eval_steps": 100,
7
+ "global_step": 2500,
8
+ "is_hyper_param_search": false,
9
+ "is_local_process_zero": true,
10
+ "is_world_process_zero": true,
11
+ "log_history": [
12
+ {
13
+ "epoch": 0.04,
14
+ "grad_norm": 1.3030611276626587,
15
+ "learning_rate": 3.6e-06,
16
+ "loss": 1.1145790100097657,
17
+ "step": 10
18
+ },
19
+ {
20
+ "epoch": 0.08,
21
+ "grad_norm": 1.540786623954773,
22
+ "learning_rate": 7.6e-06,
23
+ "loss": 1.2167404174804688,
24
+ "step": 20
25
+ },
26
+ {
27
+ "epoch": 0.12,
28
+ "grad_norm": 1.0591915845870972,
29
+ "learning_rate": 1.16e-05,
30
+ "loss": 1.0437713623046876,
31
+ "step": 30
32
+ },
33
+ {
34
+ "epoch": 0.16,
35
+ "grad_norm": 0.6695119142532349,
36
+ "learning_rate": 1.56e-05,
37
+ "loss": 0.9282869338989258,
38
+ "step": 40
39
+ },
40
+ {
41
+ "epoch": 0.2,
42
+ "grad_norm": 0.7912387847900391,
43
+ "learning_rate": 1.9600000000000002e-05,
44
+ "loss": 0.8799624443054199,
45
+ "step": 50
46
+ },
47
+ {
48
+ "epoch": 0.24,
49
+ "grad_norm": 0.7810359001159668,
50
+ "learning_rate": 2.36e-05,
51
+ "loss": 0.7062759399414062,
52
+ "step": 60
53
+ },
54
+ {
55
+ "epoch": 0.28,
56
+ "grad_norm": 0.7185921669006348,
57
+ "learning_rate": 2.7600000000000003e-05,
58
+ "loss": 0.7228042602539062,
59
+ "step": 70
60
+ },
61
+ {
62
+ "epoch": 0.32,
63
+ "grad_norm": 0.7974339723587036,
64
+ "learning_rate": 3.16e-05,
65
+ "loss": 0.6257906913757324,
66
+ "step": 80
67
+ },
68
+ {
69
+ "epoch": 0.36,
70
+ "grad_norm": 0.7850703597068787,
71
+ "learning_rate": 3.56e-05,
72
+ "loss": 0.5399329185485839,
73
+ "step": 90
74
+ },
75
+ {
76
+ "epoch": 0.4,
77
+ "grad_norm": 0.7295215129852295,
78
+ "learning_rate": 3.960000000000001e-05,
79
+ "loss": 0.5184461116790772,
80
+ "step": 100
81
+ },
82
+ {
83
+ "epoch": 0.4,
84
+ "eval_loss": 0.5476460456848145,
85
+ "eval_runtime": 21.5181,
86
+ "eval_samples_per_second": 18.589,
87
+ "eval_steps_per_second": 3.114,
88
+ "step": 100
89
+ },
90
+ {
91
+ "epoch": 0.44,
92
+ "grad_norm": 1.0682953596115112,
93
+ "learning_rate": 4.36e-05,
94
+ "loss": 0.5210700988769531,
95
+ "step": 110
96
+ },
97
+ {
98
+ "epoch": 0.48,
99
+ "grad_norm": 0.9108087420463562,
100
+ "learning_rate": 4.76e-05,
101
+ "loss": 0.5155693531036377,
102
+ "step": 120
103
+ },
104
+ {
105
+ "epoch": 0.52,
106
+ "grad_norm": 1.0037930011749268,
107
+ "learning_rate": 5.16e-05,
108
+ "loss": 0.45534143447875974,
109
+ "step": 130
110
+ },
111
+ {
112
+ "epoch": 0.56,
113
+ "grad_norm": 0.9430785179138184,
114
+ "learning_rate": 5.560000000000001e-05,
115
+ "loss": 0.45524797439575193,
116
+ "step": 140
117
+ },
118
+ {
119
+ "epoch": 0.6,
120
+ "grad_norm": 0.9689427614212036,
121
+ "learning_rate": 5.96e-05,
122
+ "loss": 0.47152209281921387,
123
+ "step": 150
124
+ },
125
+ {
126
+ "epoch": 0.64,
127
+ "grad_norm": 0.7584393620491028,
128
+ "learning_rate": 6.36e-05,
129
+ "loss": 0.4532940864562988,
130
+ "step": 160
131
+ },
132
+ {
133
+ "epoch": 0.68,
134
+ "grad_norm": 0.7581620216369629,
135
+ "learning_rate": 6.76e-05,
136
+ "loss": 0.48988704681396483,
137
+ "step": 170
138
+ },
139
+ {
140
+ "epoch": 0.72,
141
+ "grad_norm": 0.9882776141166687,
142
+ "learning_rate": 7.16e-05,
143
+ "loss": 0.46865572929382326,
144
+ "step": 180
145
+ },
146
+ {
147
+ "epoch": 0.76,
148
+ "grad_norm": 0.743236780166626,
149
+ "learning_rate": 7.560000000000001e-05,
150
+ "loss": 0.45577139854431153,
151
+ "step": 190
152
+ },
153
+ {
154
+ "epoch": 0.8,
155
+ "grad_norm": 0.6103836894035339,
156
+ "learning_rate": 7.960000000000001e-05,
157
+ "loss": 0.4559042453765869,
158
+ "step": 200
159
+ },
160
+ {
161
+ "epoch": 0.8,
162
+ "eval_loss": 0.485470175743103,
163
+ "eval_runtime": 17.4199,
164
+ "eval_samples_per_second": 22.962,
165
+ "eval_steps_per_second": 3.846,
166
+ "step": 200
167
+ },
168
+ {
169
+ "epoch": 0.84,
170
+ "grad_norm": 0.8245580792427063,
171
+ "learning_rate": 8.36e-05,
172
+ "loss": 0.45926451683044434,
173
+ "step": 210
174
+ },
175
+ {
176
+ "epoch": 0.88,
177
+ "grad_norm": 0.6920369267463684,
178
+ "learning_rate": 8.76e-05,
179
+ "loss": 0.4545453548431396,
180
+ "step": 220
181
+ },
182
+ {
183
+ "epoch": 0.92,
184
+ "grad_norm": 0.6936920881271362,
185
+ "learning_rate": 9.16e-05,
186
+ "loss": 0.47637343406677246,
187
+ "step": 230
188
+ },
189
+ {
190
+ "epoch": 0.96,
191
+ "grad_norm": 0.6694210767745972,
192
+ "learning_rate": 9.56e-05,
193
+ "loss": 0.43120541572570803,
194
+ "step": 240
195
+ },
196
+ {
197
+ "epoch": 1.0,
198
+ "grad_norm": 0.583095133304596,
199
+ "learning_rate": 9.960000000000001e-05,
200
+ "loss": 0.4153712272644043,
201
+ "step": 250
202
+ },
203
+ {
204
+ "epoch": 1.04,
205
+ "grad_norm": 0.6926116943359375,
206
+ "learning_rate": 9.999605221019081e-05,
207
+ "loss": 0.44300012588500975,
208
+ "step": 260
209
+ },
210
+ {
211
+ "epoch": 1.08,
212
+ "grad_norm": 0.761324405670166,
213
+ "learning_rate": 9.998240632972073e-05,
214
+ "loss": 0.462084436416626,
215
+ "step": 270
216
+ },
217
+ {
218
+ "epoch": 1.12,
219
+ "grad_norm": 0.5191273093223572,
220
+ "learning_rate": 9.995901628010196e-05,
221
+ "loss": 0.39808471202850343,
222
+ "step": 280
223
+ },
224
+ {
225
+ "epoch": 1.16,
226
+ "grad_norm": 0.8463711738586426,
227
+ "learning_rate": 9.9925886621271e-05,
228
+ "loss": 0.423044490814209,
229
+ "step": 290
230
+ },
231
+ {
232
+ "epoch": 1.2,
233
+ "grad_norm": 0.8373249769210815,
234
+ "learning_rate": 9.98830238119205e-05,
235
+ "loss": 0.41622562408447267,
236
+ "step": 300
237
+ },
238
+ {
239
+ "epoch": 1.2,
240
+ "eval_loss": 0.4695434272289276,
241
+ "eval_runtime": 19.2419,
242
+ "eval_samples_per_second": 20.788,
243
+ "eval_steps_per_second": 3.482,
244
+ "step": 300
245
+ },
246
+ {
247
+ "epoch": 1.24,
248
+ "grad_norm": 0.6290304064750671,
249
+ "learning_rate": 9.983043620824005e-05,
250
+ "loss": 0.4166346549987793,
251
+ "step": 310
252
+ },
253
+ {
254
+ "epoch": 1.28,
255
+ "grad_norm": 0.6189863681793213,
256
+ "learning_rate": 9.97681340622872e-05,
257
+ "loss": 0.43734130859375,
258
+ "step": 320
259
+ },
260
+ {
261
+ "epoch": 1.32,
262
+ "grad_norm": 0.5579029321670532,
263
+ "learning_rate": 9.969612951998874e-05,
264
+ "loss": 0.3747305631637573,
265
+ "step": 330
266
+ },
267
+ {
268
+ "epoch": 1.3599999999999999,
269
+ "grad_norm": 1.1675549745559692,
270
+ "learning_rate": 9.961443661877289e-05,
271
+ "loss": 0.42578792572021484,
272
+ "step": 340
273
+ },
274
+ {
275
+ "epoch": 1.4,
276
+ "grad_norm": 0.6578675508499146,
277
+ "learning_rate": 9.952307128483256e-05,
278
+ "loss": 0.39537777900695803,
279
+ "step": 350
280
+ },
281
+ {
282
+ "epoch": 1.44,
283
+ "grad_norm": 0.8092941045761108,
284
+ "learning_rate": 9.942205133002068e-05,
285
+ "loss": 0.4084367275238037,
286
+ "step": 360
287
+ },
288
+ {
289
+ "epoch": 1.48,
290
+ "grad_norm": 0.6226063370704651,
291
+ "learning_rate": 9.931139644837754e-05,
292
+ "loss": 0.3781426906585693,
293
+ "step": 370
294
+ },
295
+ {
296
+ "epoch": 1.52,
297
+ "grad_norm": 0.7148721218109131,
298
+ "learning_rate": 9.919112821229163e-05,
299
+ "loss": 0.3952002048492432,
300
+ "step": 380
301
+ },
302
+ {
303
+ "epoch": 1.56,
304
+ "grad_norm": 0.5743547081947327,
305
+ "learning_rate": 9.906127006829384e-05,
306
+ "loss": 0.4087832927703857,
307
+ "step": 390
308
+ },
309
+ {
310
+ "epoch": 1.6,
311
+ "grad_norm": 0.6315461993217468,
312
+ "learning_rate": 9.892184733248666e-05,
313
+ "loss": 0.3861570119857788,
314
+ "step": 400
315
+ },
316
+ {
317
+ "epoch": 1.6,
318
+ "eval_loss": 0.45406103134155273,
319
+ "eval_runtime": 19.7154,
320
+ "eval_samples_per_second": 20.289,
321
+ "eval_steps_per_second": 3.398,
322
+ "step": 400
323
+ },
324
+ {
325
+ "epoch": 1.6400000000000001,
326
+ "grad_norm": 0.6243694424629211,
327
+ "learning_rate": 9.877288718560866e-05,
328
+ "loss": 0.39033331871032717,
329
+ "step": 410
330
+ },
331
+ {
332
+ "epoch": 1.6800000000000002,
333
+ "grad_norm": 0.6677294969558716,
334
+ "learning_rate": 9.861441866773564e-05,
335
+ "loss": 0.43663845062255857,
336
+ "step": 420
337
+ },
338
+ {
339
+ "epoch": 1.72,
340
+ "grad_norm": 0.6460554599761963,
341
+ "learning_rate": 9.844647267261916e-05,
342
+ "loss": 0.43364706039428713,
343
+ "step": 430
344
+ },
345
+ {
346
+ "epoch": 1.76,
347
+ "grad_norm": 0.570160984992981,
348
+ "learning_rate": 9.82690819416637e-05,
349
+ "loss": 0.409498929977417,
350
+ "step": 440
351
+ },
352
+ {
353
+ "epoch": 1.8,
354
+ "grad_norm": 0.5696760416030884,
355
+ "learning_rate": 9.808228105754376e-05,
356
+ "loss": 0.4264820098876953,
357
+ "step": 450
358
+ },
359
+ {
360
+ "epoch": 1.8399999999999999,
361
+ "grad_norm": 0.583260715007782,
362
+ "learning_rate": 9.788610643746184e-05,
363
+ "loss": 0.417040491104126,
364
+ "step": 460
365
+ },
366
+ {
367
+ "epoch": 1.88,
368
+ "grad_norm": 0.6025984287261963,
369
+ "learning_rate": 9.76805963260488e-05,
370
+ "loss": 0.3749807357788086,
371
+ "step": 470
372
+ },
373
+ {
374
+ "epoch": 1.92,
375
+ "grad_norm": 0.5953373312950134,
376
+ "learning_rate": 9.746579078790807e-05,
377
+ "loss": 0.4022481918334961,
378
+ "step": 480
379
+ },
380
+ {
381
+ "epoch": 1.96,
382
+ "grad_norm": 0.4357820153236389,
383
+ "learning_rate": 9.724173169980491e-05,
384
+ "loss": 0.38319835662841795,
385
+ "step": 490
386
+ },
387
+ {
388
+ "epoch": 2.0,
389
+ "grad_norm": 0.5152677297592163,
390
+ "learning_rate": 9.700846274250251e-05,
391
+ "loss": 0.4122174263000488,
392
+ "step": 500
393
+ },
394
+ {
395
+ "epoch": 2.0,
396
+ "eval_loss": 0.44415727257728577,
397
+ "eval_runtime": 18.9015,
398
+ "eval_samples_per_second": 21.162,
399
+ "eval_steps_per_second": 3.545,
400
+ "step": 500
401
+ },
402
+ {
403
+ "epoch": 2.04,
404
+ "grad_norm": 0.38848409056663513,
405
+ "learning_rate": 9.676602939224629e-05,
406
+ "loss": 0.3524669408798218,
407
+ "step": 510
408
+ },
409
+ {
410
+ "epoch": 2.08,
411
+ "grad_norm": 0.5285012125968933,
412
+ "learning_rate": 9.651447891189825e-05,
413
+ "loss": 0.3717231273651123,
414
+ "step": 520
415
+ },
416
+ {
417
+ "epoch": 2.12,
418
+ "grad_norm": 0.6452465653419495,
419
+ "learning_rate": 9.62538603417229e-05,
420
+ "loss": 0.40065832138061525,
421
+ "step": 530
422
+ },
423
+ {
424
+ "epoch": 2.16,
425
+ "grad_norm": 0.48196467757225037,
426
+ "learning_rate": 9.598422448982696e-05,
427
+ "loss": 0.33635973930358887,
428
+ "step": 540
429
+ },
430
+ {
431
+ "epoch": 2.2,
432
+ "grad_norm": 0.563376247882843,
433
+ "learning_rate": 9.570562392225396e-05,
434
+ "loss": 0.3708656787872314,
435
+ "step": 550
436
+ },
437
+ {
438
+ "epoch": 2.24,
439
+ "grad_norm": 0.6459429860115051,
440
+ "learning_rate": 9.541811295273656e-05,
441
+ "loss": 0.35284056663513186,
442
+ "step": 560
443
+ },
444
+ {
445
+ "epoch": 2.2800000000000002,
446
+ "grad_norm": 0.5247339606285095,
447
+ "learning_rate": 9.512174763210797e-05,
448
+ "loss": 0.3429510831832886,
449
+ "step": 570
450
+ },
451
+ {
452
+ "epoch": 2.32,
453
+ "grad_norm": 0.5456256866455078,
454
+ "learning_rate": 9.481658573737465e-05,
455
+ "loss": 0.36770102977752683,
456
+ "step": 580
457
+ },
458
+ {
459
+ "epoch": 2.36,
460
+ "grad_norm": 0.5435087084770203,
461
+ "learning_rate": 9.450268676045262e-05,
462
+ "loss": 0.3684037208557129,
463
+ "step": 590
464
+ },
465
+ {
466
+ "epoch": 2.4,
467
+ "grad_norm": 0.5584478974342346,
468
+ "learning_rate": 9.418011189656941e-05,
469
+ "loss": 0.3221792697906494,
470
+ "step": 600
471
+ },
472
+ {
473
+ "epoch": 2.4,
474
+ "eval_loss": 0.44748273491859436,
475
+ "eval_runtime": 18.8521,
476
+ "eval_samples_per_second": 21.218,
477
+ "eval_steps_per_second": 3.554,
478
+ "step": 600
479
+ },
480
+ {
481
+ "epoch": 2.44,
482
+ "grad_norm": 0.7217129468917847,
483
+ "learning_rate": 9.384892403233384e-05,
484
+ "loss": 0.40174164772033694,
485
+ "step": 610
486
+ },
487
+ {
488
+ "epoch": 2.48,
489
+ "grad_norm": 0.5068971514701843,
490
+ "learning_rate": 9.35091877334763e-05,
491
+ "loss": 0.3701002836227417,
492
+ "step": 620
493
+ },
494
+ {
495
+ "epoch": 2.52,
496
+ "grad_norm": 0.4331487715244293,
497
+ "learning_rate": 9.316096923226135e-05,
498
+ "loss": 0.3759175777435303,
499
+ "step": 630
500
+ },
501
+ {
502
+ "epoch": 2.56,
503
+ "grad_norm": 0.5161293148994446,
504
+ "learning_rate": 9.28043364145758e-05,
505
+ "loss": 0.3581662178039551,
506
+ "step": 640
507
+ },
508
+ {
509
+ "epoch": 2.6,
510
+ "grad_norm": 0.709299623966217,
511
+ "learning_rate": 9.24393588066941e-05,
512
+ "loss": 0.35065665245056155,
513
+ "step": 650
514
+ },
515
+ {
516
+ "epoch": 2.64,
517
+ "grad_norm": 0.6004891991615295,
518
+ "learning_rate": 9.206610756172402e-05,
519
+ "loss": 0.36879355907440187,
520
+ "step": 660
521
+ },
522
+ {
523
+ "epoch": 2.68,
524
+ "grad_norm": 0.4662474989891052,
525
+ "learning_rate": 9.168465544573536e-05,
526
+ "loss": 0.3592060565948486,
527
+ "step": 670
528
+ },
529
+ {
530
+ "epoch": 2.7199999999999998,
531
+ "grad_norm": 0.5826489329338074,
532
+ "learning_rate": 9.129507682357394e-05,
533
+ "loss": 0.36156315803527833,
534
+ "step": 680
535
+ },
536
+ {
537
+ "epoch": 2.76,
538
+ "grad_norm": 0.48988744616508484,
539
+ "learning_rate": 9.089744764436403e-05,
540
+ "loss": 0.34445748329162595,
541
+ "step": 690
542
+ },
543
+ {
544
+ "epoch": 2.8,
545
+ "grad_norm": 0.4443361163139343,
546
+ "learning_rate": 9.049184542670199e-05,
547
+ "loss": 0.3526463985443115,
548
+ "step": 700
549
+ },
550
+ {
551
+ "epoch": 2.8,
552
+ "eval_loss": 0.44259119033813477,
553
+ "eval_runtime": 16.8228,
554
+ "eval_samples_per_second": 23.777,
555
+ "eval_steps_per_second": 3.983,
556
+ "step": 700
557
+ },
558
+ {
559
+ "epoch": 2.84,
560
+ "grad_norm": 0.5471161007881165,
561
+ "learning_rate": 9.007834924354383e-05,
562
+ "loss": 0.3458081245422363,
563
+ "step": 710
564
+ },
565
+ {
566
+ "epoch": 2.88,
567
+ "grad_norm": 0.5264748930931091,
568
+ "learning_rate": 8.965703970678974e-05,
569
+ "loss": 0.3651163101196289,
570
+ "step": 720
571
+ },
572
+ {
573
+ "epoch": 2.92,
574
+ "grad_norm": 0.48987507820129395,
575
+ "learning_rate": 8.922799895156867e-05,
576
+ "loss": 0.3218229293823242,
577
+ "step": 730
578
+ },
579
+ {
580
+ "epoch": 2.96,
581
+ "grad_norm": 0.5640589594841003,
582
+ "learning_rate": 8.879131062022598e-05,
583
+ "loss": 0.3561582088470459,
584
+ "step": 740
585
+ },
586
+ {
587
+ "epoch": 3.0,
588
+ "grad_norm": 0.7934619784355164,
589
+ "learning_rate": 8.834705984601708e-05,
590
+ "loss": 0.36128854751586914,
591
+ "step": 750
592
+ },
593
+ {
594
+ "epoch": 3.04,
595
+ "grad_norm": 1.0869489908218384,
596
+ "learning_rate": 8.789533323651066e-05,
597
+ "loss": 0.31422438621521,
598
+ "step": 760
599
+ },
600
+ {
601
+ "epoch": 3.08,
602
+ "grad_norm": 0.4695897102355957,
603
+ "learning_rate": 8.74362188567043e-05,
604
+ "loss": 0.29355826377868655,
605
+ "step": 770
606
+ },
607
+ {
608
+ "epoch": 3.12,
609
+ "grad_norm": 0.5532680153846741,
610
+ "learning_rate": 8.696980621185602e-05,
611
+ "loss": 0.3185117721557617,
612
+ "step": 780
613
+ },
614
+ {
615
+ "epoch": 3.16,
616
+ "grad_norm": 0.5760806202888489,
617
+ "learning_rate": 8.649618623003508e-05,
618
+ "loss": 0.28971233367919924,
619
+ "step": 790
620
+ },
621
+ {
622
+ "epoch": 3.2,
623
+ "grad_norm": 0.5517900586128235,
624
+ "learning_rate": 8.601545124439535e-05,
625
+ "loss": 0.3055370092391968,
626
+ "step": 800
627
+ },
628
+ {
629
+ "epoch": 3.2,
630
+ "eval_loss": 0.4529191255569458,
631
+ "eval_runtime": 18.5382,
632
+ "eval_samples_per_second": 21.577,
633
+ "eval_steps_per_second": 3.614,
634
+ "step": 800
635
+ },
636
+ {
637
+ "epoch": 3.24,
638
+ "grad_norm": 0.5356678366661072,
639
+ "learning_rate": 8.552769497517482e-05,
640
+ "loss": 0.28035550117492675,
641
+ "step": 810
642
+ },
643
+ {
644
+ "epoch": 3.2800000000000002,
645
+ "grad_norm": 0.5985352993011475,
646
+ "learning_rate": 8.503301251142459e-05,
647
+ "loss": 0.3199602603912354,
648
+ "step": 820
649
+ },
650
+ {
651
+ "epoch": 3.32,
652
+ "grad_norm": 0.5187913179397583,
653
+ "learning_rate": 8.453150029247114e-05,
654
+ "loss": 0.29444499015808107,
655
+ "step": 830
656
+ },
657
+ {
658
+ "epoch": 3.36,
659
+ "grad_norm": 0.5703292489051819,
660
+ "learning_rate": 8.402325608911526e-05,
661
+ "loss": 0.30467259883880615,
662
+ "step": 840
663
+ },
664
+ {
665
+ "epoch": 3.4,
666
+ "grad_norm": 0.9323157072067261,
667
+ "learning_rate": 8.350837898457143e-05,
668
+ "loss": 0.3117033004760742,
669
+ "step": 850
670
+ },
671
+ {
672
+ "epoch": 3.44,
673
+ "grad_norm": 0.628546953201294,
674
+ "learning_rate": 8.298696935515132e-05,
675
+ "loss": 0.34261503219604494,
676
+ "step": 860
677
+ },
678
+ {
679
+ "epoch": 3.48,
680
+ "grad_norm": 0.5379561185836792,
681
+ "learning_rate": 8.245912885069531e-05,
682
+ "loss": 0.3159458637237549,
683
+ "step": 870
684
+ },
685
+ {
686
+ "epoch": 3.52,
687
+ "grad_norm": 0.6575730443000793,
688
+ "learning_rate": 8.192496037475562e-05,
689
+ "loss": 0.2982481002807617,
690
+ "step": 880
691
+ },
692
+ {
693
+ "epoch": 3.56,
694
+ "grad_norm": 0.5830497145652771,
695
+ "learning_rate": 8.138456806453503e-05,
696
+ "loss": 0.3232215404510498,
697
+ "step": 890
698
+ },
699
+ {
700
+ "epoch": 3.6,
701
+ "grad_norm": 0.5474710464477539,
702
+ "learning_rate": 8.083805727058513e-05,
703
+ "loss": 0.3305091381072998,
704
+ "step": 900
705
+ },
706
+ {
707
+ "epoch": 3.6,
708
+ "eval_loss": 0.44760578870773315,
709
+ "eval_runtime": 19.5159,
710
+ "eval_samples_per_second": 20.496,
711
+ "eval_steps_per_second": 3.433,
712
+ "step": 900
713
+ },
714
+ {
715
+ "epoch": 3.64,
716
+ "grad_norm": 0.5096336007118225,
717
+ "learning_rate": 8.028553453626808e-05,
718
+ "loss": 0.35752732753753663,
719
+ "step": 910
720
+ },
721
+ {
722
+ "epoch": 3.68,
723
+ "grad_norm": 0.5023341774940491,
724
+ "learning_rate": 7.972710757698567e-05,
725
+ "loss": 0.3292932271957397,
726
+ "step": 920
727
+ },
728
+ {
729
+ "epoch": 3.7199999999999998,
730
+ "grad_norm": 0.5277951955795288,
731
+ "learning_rate": 7.916288525918007e-05,
732
+ "loss": 0.28986682891845705,
733
+ "step": 930
734
+ },
735
+ {
736
+ "epoch": 3.76,
737
+ "grad_norm": 0.600412905216217,
738
+ "learning_rate": 7.859297757911013e-05,
739
+ "loss": 0.3027395725250244,
740
+ "step": 940
741
+ },
742
+ {
743
+ "epoch": 3.8,
744
+ "grad_norm": 0.6396210193634033,
745
+ "learning_rate": 7.801749564140724e-05,
746
+ "loss": 0.3238774061203003,
747
+ "step": 950
748
+ },
749
+ {
750
+ "epoch": 3.84,
751
+ "grad_norm": 0.628635585308075,
752
+ "learning_rate": 7.743655163741543e-05,
753
+ "loss": 0.34537086486816404,
754
+ "step": 960
755
+ },
756
+ {
757
+ "epoch": 3.88,
758
+ "grad_norm": 0.49822649359703064,
759
+ "learning_rate": 7.685025882331936e-05,
760
+ "loss": 0.3292637825012207,
761
+ "step": 970
762
+ },
763
+ {
764
+ "epoch": 3.92,
765
+ "grad_norm": 0.5356727242469788,
766
+ "learning_rate": 7.62587314980648e-05,
767
+ "loss": 0.32722015380859376,
768
+ "step": 980
769
+ },
770
+ {
771
+ "epoch": 3.96,
772
+ "grad_norm": 0.6211317777633667,
773
+ "learning_rate": 7.566208498107585e-05,
774
+ "loss": 0.29880056381225584,
775
+ "step": 990
776
+ },
777
+ {
778
+ "epoch": 4.0,
779
+ "grad_norm": 0.5336779356002808,
780
+ "learning_rate": 7.506043558977321e-05,
781
+ "loss": 0.2978524684906006,
782
+ "step": 1000
783
+ },
784
+ {
785
+ "epoch": 4.0,
786
+ "eval_loss": 0.44613513350486755,
787
+ "eval_runtime": 19.2382,
788
+ "eval_samples_per_second": 20.792,
789
+ "eval_steps_per_second": 3.483,
790
+ "step": 1000
791
+ },
792
+ {
793
+ "epoch": 4.04,
794
+ "grad_norm": 0.6681120991706848,
795
+ "learning_rate": 7.445390061689782e-05,
796
+ "loss": 0.27530927658081056,
797
+ "step": 1010
798
+ },
799
+ {
800
+ "epoch": 4.08,
801
+ "grad_norm": 0.6299528479576111,
802
+ "learning_rate": 7.38425983076444e-05,
803
+ "loss": 0.2517704486846924,
804
+ "step": 1020
805
+ },
806
+ {
807
+ "epoch": 4.12,
808
+ "grad_norm": 0.5211061239242554,
809
+ "learning_rate": 7.32266478366094e-05,
810
+ "loss": 0.28200175762176516,
811
+ "step": 1030
812
+ },
813
+ {
814
+ "epoch": 4.16,
815
+ "grad_norm": 0.5778363347053528,
816
+ "learning_rate": 7.260616928455754e-05,
817
+ "loss": 0.2569046258926392,
818
+ "step": 1040
819
+ },
820
+ {
821
+ "epoch": 4.2,
822
+ "grad_norm": 0.6715266108512878,
823
+ "learning_rate": 7.1981283615012e-05,
824
+ "loss": 0.2665576696395874,
825
+ "step": 1050
826
+ },
827
+ {
828
+ "epoch": 4.24,
829
+ "grad_norm": 0.6580007672309875,
830
+ "learning_rate": 7.135211265067216e-05,
831
+ "loss": 0.2635650634765625,
832
+ "step": 1060
833
+ },
834
+ {
835
+ "epoch": 4.28,
836
+ "grad_norm": 0.6889304518699646,
837
+ "learning_rate": 7.071877904966423e-05,
838
+ "loss": 0.26842334270477297,
839
+ "step": 1070
840
+ },
841
+ {
842
+ "epoch": 4.32,
843
+ "grad_norm": 0.5896309018135071,
844
+ "learning_rate": 7.00814062816285e-05,
845
+ "loss": 0.2633937358856201,
846
+ "step": 1080
847
+ },
848
+ {
849
+ "epoch": 4.36,
850
+ "grad_norm": 0.6062363386154175,
851
+ "learning_rate": 6.944011860364905e-05,
852
+ "loss": 0.2895397186279297,
853
+ "step": 1090
854
+ },
855
+ {
856
+ "epoch": 4.4,
857
+ "grad_norm": 0.6124110817909241,
858
+ "learning_rate": 6.879504103602935e-05,
859
+ "loss": 0.27405414581298826,
860
+ "step": 1100
861
+ },
862
+ {
863
+ "epoch": 4.4,
864
+ "eval_loss": 0.46795058250427246,
865
+ "eval_runtime": 17.2143,
866
+ "eval_samples_per_second": 23.237,
867
+ "eval_steps_per_second": 3.892,
868
+ "step": 1100
869
+ },
870
+ {
871
+ "epoch": 4.44,
872
+ "grad_norm": 0.8100364208221436,
873
+ "learning_rate": 6.814629933791931e-05,
874
+ "loss": 0.2581511974334717,
875
+ "step": 1110
876
+ },
877
+ {
878
+ "epoch": 4.48,
879
+ "grad_norm": 0.6187950372695923,
880
+ "learning_rate": 6.749401998279846e-05,
881
+ "loss": 0.2689012050628662,
882
+ "step": 1120
883
+ },
884
+ {
885
+ "epoch": 4.52,
886
+ "grad_norm": 0.6595885157585144,
887
+ "learning_rate": 6.683833013381941e-05,
888
+ "loss": 0.27230424880981446,
889
+ "step": 1130
890
+ },
891
+ {
892
+ "epoch": 4.5600000000000005,
893
+ "grad_norm": 0.6320788860321045,
894
+ "learning_rate": 6.617935761901748e-05,
895
+ "loss": 0.2903036594390869,
896
+ "step": 1140
897
+ },
898
+ {
899
+ "epoch": 4.6,
900
+ "grad_norm": 0.6367589831352234,
901
+ "learning_rate": 6.551723090639007e-05,
902
+ "loss": 0.2551115989685059,
903
+ "step": 1150
904
+ },
905
+ {
906
+ "epoch": 4.64,
907
+ "grad_norm": 0.5754795670509338,
908
+ "learning_rate": 6.485207907885175e-05,
909
+ "loss": 0.2783109188079834,
910
+ "step": 1160
911
+ },
912
+ {
913
+ "epoch": 4.68,
914
+ "grad_norm": 0.6343188881874084,
915
+ "learning_rate": 6.418403180906922e-05,
916
+ "loss": 0.29131503105163575,
917
+ "step": 1170
918
+ },
919
+ {
920
+ "epoch": 4.72,
921
+ "grad_norm": 0.6726956963539124,
922
+ "learning_rate": 6.351321933418139e-05,
923
+ "loss": 0.2730400085449219,
924
+ "step": 1180
925
+ },
926
+ {
927
+ "epoch": 4.76,
928
+ "grad_norm": 0.5498913526535034,
929
+ "learning_rate": 6.283977243040939e-05,
930
+ "loss": 0.2572148323059082,
931
+ "step": 1190
932
+ },
933
+ {
934
+ "epoch": 4.8,
935
+ "grad_norm": 0.6083167195320129,
936
+ "learning_rate": 6.216382238756146e-05,
937
+ "loss": 0.27444655895233155,
938
+ "step": 1200
939
+ },
940
+ {
941
+ "epoch": 4.8,
942
+ "eval_loss": 0.466619610786438,
943
+ "eval_runtime": 19.9505,
944
+ "eval_samples_per_second": 20.05,
945
+ "eval_steps_per_second": 3.358,
946
+ "step": 1200
947
+ },
948
+ {
949
+ "epoch": 4.84,
950
+ "grad_norm": 0.5861450433731079,
951
+ "learning_rate": 6.148550098343778e-05,
952
+ "loss": 0.27054529190063475,
953
+ "step": 1210
954
+ },
955
+ {
956
+ "epoch": 4.88,
957
+ "grad_norm": 0.7090939879417419,
958
+ "learning_rate": 6.080494045814011e-05,
959
+ "loss": 0.26785056591033934,
960
+ "step": 1220
961
+ },
962
+ {
963
+ "epoch": 4.92,
964
+ "grad_norm": 0.5825073719024658,
965
+ "learning_rate": 6.0122273488291304e-05,
966
+ "loss": 0.26335647106170657,
967
+ "step": 1230
968
+ },
969
+ {
970
+ "epoch": 4.96,
971
+ "grad_norm": 0.5506169199943542,
972
+ "learning_rate": 5.943763316116977e-05,
973
+ "loss": 0.2614041090011597,
974
+ "step": 1240
975
+ },
976
+ {
977
+ "epoch": 5.0,
978
+ "grad_norm": 0.6169804930686951,
979
+ "learning_rate": 5.875115294876381e-05,
980
+ "loss": 0.24768717288970948,
981
+ "step": 1250
982
+ },
983
+ {
984
+ "epoch": 5.04,
985
+ "grad_norm": 0.8200834393501282,
986
+ "learning_rate": 5.806296668175104e-05,
987
+ "loss": 0.21707432270050048,
988
+ "step": 1260
989
+ },
990
+ {
991
+ "epoch": 5.08,
992
+ "grad_norm": 1.5680038928985596,
993
+ "learning_rate": 5.737320852340775e-05,
994
+ "loss": 0.2139519214630127,
995
+ "step": 1270
996
+ },
997
+ {
998
+ "epoch": 5.12,
999
+ "grad_norm": 0.6845637559890747,
1000
+ "learning_rate": 5.668201294345363e-05,
1001
+ "loss": 0.20998594760894776,
1002
+ "step": 1280
1003
+ },
1004
+ {
1005
+ "epoch": 5.16,
1006
+ "grad_norm": 0.8293268084526062,
1007
+ "learning_rate": 5.598951469183649e-05,
1008
+ "loss": 0.23306002616882324,
1009
+ "step": 1290
1010
+ },
1011
+ {
1012
+ "epoch": 5.2,
1013
+ "grad_norm": 0.7228839993476868,
1014
+ "learning_rate": 5.52958487724626e-05,
1015
+ "loss": 0.2262401580810547,
1016
+ "step": 1300
1017
+ },
1018
+ {
1019
+ "epoch": 5.2,
1020
+ "eval_loss": 0.49972543120384216,
1021
+ "eval_runtime": 18.926,
1022
+ "eval_samples_per_second": 21.135,
1023
+ "eval_steps_per_second": 3.54,
1024
+ "step": 1300
1025
+ },
1026
+ {
1027
+ "epoch": 5.24,
1028
+ "grad_norm": 0.6243706345558167,
1029
+ "learning_rate": 5.4601150416877367e-05,
1030
+ "loss": 0.21100988388061523,
1031
+ "step": 1310
1032
+ },
1033
+ {
1034
+ "epoch": 5.28,
1035
+ "grad_norm": 1.0553343296051025,
1036
+ "learning_rate": 5.390555505790168e-05,
1037
+ "loss": 0.23542592525482178,
1038
+ "step": 1320
1039
+ },
1040
+ {
1041
+ "epoch": 5.32,
1042
+ "grad_norm": 0.6127402186393738,
1043
+ "learning_rate": 5.3209198303229027e-05,
1044
+ "loss": 0.2095633029937744,
1045
+ "step": 1330
1046
+ },
1047
+ {
1048
+ "epoch": 5.36,
1049
+ "grad_norm": 0.7463288903236389,
1050
+ "learning_rate": 5.2512215908988484e-05,
1051
+ "loss": 0.21693904399871827,
1052
+ "step": 1340
1053
+ },
1054
+ {
1055
+ "epoch": 5.4,
1056
+ "grad_norm": 0.8020226955413818,
1057
+ "learning_rate": 5.1814743753278795e-05,
1058
+ "loss": 0.2076347827911377,
1059
+ "step": 1350
1060
+ },
1061
+ {
1062
+ "epoch": 5.44,
1063
+ "grad_norm": 0.6652446389198303,
1064
+ "learning_rate": 5.111691780967869e-05,
1065
+ "loss": 0.22539749145507812,
1066
+ "step": 1360
1067
+ },
1068
+ {
1069
+ "epoch": 5.48,
1070
+ "grad_norm": 0.6378898620605469,
1071
+ "learning_rate": 5.041887412073854e-05,
1072
+ "loss": 0.2077547550201416,
1073
+ "step": 1370
1074
+ },
1075
+ {
1076
+ "epoch": 5.52,
1077
+ "grad_norm": 0.7381134033203125,
1078
+ "learning_rate": 4.97207487714586e-05,
1079
+ "loss": 0.21558783054351807,
1080
+ "step": 1380
1081
+ },
1082
+ {
1083
+ "epoch": 5.5600000000000005,
1084
+ "grad_norm": 0.6613102555274963,
1085
+ "learning_rate": 4.9022677862758945e-05,
1086
+ "loss": 0.21069679260253907,
1087
+ "step": 1390
1088
+ },
1089
+ {
1090
+ "epoch": 5.6,
1091
+ "grad_norm": 0.7527480721473694,
1092
+ "learning_rate": 4.832479748494643e-05,
1093
+ "loss": 0.21843309402465821,
1094
+ "step": 1400
1095
+ },
1096
+ {
1097
+ "epoch": 5.6,
1098
+ "eval_loss": 0.49576279520988464,
1099
+ "eval_runtime": 18.3368,
1100
+ "eval_samples_per_second": 21.814,
1101
+ "eval_steps_per_second": 3.654,
1102
+ "step": 1400
1103
+ },
1104
+ {
1105
+ "epoch": 5.64,
1106
+ "grad_norm": 0.5983570218086243,
1107
+ "learning_rate": 4.7627243691183453e-05,
1108
+ "loss": 0.22310276031494142,
1109
+ "step": 1410
1110
+ },
1111
+ {
1112
+ "epoch": 5.68,
1113
+ "grad_norm": 0.6202098727226257,
1114
+ "learning_rate": 4.693015247096423e-05,
1115
+ "loss": 0.22056117057800292,
1116
+ "step": 1420
1117
+ },
1118
+ {
1119
+ "epoch": 5.72,
1120
+ "grad_norm": 0.7730934023857117,
1121
+ "learning_rate": 4.623365972360337e-05,
1122
+ "loss": 0.2241537094116211,
1123
+ "step": 1430
1124
+ },
1125
+ {
1126
+ "epoch": 5.76,
1127
+ "grad_norm": 0.6262892484664917,
1128
+ "learning_rate": 4.553790123174197e-05,
1129
+ "loss": 0.21514451503753662,
1130
+ "step": 1440
1131
+ },
1132
+ {
1133
+ "epoch": 5.8,
1134
+ "grad_norm": 0.646507203578949,
1135
+ "learning_rate": 4.484301263487665e-05,
1136
+ "loss": 0.21031346321105956,
1137
+ "step": 1450
1138
+ },
1139
+ {
1140
+ "epoch": 5.84,
1141
+ "grad_norm": 0.8227706551551819,
1142
+ "learning_rate": 4.414912940291613e-05,
1143
+ "loss": 0.2312474489212036,
1144
+ "step": 1460
1145
+ },
1146
+ {
1147
+ "epoch": 5.88,
1148
+ "grad_norm": 0.6932390332221985,
1149
+ "learning_rate": 4.345638680977139e-05,
1150
+ "loss": 0.22380952835083007,
1151
+ "step": 1470
1152
+ },
1153
+ {
1154
+ "epoch": 5.92,
1155
+ "grad_norm": 0.7352316379547119,
1156
+ "learning_rate": 4.276491990698355e-05,
1157
+ "loss": 0.22706894874572753,
1158
+ "step": 1480
1159
+ },
1160
+ {
1161
+ "epoch": 5.96,
1162
+ "grad_norm": 0.6953718066215515,
1163
+ "learning_rate": 4.2074863497395377e-05,
1164
+ "loss": 0.2103546142578125,
1165
+ "step": 1490
1166
+ },
1167
+ {
1168
+ "epoch": 6.0,
1169
+ "grad_norm": 0.661618709564209,
1170
+ "learning_rate": 4.1386352108871174e-05,
1171
+ "loss": 0.2276217222213745,
1172
+ "step": 1500
1173
+ },
1174
+ {
1175
+ "epoch": 6.0,
1176
+ "eval_loss": 0.4966464042663574,
1177
+ "eval_runtime": 17.2948,
1178
+ "eval_samples_per_second": 23.128,
1179
+ "eval_steps_per_second": 3.874,
1180
+ "step": 1500
1181
+ },
1182
+ {
1183
+ "epoch": 6.04,
1184
+ "grad_norm": 0.8837434649467468,
1185
+ "learning_rate": 4.069951996807034e-05,
1186
+ "loss": 0.16540236473083497,
1187
+ "step": 1510
1188
+ },
1189
+ {
1190
+ "epoch": 6.08,
1191
+ "grad_norm": 1.3857215642929077,
1192
+ "learning_rate": 4.001450097427966e-05,
1193
+ "loss": 0.1638352394104004,
1194
+ "step": 1520
1195
+ },
1196
+ {
1197
+ "epoch": 6.12,
1198
+ "grad_norm": 0.8306711912155151,
1199
+ "learning_rate": 3.9331428673309204e-05,
1200
+ "loss": 0.1719011664390564,
1201
+ "step": 1530
1202
+ },
1203
+ {
1204
+ "epoch": 6.16,
1205
+ "grad_norm": 0.8509021997451782,
1206
+ "learning_rate": 3.865043623145751e-05,
1207
+ "loss": 0.1651092290878296,
1208
+ "step": 1540
1209
+ },
1210
+ {
1211
+ "epoch": 6.2,
1212
+ "grad_norm": 0.7507994174957275,
1213
+ "learning_rate": 3.797165640955041e-05,
1214
+ "loss": 0.1746900796890259,
1215
+ "step": 1550
1216
+ },
1217
+ {
1218
+ "epoch": 6.24,
1219
+ "grad_norm": 0.740626335144043,
1220
+ "learning_rate": 3.729522153705916e-05,
1221
+ "loss": 0.16637682914733887,
1222
+ "step": 1560
1223
+ },
1224
+ {
1225
+ "epoch": 6.28,
1226
+ "grad_norm": 0.6479809880256653,
1227
+ "learning_rate": 3.662126348630237e-05,
1228
+ "loss": 0.1709848165512085,
1229
+ "step": 1570
1230
+ },
1231
+ {
1232
+ "epoch": 6.32,
1233
+ "grad_norm": 0.6932395100593567,
1234
+ "learning_rate": 3.594991364673745e-05,
1235
+ "loss": 0.18107957839965821,
1236
+ "step": 1580
1237
+ },
1238
+ {
1239
+ "epoch": 6.36,
1240
+ "grad_norm": 0.8027141690254211,
1241
+ "learning_rate": 3.528130289934583e-05,
1242
+ "loss": 0.16225044727325438,
1243
+ "step": 1590
1244
+ },
1245
+ {
1246
+ "epoch": 6.4,
1247
+ "grad_norm": 0.5781376957893372,
1248
+ "learning_rate": 3.461556159111748e-05,
1249
+ "loss": 0.17544152736663818,
1250
+ "step": 1600
1251
+ },
1252
+ {
1253
+ "epoch": 6.4,
1254
+ "eval_loss": 0.5342507362365723,
1255
+ "eval_runtime": 19.471,
1256
+ "eval_samples_per_second": 20.543,
1257
+ "eval_steps_per_second": 3.441,
1258
+ "step": 1600
1259
+ },
1260
+ {
1261
+ "epoch": 6.44,
1262
+ "grad_norm": 0.7642867565155029,
1263
+ "learning_rate": 3.3952819509639534e-05,
1264
+ "loss": 0.17091144323349,
1265
+ "step": 1610
1266
+ },
1267
+ {
1268
+ "epoch": 6.48,
1269
+ "grad_norm": 0.7651257514953613,
1270
+ "learning_rate": 3.329320585779393e-05,
1271
+ "loss": 0.17765278816223146,
1272
+ "step": 1620
1273
+ },
1274
+ {
1275
+ "epoch": 6.52,
1276
+ "grad_norm": 0.6956056356430054,
1277
+ "learning_rate": 3.263684922856905e-05,
1278
+ "loss": 0.16475566625595092,
1279
+ "step": 1630
1280
+ },
1281
+ {
1282
+ "epoch": 6.5600000000000005,
1283
+ "grad_norm": 0.7344402074813843,
1284
+ "learning_rate": 3.1983877579990274e-05,
1285
+ "loss": 0.172060227394104,
1286
+ "step": 1640
1287
+ },
1288
+ {
1289
+ "epoch": 6.6,
1290
+ "grad_norm": 0.7196578979492188,
1291
+ "learning_rate": 3.1334418210174263e-05,
1292
+ "loss": 0.16673840284347535,
1293
+ "step": 1650
1294
+ },
1295
+ {
1296
+ "epoch": 6.64,
1297
+ "grad_norm": 0.7540257573127747,
1298
+ "learning_rate": 3.0688597732512e-05,
1299
+ "loss": 0.17414634227752684,
1300
+ "step": 1660
1301
+ },
1302
+ {
1303
+ "epoch": 6.68,
1304
+ "grad_norm": 0.5103999972343445,
1305
+ "learning_rate": 3.0046542050985237e-05,
1306
+ "loss": 0.1620783567428589,
1307
+ "step": 1670
1308
+ },
1309
+ {
1310
+ "epoch": 6.72,
1311
+ "grad_norm": 0.8846920132637024,
1312
+ "learning_rate": 2.940837633562127e-05,
1313
+ "loss": 0.17428462505340575,
1314
+ "step": 1680
1315
+ },
1316
+ {
1317
+ "epoch": 6.76,
1318
+ "grad_norm": 0.8017328381538391,
1319
+ "learning_rate": 2.877422499809072e-05,
1320
+ "loss": 0.19050977230072022,
1321
+ "step": 1690
1322
+ },
1323
+ {
1324
+ "epoch": 6.8,
1325
+ "grad_norm": 0.8515416383743286,
1326
+ "learning_rate": 2.8144211667453368e-05,
1327
+ "loss": 0.16926174163818358,
1328
+ "step": 1700
1329
+ },
1330
+ {
1331
+ "epoch": 6.8,
1332
+ "eval_loss": 0.5441356301307678,
1333
+ "eval_runtime": 17.5836,
1334
+ "eval_samples_per_second": 22.749,
1335
+ "eval_steps_per_second": 3.81,
1336
+ "step": 1700
1337
+ },
1338
+ {
1339
+ "epoch": 6.84,
1340
+ "grad_norm": 0.7547643184661865,
1341
+ "learning_rate": 2.75184591660563e-05,
1342
+ "loss": 0.1793771743774414,
1343
+ "step": 1710
1344
+ },
1345
+ {
1346
+ "epoch": 6.88,
1347
+ "grad_norm": 0.7164461016654968,
1348
+ "learning_rate": 2.6897089485589583e-05,
1349
+ "loss": 0.1647491931915283,
1350
+ "step": 1720
1351
+ },
1352
+ {
1353
+ "epoch": 6.92,
1354
+ "grad_norm": 1.1592035293579102,
1355
+ "learning_rate": 2.6280223763303546e-05,
1356
+ "loss": 0.17397019863128663,
1357
+ "step": 1730
1358
+ },
1359
+ {
1360
+ "epoch": 6.96,
1361
+ "grad_norm": 0.9889470934867859,
1362
+ "learning_rate": 2.5667982258393014e-05,
1363
+ "loss": 0.17107686996459961,
1364
+ "step": 1740
1365
+ },
1366
+ {
1367
+ "epoch": 7.0,
1368
+ "grad_norm": 0.7448652982711792,
1369
+ "learning_rate": 2.506048432855247e-05,
1370
+ "loss": 0.1730511426925659,
1371
+ "step": 1750
1372
+ },
1373
+ {
1374
+ "epoch": 7.04,
1375
+ "grad_norm": 0.6695497632026672,
1376
+ "learning_rate": 2.4457848406707013e-05,
1377
+ "loss": 0.13950222730636597,
1378
+ "step": 1760
1379
+ },
1380
+ {
1381
+ "epoch": 7.08,
1382
+ "grad_norm": 0.7200675010681152,
1383
+ "learning_rate": 2.3860191977923672e-05,
1384
+ "loss": 0.1326605796813965,
1385
+ "step": 1770
1386
+ },
1387
+ {
1388
+ "epoch": 7.12,
1389
+ "grad_norm": 0.6615055799484253,
1390
+ "learning_rate": 2.326763155650744e-05,
1391
+ "loss": 0.1265331983566284,
1392
+ "step": 1780
1393
+ },
1394
+ {
1395
+ "epoch": 7.16,
1396
+ "grad_norm": 0.8998573422431946,
1397
+ "learning_rate": 2.2680282663286552e-05,
1398
+ "loss": 0.12731509208679198,
1399
+ "step": 1790
1400
+ },
1401
+ {
1402
+ "epoch": 7.2,
1403
+ "grad_norm": 0.808588981628418,
1404
+ "learning_rate": 2.209825980309151e-05,
1405
+ "loss": 0.13114826679229735,
1406
+ "step": 1800
1407
+ },
1408
+ {
1409
+ "epoch": 7.2,
1410
+ "eval_loss": 0.5847110748291016,
1411
+ "eval_runtime": 18.9921,
1412
+ "eval_samples_per_second": 21.061,
1413
+ "eval_steps_per_second": 3.528,
1414
+ "step": 1800
1415
+ },
1416
+ {
1417
+ "epoch": 7.24,
1418
+ "grad_norm": 0.951817512512207,
1419
+ "learning_rate": 2.152167644243213e-05,
1420
+ "loss": 0.12906957864761354,
1421
+ "step": 1810
1422
+ },
1423
+ {
1424
+ "epoch": 7.28,
1425
+ "grad_norm": 0.8695458173751831,
1426
+ "learning_rate": 2.095064498737701e-05,
1427
+ "loss": 0.133590030670166,
1428
+ "step": 1820
1429
+ },
1430
+ {
1431
+ "epoch": 7.32,
1432
+ "grad_norm": 0.7357354760169983,
1433
+ "learning_rate": 2.0385276761639765e-05,
1434
+ "loss": 0.13653848171234131,
1435
+ "step": 1830
1436
+ },
1437
+ {
1438
+ "epoch": 7.36,
1439
+ "grad_norm": 0.7873698472976685,
1440
+ "learning_rate": 1.9825681984876172e-05,
1441
+ "loss": 0.12472724914550781,
1442
+ "step": 1840
1443
+ },
1444
+ {
1445
+ "epoch": 7.4,
1446
+ "grad_norm": 0.873921811580658,
1447
+ "learning_rate": 1.9271969751196776e-05,
1448
+ "loss": 0.13255125284194946,
1449
+ "step": 1850
1450
+ },
1451
+ {
1452
+ "epoch": 7.44,
1453
+ "grad_norm": 0.7591536045074463,
1454
+ "learning_rate": 1.8724248007898647e-05,
1455
+ "loss": 0.13693161010742189,
1456
+ "step": 1860
1457
+ },
1458
+ {
1459
+ "epoch": 7.48,
1460
+ "grad_norm": 1.0509488582611084,
1461
+ "learning_rate": 1.8182623534420907e-05,
1462
+ "loss": 0.13425672054290771,
1463
+ "step": 1870
1464
+ },
1465
+ {
1466
+ "epoch": 7.52,
1467
+ "grad_norm": 0.8472399711608887,
1468
+ "learning_rate": 1.76472019215278e-05,
1469
+ "loss": 0.13668575286865234,
1470
+ "step": 1880
1471
+ },
1472
+ {
1473
+ "epoch": 7.5600000000000005,
1474
+ "grad_norm": 0.911901593208313,
1475
+ "learning_rate": 1.7118087550723633e-05,
1476
+ "loss": 0.1317702889442444,
1477
+ "step": 1890
1478
+ },
1479
+ {
1480
+ "epoch": 7.6,
1481
+ "grad_norm": 0.9731144309043884,
1482
+ "learning_rate": 1.659538357390341e-05,
1483
+ "loss": 0.14458621740341188,
1484
+ "step": 1900
1485
+ },
1486
+ {
1487
+ "epoch": 7.6,
1488
+ "eval_loss": 0.5830516219139099,
1489
+ "eval_runtime": 18.7747,
1490
+ "eval_samples_per_second": 21.305,
1491
+ "eval_steps_per_second": 3.569,
1492
+ "step": 1900
1493
+ },
1494
+ {
1495
+ "epoch": 7.64,
1496
+ "grad_norm": 0.5515460968017578,
1497
+ "learning_rate": 1.60791918932431e-05,
1498
+ "loss": 0.13126691579818725,
1499
+ "step": 1910
1500
+ },
1501
+ {
1502
+ "epoch": 7.68,
1503
+ "grad_norm": 0.7286776304244995,
1504
+ "learning_rate": 1.556961314133359e-05,
1505
+ "loss": 0.12600460052490234,
1506
+ "step": 1920
1507
+ },
1508
+ {
1509
+ "epoch": 7.72,
1510
+ "grad_norm": 0.95229572057724,
1511
+ "learning_rate": 1.5066746661562253e-05,
1512
+ "loss": 0.12453792095184327,
1513
+ "step": 1930
1514
+ },
1515
+ {
1516
+ "epoch": 7.76,
1517
+ "grad_norm": 0.7712796330451965,
1518
+ "learning_rate": 1.4570690488745687e-05,
1519
+ "loss": 0.14839541912078857,
1520
+ "step": 1940
1521
+ },
1522
+ {
1523
+ "epoch": 7.8,
1524
+ "grad_norm": 0.8011840581893921,
1525
+ "learning_rate": 1.4081541330017705e-05,
1526
+ "loss": 0.1321096420288086,
1527
+ "step": 1950
1528
+ },
1529
+ {
1530
+ "epoch": 7.84,
1531
+ "grad_norm": 0.936607301235199,
1532
+ "learning_rate": 1.3599394545975951e-05,
1533
+ "loss": 0.1317069411277771,
1534
+ "step": 1960
1535
+ },
1536
+ {
1537
+ "epoch": 7.88,
1538
+ "grad_norm": 0.9034994840621948,
1539
+ "learning_rate": 1.312434413209131e-05,
1540
+ "loss": 0.13362932205200195,
1541
+ "step": 1970
1542
+ },
1543
+ {
1544
+ "epoch": 7.92,
1545
+ "grad_norm": 0.9586318731307983,
1546
+ "learning_rate": 1.2656482700383237e-05,
1547
+ "loss": 0.12677763700485228,
1548
+ "step": 1980
1549
+ },
1550
+ {
1551
+ "epoch": 7.96,
1552
+ "grad_norm": 0.9358674883842468,
1553
+ "learning_rate": 1.219590146136485e-05,
1554
+ "loss": 0.1382434129714966,
1555
+ "step": 1990
1556
+ },
1557
+ {
1558
+ "epoch": 8.0,
1559
+ "grad_norm": 0.8410677313804626,
1560
+ "learning_rate": 1.1742690206261292e-05,
1561
+ "loss": 0.12519369125366211,
1562
+ "step": 2000
1563
+ },
1564
+ {
1565
+ "epoch": 8.0,
1566
+ "eval_loss": 0.5840195417404175,
1567
+ "eval_runtime": 18.625,
1568
+ "eval_samples_per_second": 21.477,
1569
+ "eval_steps_per_second": 3.597,
1570
+ "step": 2000
1571
+ },
1572
+ {
1573
+ "epoch": 8.04,
1574
+ "grad_norm": 0.6319883465766907,
1575
+ "learning_rate": 1.129693728950474e-05,
1576
+ "loss": 0.10409053564071655,
1577
+ "step": 2010
1578
+ },
1579
+ {
1580
+ "epoch": 8.08,
1581
+ "grad_norm": 0.7751646041870117,
1582
+ "learning_rate": 1.0858729611509516e-05,
1583
+ "loss": 0.10310100317001343,
1584
+ "step": 2020
1585
+ },
1586
+ {
1587
+ "epoch": 8.12,
1588
+ "grad_norm": 0.9277542233467102,
1589
+ "learning_rate": 1.0428152601730718e-05,
1590
+ "loss": 0.09960774183273316,
1591
+ "step": 2030
1592
+ },
1593
+ {
1594
+ "epoch": 8.16,
1595
+ "grad_norm": 0.8381429314613342,
1596
+ "learning_rate": 1.0005290202009531e-05,
1597
+ "loss": 0.09982571601867676,
1598
+ "step": 2040
1599
+ },
1600
+ {
1601
+ "epoch": 8.2,
1602
+ "grad_norm": 0.7726228833198547,
1603
+ "learning_rate": 9.590224850208646e-06,
1604
+ "loss": 0.11322143077850341,
1605
+ "step": 2050
1606
+ },
1607
+ {
1608
+ "epoch": 8.24,
1609
+ "grad_norm": 0.7724836468696594,
1610
+ "learning_rate": 9.183037464140804e-06,
1611
+ "loss": 0.10006082057952881,
1612
+ "step": 2060
1613
+ },
1614
+ {
1615
+ "epoch": 8.28,
1616
+ "grad_norm": 1.0587371587753296,
1617
+ "learning_rate": 8.783807425793721e-06,
1618
+ "loss": 0.11560235023498536,
1619
+ "step": 2070
1620
+ },
1621
+ {
1622
+ "epoch": 8.32,
1623
+ "grad_norm": 0.8337858319282532,
1624
+ "learning_rate": 8.392612565854375e-06,
1625
+ "loss": 0.10931503772735596,
1626
+ "step": 2080
1627
+ },
1628
+ {
1629
+ "epoch": 8.36,
1630
+ "grad_norm": 0.805338978767395,
1631
+ "learning_rate": 8.009529148535855e-06,
1632
+ "loss": 0.10900030136108399,
1633
+ "step": 2090
1634
+ },
1635
+ {
1636
+ "epoch": 8.4,
1637
+ "grad_norm": 0.7612441182136536,
1638
+ "learning_rate": 7.63463185670939e-06,
1639
+ "loss": 0.1069128155708313,
1640
+ "step": 2100
1641
+ },
1642
+ {
1643
+ "epoch": 8.4,
1644
+ "eval_loss": 0.6247864961624146,
1645
+ "eval_runtime": 18.281,
1646
+ "eval_samples_per_second": 21.881,
1647
+ "eval_steps_per_second": 3.665,
1648
+ "step": 2100
1649
+ },
1650
+ {
1651
+ "epoch": 8.44,
1652
+ "grad_norm": 0.8081948757171631,
1653
+ "learning_rate": 7.267993777344856e-06,
1654
+ "loss": 0.09856721758842468,
1655
+ "step": 2110
1656
+ },
1657
+ {
1658
+ "epoch": 8.48,
1659
+ "grad_norm": 0.7861329913139343,
1660
+ "learning_rate": 6.909686387262254e-06,
1661
+ "loss": 0.10609345436096192,
1662
+ "step": 2120
1663
+ },
1664
+ {
1665
+ "epoch": 8.52,
1666
+ "grad_norm": 0.7145861387252808,
1667
+ "learning_rate": 6.559779539197231e-06,
1668
+ "loss": 0.105103600025177,
1669
+ "step": 2130
1670
+ },
1671
+ {
1672
+ "epoch": 8.56,
1673
+ "grad_norm": 0.7359808683395386,
1674
+ "learning_rate": 6.21834144818314e-06,
1675
+ "loss": 0.10853493213653564,
1676
+ "step": 2140
1677
+ },
1678
+ {
1679
+ "epoch": 8.6,
1680
+ "grad_norm": 0.8519245982170105,
1681
+ "learning_rate": 5.885438678252342e-06,
1682
+ "loss": 0.11464111804962158,
1683
+ "step": 2150
1684
+ },
1685
+ {
1686
+ "epoch": 8.64,
1687
+ "grad_norm": 0.8307661414146423,
1688
+ "learning_rate": 5.5611361294594325e-06,
1689
+ "loss": 0.10765299797058106,
1690
+ "step": 2160
1691
+ },
1692
+ {
1693
+ "epoch": 8.68,
1694
+ "grad_norm": 0.8340169787406921,
1695
+ "learning_rate": 5.245497025228874e-06,
1696
+ "loss": 0.10699164867401123,
1697
+ "step": 2170
1698
+ },
1699
+ {
1700
+ "epoch": 8.72,
1701
+ "grad_norm": 0.7895165085792542,
1702
+ "learning_rate": 4.938582900029437e-06,
1703
+ "loss": 0.10728691816329956,
1704
+ "step": 2180
1705
+ },
1706
+ {
1707
+ "epoch": 8.76,
1708
+ "grad_norm": 0.7967789769172668,
1709
+ "learning_rate": 4.640453587377957e-06,
1710
+ "loss": 0.11177785396575927,
1711
+ "step": 2190
1712
+ },
1713
+ {
1714
+ "epoch": 8.8,
1715
+ "grad_norm": 0.8613453507423401,
1716
+ "learning_rate": 4.351167208174639e-06,
1717
+ "loss": 0.11041848659515381,
1718
+ "step": 2200
1719
+ },
1720
+ {
1721
+ "epoch": 8.8,
1722
+ "eval_loss": 0.6235533356666565,
1723
+ "eval_runtime": 19.0901,
1724
+ "eval_samples_per_second": 20.953,
1725
+ "eval_steps_per_second": 3.51,
1726
+ "step": 2200
1727
+ },
1728
+ {
1729
+ "epoch": 8.84,
1730
+ "grad_norm": 0.6587359309196472,
1731
+ "learning_rate": 4.0707801593723e-06,
1732
+ "loss": 0.1085782766342163,
1733
+ "step": 2210
1734
+ },
1735
+ {
1736
+ "epoch": 8.88,
1737
+ "grad_norm": 0.7126621603965759,
1738
+ "learning_rate": 3.799347102981665e-06,
1739
+ "loss": 0.11138873100280762,
1740
+ "step": 2220
1741
+ },
1742
+ {
1743
+ "epoch": 8.92,
1744
+ "grad_norm": 0.7560760974884033,
1745
+ "learning_rate": 3.536920955414885e-06,
1746
+ "loss": 0.10770895481109619,
1747
+ "step": 2230
1748
+ },
1749
+ {
1750
+ "epoch": 8.96,
1751
+ "grad_norm": 0.95421302318573,
1752
+ "learning_rate": 3.2835528771693992e-06,
1753
+ "loss": 0.11167995929718018,
1754
+ "step": 2240
1755
+ },
1756
+ {
1757
+ "epoch": 9.0,
1758
+ "grad_norm": 0.9774760007858276,
1759
+ "learning_rate": 3.039292262854088e-06,
1760
+ "loss": 0.11738998889923095,
1761
+ "step": 2250
1762
+ },
1763
+ {
1764
+ "epoch": 9.04,
1765
+ "grad_norm": 0.7680178880691528,
1766
+ "learning_rate": 2.804186731559677e-06,
1767
+ "loss": 0.10072145462036133,
1768
+ "step": 2260
1769
+ },
1770
+ {
1771
+ "epoch": 9.08,
1772
+ "grad_norm": 0.8222008943557739,
1773
+ "learning_rate": 2.5782821175753422e-06,
1774
+ "loss": 0.09228388667106628,
1775
+ "step": 2270
1776
+ },
1777
+ {
1778
+ "epoch": 9.12,
1779
+ "grad_norm": 0.8610215783119202,
1780
+ "learning_rate": 2.361622461453178e-06,
1781
+ "loss": 0.09626876711845397,
1782
+ "step": 2280
1783
+ },
1784
+ {
1785
+ "epoch": 9.16,
1786
+ "grad_norm": 0.7807718515396118,
1787
+ "learning_rate": 2.154250001422431e-06,
1788
+ "loss": 0.0960278868675232,
1789
+ "step": 2290
1790
+ },
1791
+ {
1792
+ "epoch": 9.2,
1793
+ "grad_norm": 0.8036084175109863,
1794
+ "learning_rate": 1.956205165155078e-06,
1795
+ "loss": 0.0941778838634491,
1796
+ "step": 2300
1797
+ },
1798
+ {
1799
+ "epoch": 9.2,
1800
+ "eval_loss": 0.6419874429702759,
1801
+ "eval_runtime": 19.9334,
1802
+ "eval_samples_per_second": 20.067,
1803
+ "eval_steps_per_second": 3.361,
1804
+ "step": 2300
1805
+ },
1806
+ {
1807
+ "epoch": 9.24,
1808
+ "grad_norm": 0.7480472326278687,
1809
+ "learning_rate": 1.7675265618843362e-06,
1810
+ "loss": 0.09725146293640137,
1811
+ "step": 2310
1812
+ },
1813
+ {
1814
+ "epoch": 9.28,
1815
+ "grad_norm": 0.8559448719024658,
1816
+ "learning_rate": 1.5882509748777808e-06,
1817
+ "loss": 0.09353782534599304,
1818
+ "step": 2320
1819
+ },
1820
+ {
1821
+ "epoch": 9.32,
1822
+ "grad_norm": 0.6416171193122864,
1823
+ "learning_rate": 1.4184133542663014e-06,
1824
+ "loss": 0.09848537445068359,
1825
+ "step": 2330
1826
+ },
1827
+ {
1828
+ "epoch": 9.36,
1829
+ "grad_norm": 0.7388947606086731,
1830
+ "learning_rate": 1.258046810230562e-06,
1831
+ "loss": 0.10164464712142944,
1832
+ "step": 2340
1833
+ },
1834
+ {
1835
+ "epoch": 9.4,
1836
+ "grad_norm": 0.8187626600265503,
1837
+ "learning_rate": 1.1071826065460588e-06,
1838
+ "loss": 0.0934177041053772,
1839
+ "step": 2350
1840
+ },
1841
+ {
1842
+ "epoch": 9.44,
1843
+ "grad_norm": 0.865635871887207,
1844
+ "learning_rate": 9.65850154488218e-07,
1845
+ "loss": 0.1012031078338623,
1846
+ "step": 2360
1847
+ },
1848
+ {
1849
+ "epoch": 9.48,
1850
+ "grad_norm": 0.8829763531684875,
1851
+ "learning_rate": 8.340770070986214e-07,
1852
+ "loss": 0.09371918439865112,
1853
+ "step": 2370
1854
+ },
1855
+ {
1856
+ "epoch": 9.52,
1857
+ "grad_norm": 0.7734853625297546,
1858
+ "learning_rate": 7.11888853813436e-07,
1859
+ "loss": 0.09450345039367676,
1860
+ "step": 2380
1861
+ },
1862
+ {
1863
+ "epoch": 9.56,
1864
+ "grad_norm": 0.7692961096763611,
1865
+ "learning_rate": 5.993095154552431e-07,
1866
+ "loss": 0.09499152898788452,
1867
+ "step": 2390
1868
+ },
1869
+ {
1870
+ "epoch": 9.6,
1871
+ "grad_norm": 1.1678398847579956,
1872
+ "learning_rate": 4.963609395891299e-07,
1873
+ "loss": 0.10716021060943604,
1874
+ "step": 2400
1875
+ },
1876
+ {
1877
+ "epoch": 9.6,
1878
+ "eval_loss": 0.6402375102043152,
1879
+ "eval_runtime": 18.9858,
1880
+ "eval_samples_per_second": 21.068,
1881
+ "eval_steps_per_second": 3.529,
1882
+ "step": 2400
1883
+ },
1884
+ {
1885
+ "epoch": 9.64,
1886
+ "grad_norm": 0.7258604764938354,
1887
+ "learning_rate": 4.030631962439302e-07,
1888
+ "loss": 0.09596163630485535,
1889
+ "step": 2410
1890
+ },
1891
+ {
1892
+ "epoch": 9.68,
1893
+ "grad_norm": 0.8662357330322266,
1894
+ "learning_rate": 3.1943447399958027e-07,
1895
+ "loss": 0.09645589590072631,
1896
+ "step": 2420
1897
+ },
1898
+ {
1899
+ "epoch": 9.72,
1900
+ "grad_norm": 0.8258174061775208,
1901
+ "learning_rate": 2.4549107644117885e-07,
1902
+ "loss": 0.09415926933288574,
1903
+ "step": 2430
1904
+ },
1905
+ {
1906
+ "epoch": 9.76,
1907
+ "grad_norm": 0.911540150642395,
1908
+ "learning_rate": 1.8124741898058462e-07,
1909
+ "loss": 0.10026730298995971,
1910
+ "step": 2440
1911
+ },
1912
+ {
1913
+ "epoch": 9.8,
1914
+ "grad_norm": 0.8336577415466309,
1915
+ "learning_rate": 1.267160260461253e-07,
1916
+ "loss": 0.09711679220199584,
1917
+ "step": 2450
1918
+ },
1919
+ {
1920
+ "epoch": 9.84,
1921
+ "grad_norm": 0.7324675917625427,
1922
+ "learning_rate": 8.190752864088436e-08,
1923
+ "loss": 0.09345818758010864,
1924
+ "step": 2460
1925
+ },
1926
+ {
1927
+ "epoch": 9.88,
1928
+ "grad_norm": 0.9261553287506104,
1929
+ "learning_rate": 4.683066227023081e-08,
1930
+ "loss": 0.102751624584198,
1931
+ "step": 2470
1932
+ },
1933
+ {
1934
+ "epoch": 9.92,
1935
+ "grad_norm": 0.9403973817825317,
1936
+ "learning_rate": 2.1492265238748366e-08,
1937
+ "loss": 0.0988599717617035,
1938
+ "step": 2480
1939
+ },
1940
+ {
1941
+ "epoch": 9.96,
1942
+ "grad_norm": 0.7062044739723206,
1943
+ "learning_rate": 5.897277317157279e-09,
1944
+ "loss": 0.09828301668167114,
1945
+ "step": 2490
1946
+ },
1947
+ {
1948
+ "epoch": 10.0,
1949
+ "grad_norm": 0.7819132804870605,
1950
+ "learning_rate": 4.873877924582715e-11,
1951
+ "loss": 0.0937616467475891,
1952
+ "step": 2500
1953
+ },
1954
+ {
1955
+ "epoch": 10.0,
1956
+ "eval_loss": 0.6409608721733093,
1957
+ "eval_runtime": 17.8761,
1958
+ "eval_samples_per_second": 22.376,
1959
+ "eval_steps_per_second": 3.748,
1960
+ "step": 2500
1961
+ },
1962
+ {
1963
+ "epoch": 10.0,
1964
+ "step": 2500,
1965
+ "total_flos": 3.634151342457697e+19,
1966
+ "train_loss": 0.2690703985452652,
1967
+ "train_runtime": 10014.7733,
1968
+ "train_samples_per_second": 5.991,
1969
+ "train_steps_per_second": 0.25
1970
+ }
1971
+ ],
1972
+ "logging_steps": 10,
1973
+ "max_steps": 2500,
1974
+ "num_input_tokens_seen": 0,
1975
+ "num_train_epochs": 10,
1976
+ "save_steps": 100,
1977
+ "stateful_callbacks": {
1978
+ "TrainerControl": {
1979
+ "args": {
1980
+ "should_epoch_stop": false,
1981
+ "should_evaluate": false,
1982
+ "should_log": false,
1983
+ "should_save": true,
1984
+ "should_training_stop": true
1985
+ },
1986
+ "attributes": {}
1987
+ }
1988
+ },
1989
+ "total_flos": 3.634151342457697e+19,
1990
+ "train_batch_size": 1,
1991
+ "trial_name": null,
1992
+ "trial_params": null
1993
+ }
training_args.bin ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:5228cdd5eb9358c1a0239811495149912109515f66a9f22386e040bdf16b1dd0
3
+ size 5713
training_eval_loss.png ADDED
training_loss.png ADDED