docs: drop R4-续 retrospective paragraph — it reintroduced the wrong shaping=0.5 value
The paragraph at the end of the R3-收尾 ("R3 wrap-up") block (just before the "Round 4"
section header) was added in 8c017b9 to clarify that the R4-续 ("R4 continued") algorithm
ablation was a second step taken after the double algorithm's ceiling
was confirmed, not part of the original R4 plan. It listed R4's
"verified hyperparameter set" as:

    buffer=80k, target=1500, shaping=0.5, visited_map 4-channel,
    EVAL checkpoint, BFS

The shaping=0.5 number is factually wrong. Commit 92423f0 ("docs: clean
up R3/R4 record and consolidate technical narrative") established
that:
- config.yaml lists distance_shaping_alpha = 0.5
- train.py does not forward that field to MazeEnv()
- so R3/R4's effective shaping alpha was 0 throughout
Reintroducing shaping=0.5 in any document, even in a parenthetical list
of "verified hyperparameters", contradicts the project's documented
record. Rather than rewriting the paragraph to omit the wrong value
(which would also drop the "R4-续 was a second step" insight that the
paragraph was trying to convey), remove the whole paragraph and leave
the R3/R4 plan-vs-execution honesty to be re-added later with correct
numbers if needed.
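
For readers unfamiliar with the failure mode established in 92423f0, it can be sketched roughly as follows. This is an illustrative stand-in, not the project's actual train.py: only the names MazeEnv and distance_shaping_alpha come from the record above. The dataclass body, the default of 0.0, and the helpers make_env_not_forwarding, make_env_forwarding, and mismatched_fields are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class MazeEnv:
    """Hypothetical stand-in for the project's MazeEnv (only one field modelled)."""
    # Assumed default: with nothing passed in, shaping is effectively off.
    distance_shaping_alpha: float = 0.0


def make_env_not_forwarding(config: dict) -> MazeEnv:
    # Failure mode: the config is loaded, but the shaping field
    # never reaches the constructor, so the default silently wins.
    return MazeEnv()


def make_env_forwarding(config: dict) -> MazeEnv:
    # The field is passed through explicitly.
    return MazeEnv(distance_shaping_alpha=config["distance_shaping_alpha"])


def mismatched_fields(config: dict, env: MazeEnv) -> dict:
    """Return {field: (configured, effective)} for values that did not take effect."""
    return {
        key: (value, getattr(env, key))
        for key, value in config.items()
        if hasattr(env, key) and getattr(env, key) != value
    }


config = {"distance_shaping_alpha": 0.5}
silent = make_env_not_forwarding(config)
wired = make_env_forwarding(config)
```

A check along the lines of mismatched_fields(config, env), run once at startup, would flag the gap between the configured and the effective value before any number is written into a log.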
No other changes; rest of the experiment_log retrospective is intact.
Co-Authored-By: Lee93whut <30529279@qq.com>
- docs/experiment_log.md +0 -2
@@ -504,8 +504,6 @@ if eval_success_rate > best_eval_success_rate:
 
 **Expected**: In R3 the double algorithm's EVAL peak reached 84%; after changing the save strategy, Holdout is expected to approach 80–84% (this removes the 10pp loss from checkpoint timing; the remaining 2–4pp is the normal deviation from overfitting to the evaluation set).
 
-**About "R4 continued / algorithm comparison phase"**: After R4 stacked the three changes above, 78% was confirmed as the ceiling for that algorithm (double). "R4 continued" was added only to judge whether R4 could be pushed a further 6pp: hold fixed the hyperparameter set that R4 had verified as effective (buffer=80k, target=1500, shaping=0.5, visited_map 4-channel, EVAL checkpoint, BFS) and run each of the 4 algorithms once. This is a two-step process, "ablation determines the ceiling → use the remaining 6pp of headroom to compare algorithms", and **was not something planned at the start of R4 when R3 ended**.
-
 ---
 
 ## Round 4 - Systemic issue fixes: checkpoint strategy + training signal quality