Lee93whut committed on
Commit 8eeeb67 · 1 Parent(s): e1ecae1

docs: drop R4-续 retrospective paragraph — it reintroduced the wrong shaping=0.5 value

The paragraph at the end of the R3-收尾 ("R3 wrap-up") block (just
before the "Round 4" section header) was added in 8c017b9 to clarify
that the R4-续 ("R4 continued") algorithm ablation was a second step
taken after the double algorithm's ceiling was confirmed, not part of
the original R4 plan. It listed R4's "verified hyperparameter set" as:
  buffer=80k, target=1500, shaping=0.5, visited_map 4通道 (4-channel),
  EVAL checkpoint, BFS

The shaping=0.5 number is factually wrong. Commit 92423f0 ("docs: clean
up R3/R4 record and consolidate technical narrative") established
that:
- config.yaml lists distance_shaping_alpha = 0.5
- train.py does not forward that field to MazeEnv()
- so R3/R4's effective shaping alpha was 0 throughout
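
A minimal sketch of how a config value can be listed yet never take
effect, as described above. Only `MazeEnv` and
`distance_shaping_alpha` come from this commit message; the constructor
signature, the default of 0.0, and both helper functions are
assumptions made for illustration, not the project's actual code.

```python
from dataclasses import dataclass


@dataclass
class MazeEnv:
    # Hypothetical stand-in for the project's environment class.
    # The default of 0.0 is an assumption; the real signature is not shown here.
    distance_shaping_alpha: float = 0.0


# Value as listed in config.yaml, per commit 92423f0.
config = {"distance_shaping_alpha": 0.5}


def make_env_not_forwarded(cfg: dict) -> MazeEnv:
    # Mirrors the described behavior: the field is never passed along,
    # so the environment falls back to its default.
    return MazeEnv()


def make_env_forwarded(cfg: dict) -> MazeEnv:
    # What forwarding the field to the constructor could look like.
    return MazeEnv(distance_shaping_alpha=cfg["distance_shaping_alpha"])


effective_alpha = make_env_not_forwarded(config).distance_shaping_alpha
forwarded_alpha = make_env_forwarded(config).distance_shaping_alpha
print(effective_alpha, forwarded_alpha)
```

Under these assumptions the effective alpha stays at the constructor
default regardless of what the config file says, which is why the
documented R3/R4 value is 0 rather than 0.5.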

Reintroducing shaping=0.5 in any document, even in a parenthetical list
of "verified hyperparameters", contradicts the project's documented
record. Rather than rewriting the paragraph to omit the wrong value
(which would also drop the "R4-续 was a second step" insight that the
paragraph was trying to convey), remove the whole paragraph and leave
the R3/R4 plan-vs-execution honesty to be re-added later with correct
numbers if needed.

No other changes; rest of the experiment_log retrospective is intact.

Co-Authored-By: Lee93whut <30529279@qq.com>

Files changed (1)
  1. docs/experiment_log.md +0 -2
docs/experiment_log.md CHANGED
@@ -504,8 +504,6 @@ if eval_success_rate > best_eval_success_rate:
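 
 **Expected**: In R3 the double algorithm's EVAL peak reached 84%; after changing the save strategy, Holdout is expected to approach 80–84% (this removes the 10pp loss from checkpoint timing, and the remaining 2–4pp is the normal deviation from overfitting to the evaluation set).
 
-**About the "R4 续 (R4 continued) / algorithm comparison stage"**: After R4 stacked the three changes above, it confirmed that 78% is the ceiling for this algorithm (double). "R4 续" came about only to judge whether R4 could be pushed a further 6pp: fix the hyperparameter set that R4 had verified as effective (buffer=80k, target=1500, shaping=0.5, visited_map 4-channel, EVAL checkpoint, BFS) and run each of the 4 algorithms once. This is a two-step process, "ablation determines the ceiling → compare algorithms across the remaining 6pp of headroom", **not something that was planned at the start of R4 when R3 ended**.
-
 ---
 
 ## Round 4 — Systemic problem fixes: Checkpoint strategy + training signal quality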