Spaces: lil58/interview

Lee93whut committed 1 day ago

Commit 8eeeb67

1 Parent(s): e1ecae1

docs: drop R4-续 (R4 continued) retrospective paragraph; it reintroduced the wrong shaping=0.5 value

The paragraph at the end of the R3-收尾 (R3 wrap-up) block, just before
the "Round 4" section header, was added in 8c017b9. It clarified that
the R4-续 (R4 continued) algorithm ablation was a second step, taken
after the double algorithm's ceiling was confirmed, and not part of the
original R4 plan. It listed R4's "verified hyperparameter set" as:
    buffer=80k, target=1500, shaping=0.5, visited_map 4-channel,
    EVAL checkpoint, BFS

The shaping=0.5 number is factually wrong. Commit 92423f0 ("docs: clean
up R3/R4 record and consolidate technical narrative") established
that:
- config.yaml lists distance_shaping_alpha = 0.5
- train.py does not forward that field to MazeEnv()
- so R3/R4's effective shaping alpha was 0 throughout
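
For readers unfamiliar with this failure mode, here is a minimal sketch of
how a config value can be listed but never take effect. The MazeEnv class
below is a hypothetical stand-in (the real constructor signature is not
shown in this commit), and the default of 0.0 is an assumption chosen to
match the "effective alpha was 0" finding above.

```python
from dataclasses import dataclass


@dataclass
class MazeEnv:
    """Hypothetical stand-in for the project's MazeEnv (signature assumed)."""
    size: int = 10
    # Assumed default: shaping is off unless the caller passes a value.
    distance_shaping_alpha: float = 0.0


def build_env_without_forwarding(config: dict) -> MazeEnv:
    # Mirrors the situation described above: the config holds the field,
    # but the constructor call never receives it, so the default applies.
    return MazeEnv(size=config.get("size", 10))


def build_env_with_forwarding(config: dict) -> MazeEnv:
    # What a fix could look like: pass the configured value through.
    return MazeEnv(
        size=config.get("size", 10),
        distance_shaping_alpha=config.get("distance_shaping_alpha", 0.0),
    )


# Illustrative config dict standing in for the parsed config.yaml.
config = {"size": 10, "distance_shaping_alpha": 0.5}

dropped = build_env_without_forwarding(config)
forwarded = build_env_with_forwarding(config)

print("configured alpha:", config["distance_shaping_alpha"])
print("effective alpha (not forwarded):", dropped.distance_shaping_alpha)
print("effective alpha (forwarded):", forwarded.distance_shaping_alpha)
```

The point of the sketch is only that the value in the config file and the
value the environment actually uses can differ; the documented record
should quote the effective one.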

Reintroducing shaping=0.5 in any document, even in a parenthetical list
of "verified hyperparameters", contradicts the project's documented
record. Rather than rewriting the paragraph to omit the wrong value,
remove the whole paragraph. This also drops the "R4-续 (R4 continued)
was a second step" insight that the paragraph was trying to convey; the
R3/R4 plan-vs-execution note can be re-added later with correct numbers
if needed.

No other changes; rest of the experiment_log retrospective is intact.

Co-Authored-By: Lee93whut <30529279@qq.com>

Files changed (1)

docs/experiment_log.md +0 -2


@@ -504,8 +504,6 @@ if eval_success_rate > best_eval_success_rate:
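 **Expected**: In R3 the double algorithm's EVAL peak reached 84%; after changing the save strategy, Holdout is expected to approach 80–84% (this eliminates the 10pp loss from checkpoint timing; the remaining 2–4pp is the normal deviation from overfitting to the evaluation set).
-**About "R4 continued / algorithm comparison stage"**: After R4 completed the three stacked changes above, 78% was confirmed as the ceiling for this algorithm (double). "R4 continued" came about only to judge "whether R4 could be pushed a further 6pp": fix the hyperparameter combination R4 had verified as effective (buffer=80k, target=1500, shaping=0.5, visited_map 4-channel, EVAL checkpoint, BFS) and run each of the 4 algorithms once. This is a two-step process of "ablation determines the ceiling → use the remaining 6pp of headroom to compare algorithms", and **was not something planned at the start of R4 once R3 ended**.
 ---
 ## Round 4 - Systemic Issue Fixes: Checkpoint Strategy + Training Signal Quality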
