datasysdev committed on
Commit cc1555b · verified · 1 Parent(s): faa1ec5

Update logs/all32_d128_block.log

Files changed (1)
  1. logs/all32_d128_block.log +86 -0
logs/all32_d128_block.log ADDED
@@ -0,0 +1,86 @@
+ wandb: [wandb.login()] Loaded credentials for https://api.wandb.ai from /root/.netrc.
+ wandb: Currently logged in as: dalletest123 to https://api.wandb.ai. Use `wandb login --relogin` to force relogin
+ wandb: Tracking run with wandb version 0.26.1
+ wandb: Run data is saved locally in /tmp/sparse-attn-git/wandb/run-20260509_091956-hcukr8rw
+ wandb: Run `wandb offline` to turn off syncing.
+ wandb: Syncing run all32-d128-block-causal-reserve-0-1-2-35
+ wandb: ⭐️ View project at https://wandb.ai/dalletest123/ann-sparse
+ wandb: 🚀 View run at https://wandb.ai/dalletest123/ann-sparse/runs/hcukr8rw
+ Warning: You are sending unauthenticated requests to the HF Hub. Please set a HF_TOKEN to enable higher rate limits and faster downloads.
+ [config] preset=all32_d128_block run=all32-d128-block-causal-reserve-0-1-2-35 steps=1000 layers=32 d_search=128 ckpt_dir=/tmp/checkpoints_all32_d128_block_reserve_0_1_2_35
+ Training search projections for layers: [3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34]
+ Reserved as full attention: [0, 1, 2, 35]
+ [perf] flash_attention_3 unavailable (ImportError: FlashAttention3 has been toggled on, but it cannot be used due to the following error: the package for FlashAttention3 doesn't seem to be installed.); trying next.
+ [perf] flash_attention_2 unavailable (ImportError: FlashAttention2 has been toggled on, but it cannot be used due to the following error: the package for FlashAttention2 doesn't seem to be installed.); trying next.
+
+ [perf] attention implementation: sdpa
+ [perf] Liger kernels applied via apply_liger_kernel_to_qwen3.
+ Trainable parameters: 20,971,520 (20.97M)
+ [ckpt] step 25 -> /tmp/checkpoints_all32_d128_block_reserve_0_1_2_35/search_step_25.pt
+ [ckpt] step 50 -> /tmp/checkpoints_all32_d128_block_reserve_0_1_2_35/search_step_50.pt
+ [ckpt] step 75 -> /tmp/checkpoints_all32_d128_block_reserve_0_1_2_35/search_step_75.pt
+ [ckpt] step 100 -> /tmp/checkpoints_all32_d128_block_reserve_0_1_2_35/search_step_100.pt
+ [ckpt] step 125 -> /tmp/checkpoints_all32_d128_block_reserve_0_1_2_35/search_step_125.pt
+ [ckpt] step 150 -> /tmp/checkpoints_all32_d128_block_reserve_0_1_2_35/search_step_150.pt
+ [ckpt] step 175 -> /tmp/checkpoints_all32_d128_block_reserve_0_1_2_35/search_step_175.pt
+ [ckpt] step 200 -> /tmp/checkpoints_all32_d128_block_reserve_0_1_2_35/search_step_200.pt
+ [ckpt] step 225 -> /tmp/checkpoints_all32_d128_block_reserve_0_1_2_35/search_step_225.pt
+
+ [step 250] === ACTIONABLE DIAGNOSTIC ===
+ Recall@K_eval: 0.812
+ PPL gap (relative): 2.283%
+ >> WORKING: High recall and quality preserved. Both set and ranking aligned with the teacher.
+
+ [ckpt] step 250 -> /tmp/checkpoints_all32_d128_block_reserve_0_1_2_35/search_step_250.pt
+ [ckpt] step 275 -> /tmp/checkpoints_all32_d128_block_reserve_0_1_2_35/search_step_275.pt
+ [ckpt] step 300 -> /tmp/checkpoints_all32_d128_block_reserve_0_1_2_35/search_step_300.pt
+ [ckpt] step 325 -> /tmp/checkpoints_all32_d128_block_reserve_0_1_2_35/search_step_325.pt
+ [ckpt] step 350 -> /tmp/checkpoints_all32_d128_block_reserve_0_1_2_35/search_step_350.pt
+ [ckpt] step 375 -> /tmp/checkpoints_all32_d128_block_reserve_0_1_2_35/search_step_375.pt
+ [ckpt] step 400 -> /tmp/checkpoints_all32_d128_block_reserve_0_1_2_35/search_step_400.pt
+ [ckpt] step 425 -> /tmp/checkpoints_all32_d128_block_reserve_0_1_2_35/search_step_425.pt
+ [ckpt] step 450 -> /tmp/checkpoints_all32_d128_block_reserve_0_1_2_35/search_step_450.pt
+ [ckpt] step 475 -> /tmp/checkpoints_all32_d128_block_reserve_0_1_2_35/search_step_475.pt
+
+ [step 500] === ACTIONABLE DIAGNOSTIC ===
+ Recall@K_eval: 0.823
+ PPL gap (relative): 1.753%
+ >> WORKING: High recall and quality preserved. Both set and ranking aligned with the teacher.
+
+ [ckpt] step 500 -> /tmp/checkpoints_all32_d128_block_reserve_0_1_2_35/search_step_500.pt
+ [ckpt] step 525 -> /tmp/checkpoints_all32_d128_block_reserve_0_1_2_35/search_step_525.pt
+ [ckpt] step 550 -> /tmp/checkpoints_all32_d128_block_reserve_0_1_2_35/search_step_550.pt
+ [ckpt] step 575 -> /tmp/checkpoints_all32_d128_block_reserve_0_1_2_35/search_step_575.pt
+ [ckpt] step 600 -> /tmp/checkpoints_all32_d128_block_reserve_0_1_2_35/search_step_600.pt
+ [ckpt] step 625 -> /tmp/checkpoints_all32_d128_block_reserve_0_1_2_35/search_step_625.pt
+ [ckpt] step 650 -> /tmp/checkpoints_all32_d128_block_reserve_0_1_2_35/search_step_650.pt
+ [ckpt] step 675 -> /tmp/checkpoints_all32_d128_block_reserve_0_1_2_35/search_step_675.pt
+ [ckpt] step 700 -> /tmp/checkpoints_all32_d128_block_reserve_0_1_2_35/search_step_700.pt
+ [ckpt] step 725 -> /tmp/checkpoints_all32_d128_block_reserve_0_1_2_35/search_step_725.pt
+
+ [step 750] === ACTIONABLE DIAGNOSTIC ===
+ Recall@K_eval: 0.825
+ PPL gap (relative): 1.943%
+ >> WORKING: High recall and quality preserved. Both set and ranking aligned with the teacher.
+
+ [ckpt] step 750 -> /tmp/checkpoints_all32_d128_block_reserve_0_1_2_35/search_step_750.pt
+ [ckpt] step 775 -> /tmp/checkpoints_all32_d128_block_reserve_0_1_2_35/search_step_775.pt
+ [ckpt] step 800 -> /tmp/checkpoints_all32_d128_block_reserve_0_1_2_35/search_step_800.pt
+ [ckpt] step 825 -> /tmp/checkpoints_all32_d128_block_reserve_0_1_2_35/search_step_825.pt
+ [ckpt] step 850 -> /tmp/checkpoints_all32_d128_block_reserve_0_1_2_35/search_step_850.pt
+ [ckpt] step 875 -> /tmp/checkpoints_all32_d128_block_reserve_0_1_2_35/search_step_875.pt
+ [ckpt] step 900 -> /tmp/checkpoints_all32_d128_block_reserve_0_1_2_35/search_step_900.pt
+ [ckpt] step 925 -> /tmp/checkpoints_all32_d128_block_reserve_0_1_2_35/search_step_925.pt
+ [ckpt] step 950 -> /tmp/checkpoints_all32_d128_block_reserve_0_1_2_35/search_step_950.pt
+ [ckpt] step 975 -> /tmp/checkpoints_all32_d128_block_reserve_0_1_2_35/search_step_975.pt
+
+ [step 1000] === ACTIONABLE DIAGNOSTIC ===
+ Recall@K_eval: 0.825
+ PPL gap (relative): 1.746%
+ >> WORKING: High recall and quality preserved. Both set and ranking aligned with the teacher.
+
+ [ckpt] step 1000 -> /tmp/checkpoints_all32_d128_block_reserve_0_1_2_35/search_step_1000.pt
+ [ckpt] step 1000 -> /tmp/checkpoints_all32_d128_block_reserve_0_1_2_35/search_step_1000.pt
+ wandb:
+ wandb: 🚀 View run all32-d128-block-causal-reserve-0-1-2-35 at: https://wandb.ai/dalletest123/ann-sparse/runs/hcukr8rw
+ wandb: Find logs at: wandb/run-20260509_091956-hcukr8rw/logs