Johnblick187 committed on
Commit f10d44f · verified · 1 Parent(s): 5fc333f

Update modeling_smartcoder_moe.py

Files changed (1)
  1. modeling_smartcoder_moe.py +14 -16
modeling_smartcoder_moe.py CHANGED
@@ -1,19 +1,17 @@
- ""
- modeling_smartcoder_moe.py
- Custom model class for SmartCoderMoE.
-
- Architecture (from tensor inspection):
- - vocab_size: 65536, hidden: 2048, layers: 40
- - Attention: q[2048,2048], k/v[512,2048] - 16 heads, 4 KV heads, head_dim=128
- - MLP (hybrid dense + MoE):
- dense_fc: [8192, 2048] up
- dense_proj: [2048, 8192] down
- experts_fc: [32, 512, 2048] expert up (batched)
- experts_proj: [32, 2048, 512] expert down (batched)
- router: [32, 2048] router logits
- - LayerNorm: weight+bias (input_layernorm, post_attention_layernorm)
- - Final norm: model.norm.weight/bias
- ""
+
+ # modeling_smartcoder_moe.py
+
+ #Architecture (from tensor inspection):
+ #- vocab_size: 65536, hidden: 2048, layers: 40
+ #- Attention: q[2048,2048], k/v[512,2048] - 16 heads, 4 KV heads, head_dim=128
+ #- MLP (hybrid dense + MoE):
+ # dense_fc: [8192, 2048] up
+ # dense_proj: [2048, 8192] down
+ # experts_fc: [32, 512, 2048] expert up (batched)
+ # experts_proj: [32, 2048, 512] expert down (batched)
+ # router: [32, 2048] router logits
+ #- LayerNorm: weight+bias (input_layernorm, post_attention_layernorm)
+ #- Final norm: model.norm.weight/bias
 
  import math
  import torch
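
The shapes in the architecture comment can be cross-checked with a little arithmetic. The snippet below is a minimal sketch and not part of the commit: it copies the listed tensor shapes into a hypothetical `SHAPES` dict, derives `head_dim` and the projected K/V width from the stated head counts, and totals the MLP weights for one layer. It assumes the listed tensors are plain weight matrices, and it counts only tensors named in the comment (no biases, no attention output projection, since neither is listed).

```python
from math import prod

# Head counts and shapes copied from the architecture comment in the diff.
NUM_HEADS = 16
NUM_KV_HEADS = 4

# Hypothetical container for the listed shapes (names follow the comment).
SHAPES = {
    "q": (2048, 2048),
    "k": (512, 2048),
    "v": (512, 2048),
    "dense_fc": (8192, 2048),
    "dense_proj": (2048, 8192),
    "experts_fc": (32, 512, 2048),
    "experts_proj": (32, 2048, 512),
    "router": (32, 2048),
}


def numel(shape):
    """Number of elements in a tensor of the given shape."""
    return prod(shape)


# Width of one attention head: q output rows split evenly across the heads.
head_dim = SHAPES["q"][0] // NUM_HEADS

# With grouped-query attention, K and V project to NUM_KV_HEADS * head_dim.
kv_dim = NUM_KV_HEADS * head_dim

# Number of query heads that share each K/V head.
queries_per_kv_head = NUM_HEADS // NUM_KV_HEADS

# Weight count of the hybrid MLP in one layer (dense path + experts + router).
mlp_keys = ["dense_fc", "dense_proj", "experts_fc", "experts_proj", "router"]
mlp_weight_count = sum(numel(SHAPES[key]) for key in mlp_keys)

print(f"head_dim={head_dim}, kv_dim={kv_dim}, group={queries_per_kv_head}")
print(f"MLP weights per layer: {mlp_weight_count:,}")
```

Under these assumptions the derived `head_dim` agrees with the `head_dim=128` stated in the comment, and `kv_dim` matches the 512 rows of the `k/v` shape, so the listed attention shapes are consistent with 16 query heads sharing 4 KV heads.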