Johnblick187 committed on
Commit 4f864d1 · verified · 1 Parent(s): ec57f1e

Update README.md

Files changed (1)
  1. README.md +1 -1
README.md CHANGED
@@ -25,7 +25,7 @@ SmartCoderMoE is a 4.65B parameter sparse Mixture-of-Experts coding model.
  SmartCoderMoE is not your average fine-tune. He was engineered through a multi-stage weight surgery pipeline:
 
  1. **Slice Merge** — StarCoder2-15B and StarChat2-15B were each sliced into 3 × 2048-dim pieces and SLERP-merged with deliberate per-slice biases (60/80/90) to preserve coding depth while injecting instruct capability of Starchat2
- 1. **MoE Surgery** — Every dense FFN layer was surgically split: The original dim of 245576 was reduced to an intermediate dim of 8192 and kept as a dense FFN, and the remaining 16384 dims were sliced into **32 experts of 512 dim each**, giving Smartcoder an expansive yet tiny network of 1280 total experts.
+ 1. **MoE Surgery** — Every dense FFN layer was surgically split: The original dim of 24576 was reduced to an intermediate dim of 8192 and kept as a dense FFN, and the remaining 16384 dims were sliced into **32 experts of 512 dim each**, giving Smartcoder an expansive yet tiny network of 1280 total experts.
  1. **Vocab Expansion** — Extended from 49152 to 65536 tokens with multimodal special tokens for code, audio, image, video, and music.
  1. **Zero waste** — Not a single weight was discarded. Every parameter from StarCoder2’s original FFN lives on in either the dense FFN or one of the 1280 expert slots.
 
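The one-character fix in this commit (245576 to 24576) can be checked against the other numbers in the MoE Surgery bullet. The following is a minimal sketch of that arithmetic, not the author's pipeline. The function name `split_ffn`, the choice to place the dense slice first, and the layer count of 40 (inferred from 1280 / 32) are assumptions made only for illustration.

```python
# Sanity check for the dimensions quoted in the MoE Surgery bullet.
# Assumptions (not stated in the README diff): the dense slice occupies the
# first indices, and the model has 40 layers (inferred from 1280 / 32).

ORIGINAL_FFN_DIM = 24576  # corrected value from the "+" line
DENSE_FFN_DIM = 8192      # portion kept as a dense FFN
EXPERT_DIM = 512          # width of each expert slice
NUM_LAYERS = 40           # assumption, see note above


def split_ffn(original_dim, dense_dim, expert_dim):
    """Partition an FFN intermediate dimension into one dense range and
    equal-width expert ranges, returned as half-open (start, end) pairs."""
    remaining = original_dim - dense_dim
    if remaining < 0 or remaining % expert_dim != 0:
        raise ValueError(
            f"{original_dim} - {dense_dim} = {remaining} is not a "
            f"non-negative multiple of {expert_dim}"
        )
    dense_range = (0, dense_dim)
    expert_ranges = [
        (start, start + expert_dim)
        for start in range(dense_dim, original_dim, expert_dim)
    ]
    return dense_range, expert_ranges


dense_range, expert_ranges = split_ffn(ORIGINAL_FFN_DIM, DENSE_FFN_DIM, EXPERT_DIM)

experts_per_layer = len(expert_ranges)
total_experts = experts_per_layer * NUM_LAYERS

# "Zero waste": the dense range plus all expert ranges should account for
# every index of the original intermediate dimension exactly once.
covered = (dense_range[1] - dense_range[0]) + sum(end - start for start, end in expert_ranges)

print(experts_per_layer)            # 32
print(total_experts)                # 1280
print(covered == ORIGINAL_FFN_DIM)  # True
```

With the pre-fix value, `split_ffn(245576, 8192, 512)` raises `ValueError`, because the remainder after removing the dense slice is not a multiple of 512. That is consistent with 24576 being the intended figure.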