# DistillSupra-0.2M
DistillSupra-0.2M is an ultra-compact causal language model with roughly 0.2 million parameters, produced by knowledge distillation from Supra-Mini-v4-2M.
It was trained for 500 steps (1 epoch), taking about 30 minutes on a GTX 750 Ti (4 GB), using text generated by the teacher.
The model was compressed roughly 10x! That's crazy!
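The card says the student was trained on text generated by the teacher (sequence-level distillation). The classic logit-matching variant of knowledge distillation can be sketched as below — a minimal, framework-free illustration, not the actual training code for this model; the temperature value and the pure-Python softmax are illustrative assumptions:

```python
import math

def softmax(logits, temperature=1.0):
    # Temperature-scaled softmax over a list of raw logits.
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    # KL(teacher || student) on temperature-softened distributions,
    # scaled by T^2 so gradients stay comparable across temperatures.
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return kl * temperature ** 2

# Identical logits give (near-)zero loss; mismatched logits give a positive loss.
print(distillation_loss([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))
print(distillation_loss([3.0, 2.0, 1.0], [1.0, 2.0, 3.0]))
```

In practice this soft-target loss is usually mixed with the ordinary next-token cross-entropy on the training text.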
## Architecture
| Parameter | Teacher | Student |
|---|---|---|
| hidden_size | 64 | 48 |
| intermediate_size | 128 | 96 |
| num_hidden_layers | 5 | 4 |
| num_attention_heads | 8 | 6 |
| vocab_size | 4096 | 4096 |
| Parameters | ~468k | ~289k |
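The parameter counts in the table are consistent with tied input/output embeddings and a SwiGLU-style MLP (gate, up, and down projections), as in small Llama-style models. This is an assumption — the card doesn't state the architecture family — but a back-of-the-envelope estimate under it (ignoring the negligible norm parameters) reproduces both figures:

```python
def approx_params(hidden, intermediate, layers, vocab):
    # Rough decoder-only parameter count, assuming the LM head is tied
    # to the token embedding and the MLP is SwiGLU (3 projections).
    embedding = vocab * hidden              # shared with the LM head
    attention = 4 * hidden * hidden         # q, k, v, o projections
    mlp = 3 * hidden * intermediate         # gate, up, down
    return embedding + layers * (attention + mlp)

print(approx_params(64, 128, 5, 4096))   # teacher: 466944, i.e. ~468k
print(approx_params(48, 96, 4, 4096))    # student: 288768, i.e. ~289k
```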
## Some outputs
Prompt: Throughout history, great civilizations
Output: Throughout history, great civilizations to in, a be polrain for is with more the the be the for. of be of on (I.er The b M.A-R and or have that not is and the is this they, can for to to. is of a a, to ofs the for and the a. in the is to as of is that an that of and you the which is, the, for in be a are by’ of. and to a m
Prompt: The human brain is capable of
Output: The human brain is capable ofs in an more that in a new can is the this the a of the pS, the a to the other in not it... and with a to that be are of to for in of of ass. The be of the,.F-s be the of dLal. ins of be and of Sin: and or that a one that to and a a bFed, asRal., the, is a and as
Prompt: The most important principle in science is
Output: The most important principle in science is a is a this are not for that the to of be digels-LC. to the in a the to, on to,
## Why did Supra create this trash?
We are currently researching knowledge distillation, and this was the first step! Things will get better!
## Final Thought
Knowledge distillation is promising for us; we believe that LLMs can be helpful even at such a small size!
## Model tree for SupraLabs/DistillSupra-0.2M
Base model: SupraLabs/Supra-Mini-v4-2M