DistillSupra-0.2M


DistillSupra-0.2M is an ultra-compact causal language model with approximately 0.2 million parameters, produced by knowledge distillation from Supra-Mini-v4-2M.

It was trained for 500 steps (1 epoch), about 30 minutes on a GTX 750 Ti 4GB, using text generated by the teacher.

The model was compressed 10x! That's crazy!
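For readers new to the setup, here is a minimal sketch of this kind of sequence-level distillation: sample text from the teacher, then fit the student to it with the ordinary language-modeling loss. The teacher hub ID, prompt, and hyperparameters below are illustrative assumptions, not the exact recipe used for this model.

```python
# Minimal sketch of sequence-level distillation (student trained on
# teacher-generated text). Hub IDs, prompt, and hyperparameters are
# assumptions for illustration, not the exact recipe used here.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

teacher_id = "SupraLabs/Supra-Mini-v4-2M"  # assumed hub id for the teacher
tokenizer = AutoTokenizer.from_pretrained(teacher_id)
teacher = AutoModelForCausalLM.from_pretrained(teacher_id).eval()
student = AutoModelForCausalLM.from_pretrained("SupraLabs/DistillSupra-0.2M")

optimizer = torch.optim.AdamW(student.parameters(), lr=3e-4)

student.train()
for step in range(500):  # 500 steps = 1 epoch, per the card
    # 1) Sample a training sequence from the teacher.
    prompt = tokenizer("Throughout history,", return_tensors="pt")
    with torch.no_grad():
        sample = teacher.generate(**prompt, max_new_tokens=128, do_sample=True)
    # 2) Fit the student to the sample with the usual causal LM loss.
    loss = student(input_ids=sample, labels=sample).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```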

Architecture

| Parameter           | Teacher | Student |
|---------------------|---------|---------|
| hidden_size         | 64      | 48      |
| intermediate_size   | 128     | 96      |
| num_hidden_layers   | 5       | 4       |
| num_attention_heads | 8       | 6       |
| vocab_size          | 4096    | 4096    |
| Parameters          | ~468k   | ~289k   |
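As a rough guide, the student column translates into a config along the following lines. This is a sketch assuming a Llama-style architecture with tied embeddings (the card does not state the architecture family); that assumption is what makes the count land near the ~289k above.

```python
# Hypothetical student config matching the table; a Llama-style
# architecture with tied embeddings is an assumption, not confirmed.
from transformers import LlamaConfig, LlamaForCausalLM

config = LlamaConfig(
    hidden_size=48,
    intermediate_size=96,
    num_hidden_layers=4,
    num_attention_heads=6,
    vocab_size=4096,
    tie_word_embeddings=True,  # needed to land near ~289k under this assumption
)
model = LlamaForCausalLM(config)
print(sum(p.numel() for p in model.parameters()))  # ~289k with these settings
```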

Some outputs:

Prompt: Throughout history, great civilizations

Output: Throughout history, great civilizations to in, a be polrain for is with more the the be the for. of be of on (I.er The b M.A-R and or have that not is and the is this they, can for to to. is of a a, to ofs the for and the a. in the is to as of is that an that of and you the which is, the, for in be a are by’ of. and to a m

Prompt: The human brain is capable of

Output: The human brain is capable ofs in an more that in a new can is the this the a of the pS, the a to the other in not it... and with a to that be are of to for in of of ass. The be of the,.F-s be the of dLal. ins of be and of Sin: and or that a one that to and a a bFed, asRal., the, is a and as

Prompt: The most important principle in science is

Output: The most important principle in science is a is a this are not for that the to of be digels-LC. to the in a the to, on to,
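Samples like these can be reproduced along the following lines; the sampling settings are assumptions.

```python
# Minimal generation sketch; sampling settings are assumptions.
from transformers import pipeline

generator = pipeline("text-generation", model="SupraLabs/DistillSupra-0.2M")
result = generator("Throughout history, great civilizations",
                   max_new_tokens=80, do_sample=True)
print(result[0]["generated_text"])
```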

Why did Supra create this trash?

We are currently researching knowledge distillation, and this was the first step! Things will get better!

Final Thought

Knowledge distillation is promising for us; we believe that LLMs can be helpful even at such a small size!
