DistillSupra-0.2M


DistillSupra-0.2M is an ultra-compact causal language model with approximately 0.2 million parameters, produced by knowledge distillation from Supra-Mini-v4-2M.

It was trained for 500 steps (1 epoch), about 30 minutes on a GTX 750 Ti 4GB, using text generated by the teacher.

The model was compressed 10x! That's crazy!
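For readers new to the setup, here is a minimal sketch of this kind of sequence-level distillation: sample text from the teacher, then fit the student to it with the ordinary language-modeling loss. The teacher hub ID, prompt, and hyperparameters below are illustrative assumptions, not the exact recipe used for this model.

```python
# Minimal sketch of sequence-level distillation (student trained on
# teacher-generated text). Hub IDs, prompt, and hyperparameters are
# assumptions for illustration, not the exact recipe used here.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

teacher_id = "SupraLabs/Supra-Mini-v4-2M"  # assumed hub id for the teacher
tokenizer = AutoTokenizer.from_pretrained(teacher_id)
teacher = AutoModelForCausalLM.from_pretrained(teacher_id).eval()
student = AutoModelForCausalLM.from_pretrained("SupraLabs/DistillSupra-0.2M")

optimizer = torch.optim.AdamW(student.parameters(), lr=3e-4)

student.train()
for step in range(500):  # 500 steps = 1 epoch, per the card
    # 1) Sample a training sequence from the teacher.
    prompt = tokenizer("Throughout history,", return_tensors="pt")
    with torch.no_grad():
        sample = teacher.generate(**prompt, max_new_tokens=128, do_sample=True)
    # 2) Fit the student to the sample with the usual causal LM loss.
    loss = student(input_ids=sample, labels=sample).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```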

Architecture

| Parameter           | Teacher | Student |
|---------------------|---------|---------|
| hidden_size         | 64      | 48      |
| intermediate_size   | 128     | 96      |
| num_hidden_layers   | 5       | 4       |
| num_attention_heads | 8       | 6       |
| vocab_size          | 4096    | 4096    |
| Parameters          | ~468k   | ~289k   |
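As a rough guide, the student column translates into a config along the following lines. This is a sketch assuming a Llama-style architecture with tied embeddings (the card does not state the architecture family); that assumption is what makes the count land near the ~289k above.

```python
# Hypothetical student config matching the table; a Llama-style
# architecture with tied embeddings is an assumption, not confirmed.
from transformers import LlamaConfig, LlamaForCausalLM

config = LlamaConfig(
    hidden_size=48,
    intermediate_size=96,
    num_hidden_layers=4,
    num_attention_heads=6,
    vocab_size=4096,
    tie_word_embeddings=True,  # needed to land near ~289k under this assumption
)
model = LlamaForCausalLM(config)
print(sum(p.numel() for p in model.parameters()))  # ~289k with these settings
```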

Some outputs:

Prompt: Throughout history, great civilizations

Output: Throughout history, great civilizations to in, a be polrain for is with more the the be the for. of be of on (I.er The b M.A-R and or have that not is and the is this they, can for to to. is of a a, to ofs the for and the a. in the is to as of is that an that of and you the which is, the, for in be a are by’ of. and to a m

Prompt: The human brain is capable of

Output: The human brain is capable ofs in an more that in a new can is the this the a of the pS, the a to the other in not it... and with a to that be are of to for in of of ass. The be of the,.F-s be the of dLal. ins of be and of Sin: and or that a one that to and a a bFed, asRal., the, is a and as

Prompt: The most important principle in science is

Output: The most important principle in science is a is a this are not for that the to of be digels-LC. to the in a the to, on to,
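Samples like these can be reproduced along the following lines; the sampling settings are assumptions.

```python
# Minimal generation sketch; sampling settings are assumptions.
from transformers import pipeline

generator = pipeline("text-generation", model="SupraLabs/DistillSupra-0.2M")
result = generator("Throughout history, great civilizations",
                   max_new_tokens=80, do_sample=True)
print(result[0]["generated_text"])
```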

Why did Supra create this trash?

We are currently researching knowledge distillation, and this was the first step! Things will get better!

Final Thought

Knowledge distillation is promising for us; we believe that LLMs can be helpful even at such a small size!
