---
license: mit
datasets:
- tiiuae/falcon-refinedweb
- HuggingFaceFW/fineweb
base_model:
- cckm/tinymistral_950m
language:
- en
pipeline_tag: text-generation
library_name: PyTorch
---

## A deep and narrow Mistral model (950M params)
This checkpoint is for a small (950M-parameter), deep and narrow (40 layers, hidden size 1440) Mistral model, as described in this [blog post](https://epsilons.ai/blog.html#post1_3). It is meant for edge applications.

It was trained on ~400B tokens from RefinedWeb and ~400B tokens from FineWeb (up to the 2024-18 snapshot). It is a base model and has not gone through instruction or chat fine-tuning.
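Since this is a base model, it is sampled with plain text continuation rather than a chat template. Below is a minimal generation sketch using Hugging Face `transformers`, under the assumption that the checkpoint loads through the standard Mistral implementation (the card lists `library_name: PyTorch`, so the weights may instead require a custom loading path):

```python
# Hedged sketch: assumes cckm/tinymistral_950m is loadable via
# transformers' standard Mistral support; the card does not guarantee this.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "cckm/tinymistral_950m"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Base model: no chat template, just free-form continuation.
prompt = "The main advantage of small language models is"
inputs = tokenizer(prompt, return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=32, do_sample=False)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```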

LM Evaluation Harness results:
| Benchmark | Result |
| ----- | ----- |
| arc_c (ARC-Challenge) | 0.2884 |
| arc_e (ARC-Easy) | 0.5139 |
| boolq (BoolQ) | 0.6089 |
| hellaswag (HellaSwag) | 0.5888 |
| obqa (OpenBookQA) | 0.3280 |
| piqa (PIQA) | 0.7388 |
| siqa (Social IQa) | 0.4038 |
| wino (WinoGrande) | 0.5627 |
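These numbers can in principle be reproduced with EleutherAI's lm-evaluation-harness. The task names below are an assumed mapping of the table's abbreviations (e.g. `arc_c` → `arc_challenge`), and `--model hf` assumes the checkpoint loads through `transformers`, which the card's `library_name: PyTorch` does not guarantee:

```shell
pip install lm-eval
lm_eval --model hf \
  --model_args pretrained=cckm/tinymistral_950m \
  --tasks arc_challenge,arc_easy,boolq,hellaswag,openbookqa,piqa,social_iqa,winogrande \
  --batch_size 8
```

Exact scores may differ slightly depending on harness version and whether accuracy or normalized accuracy is reported.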