---
license: mit
language:
- en
tags:
- causal-lm
- scientific-language-model
- mathematics
- arxiv
- research
library_name: transformers
---
|
|
|
|
|
# KiteFish-A1-1.5B |
|
|
|
|
|
**KiteFish-A1-1.5B** is a ~1.5B parameter decoder-only transformer trained from scratch on raw arXiv LaTeX sources across mathematics, computer science, and theoretical physics. |
|
|
|
|
|
📄 **Paper:** https://arxiv.org/abs/2602.17288 |
|
|
💻 **GitHub:** https://github.com/kitefishai/KiteFish-A1-1.5B-Math
|
|
|
|
|
This is a **base scientific language model** (not instruction-tuned). |
|
|
|
|
|
## Overview |
|
|
|
|
|
KiteFish-A1-1.5B explores what it takes to train a domain-specialized scientific language model directly from structured LaTeX archives. |
|
|
|
|
|
**Training Scale** |
|
|
- ~52B pretraining tokens |
|
|
- ~5B additional post-training tokens |
|
|
- ~200GB processed scientific corpus |
|
|
- LLaMA-compatible tokenizer (~102k vocab) |
|
|
- 2× NVIDIA A100 (80GB) GPUs |
|
|
- 24 experimental training runs |
|
|
|
|
|
The focus of this project is *scientific language modeling robustness*, not benchmark optimization. |
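
As a quick illustration, the snippet below loads the released tokenizer from this repository and segments a small LaTeX formula; the formula is an arbitrary example, not drawn from the training corpus.

```python
from transformers import AutoTokenizer

# Load the tokenizer published with this card and inspect how it handles LaTeX.
tok = AutoTokenizer.from_pretrained("KiteFishAI/KiteFish-A1-1.5B-Math")

print(len(tok))  # vocabulary size (~102k)
print(tok.tokenize(r"\int_0^1 x^2 \, dx = \frac{1}{3}"))  # LaTeX segmentation
```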
|
|
|
|
|
## Model Architecture |
|
|
|
|
|
- 24 Transformer layers |
|
|
- Hidden size: 2048 |
|
|
- FFN size: 5504 |
|
|
- 16 attention heads |
|
|
- Context length: 4096 (pretraining sequences capped at 768 tokens)
|
|
- Dense LLaMA-style architecture |
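
For reference, a hypothetical `LlamaConfig` mirroring the dimensions listed above might look as follows; the vocabulary size is approximate and the RoPE/normalization settings are library defaults, so the released `config.json` remains authoritative.

```python
from transformers import LlamaConfig

# Illustrative config matching the listed architecture; not the shipped config.json.
config = LlamaConfig(
    vocab_size=102_000,           # ~102k LLaMA-compatible tokenizer (approximate)
    hidden_size=2048,
    intermediate_size=5504,       # FFN size
    num_hidden_layers=24,
    num_attention_heads=16,
    max_position_embeddings=4096, # context length
)
print(config)
```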
|
|
|
|
|
**Optimization** |
|
|
- AdamW |
|
|
- Learning rate: 2e-4 |
|
|
- Warmup: 500 steps |
|
|
- Weight decay: 0.1 |
|
|
- Gradient accumulation: 32 |
|
|
- bf16 mixed precision |
|
|
- Gradient checkpointing enabled |
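
A sketch of how these hyperparameters map onto Hugging Face `TrainingArguments` is shown below; the output directory and per-device batch size are placeholders, since the original training stack and data loader are not reproduced here.

```python
from transformers import TrainingArguments

# Illustrative mapping of the listed hyperparameters; placeholders noted inline.
training_args = TrainingArguments(
    output_dir="kitefish-a1-pretrain",  # placeholder path
    optim="adamw_torch",                # AdamW
    learning_rate=2e-4,
    weight_decay=0.1,
    warmup_steps=500,
    gradient_accumulation_steps=32,
    bf16=True,
    gradient_checkpointing=True,
    per_device_train_batch_size=4,      # placeholder; not documented above
)
```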
|
|
|
|
|
**Validation Perplexity:** ~4.2 (held-out scientific corpus) |
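
Perplexity here is the exponential of the mean token-level cross-entropy on held-out text. The snippet below is a minimal illustration of that computation on a single passage; it is not the evaluation pipeline behind the ~4.2 figure, and one short example will not reproduce that number.

```python
import math
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "KiteFishAI/KiteFish-A1-1.5B-Math"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Illustrative held-out passage; perplexity = exp(mean cross-entropy loss).
text = r"\begin{theorem} Every bounded monotone sequence in $\mathbb{R}$ converges. \end{theorem}"
enc = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    loss = model(**enc, labels=enc["input_ids"]).loss
print(f"perplexity ~ {math.exp(loss.item()):.2f}")
```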
|
|
|
|
|
## Intended Use |
|
|
|
|
|
KiteFish-A1-1.5B is suitable for: |
|
|
|
|
|
- Scientific text modeling research |
|
|
- Mathematical language modeling experiments |
|
|
- Pretraining initialization for domain fine-tuning |
|
|
- Tokenization and symbolic modeling research |
|
|
- Studying LaTeX structure modeling |
|
|
|
|
|
It is **not optimized for:** |
|
|
|
|
|
- Instruction following |
|
|
- Chat-based applications |
|
|
- General conversational AI |
|
|
- Benchmark leaderboard performance |
|
|
|
|
|
## Performance Notes |
|
|
|
|
|
This model was trained under moderate compute constraints and without instruction tuning or alignment stages. |
|
|
|
|
|
Observed characteristics: |
|
|
|
|
|
- Strong familiarity with scientific writing style |
|
|
- Stable LaTeX structural modeling |
|
|
- Reasonable symbolic fluency |
|
|
- Limited reasoning depth |
|
|
- Low downstream benchmark accuracy without fine-tuning |
|
|
|
|
|
Performance improves significantly with supervised fine-tuning (SFT), LoRA adaptation, or domain-specific instruction tuning. |
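
As one possible starting point, the sketch below attaches LoRA adapters with the `peft` library; the target module names assume standard LLaMA-style projection layers, and the rank, alpha, and dropout values are illustrative rather than tuned for this model.

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

# Wrap the base model with LoRA adapters for lightweight fine-tuning.
base = AutoModelForCausalLM.from_pretrained("KiteFishAI/KiteFish-A1-1.5B-Math")
lora_cfg = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumed LLaMA-style names
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, lora_cfg)
model.print_trainable_parameters()
```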
|
|
|
|
|
## Limitations |
|
|
|
|
|
- Not instruction-tuned |
|
|
- No RLHF or preference alignment |
|
|
- Trained at 768-token sequence length |
|
|
- Domain restricted to selected arXiv categories |
|
|
- Not optimized for reasoning benchmarks |
|
|
- General NLP benchmark scores may be low |
|
|
|
|
|
This release is intended primarily for research and experimentation. |
|
|
|
|
|
## Example Usage |
|
|
|
|
|
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "KiteFishAI/KiteFish-A1-1.5B-Math"

# Load the tokenizer and base model weights
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Base (non-instruction-tuned) model: the prompt is treated as text to continue
prompt = "Prove that the sum of two continuous functions is continuous."
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=200)

print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
|
|
|
|
|
## Citation |
|
|
|
|
|
If you use this model in your research, please cite: |
|
|
|
|
|
```
@article{kitefish_a1_2026,
  title={KiteFish-A1: Training a Scientific Language Model from Raw LaTeX Archives},
  author={...},
  year={2026},
  eprint={2602.17288},
  archivePrefix={arXiv}
}
```
|
|
|