# TensorMind (0.5B)
TensorMind is a 536.9M-parameter causal language model for lightweight Chinese/English text generation.
## Model Details
- Architecture: Decoder-only Transformer (`TensorMindForCausalLM`)
- Layers: 32
- Hidden size: 1024
- Heads / KV heads: 16 / 8 (GQA)
- Context length: 32,768
- Vocab size: 32,768
- Positional encoding: RoPE
- Activation: SiLU
- Parameters: 536,941,568 (~0.5B)
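The 16/8 head split above is grouped-query attention (GQA): each pair of query heads shares one KV head, halving the KV-cache size. A minimal shape-level sketch in PyTorch, illustrative only and not this model's actual implementation:

```python
import torch
import torch.nn.functional as F

# Shapes matching the card: 16 query heads, 8 KV heads,
# head_dim = hidden / heads = 1024 / 16 = 64.
batch, seq, n_heads, n_kv_heads, head_dim = 1, 8, 16, 8, 64

q = torch.randn(batch, n_heads, seq, head_dim)
k = torch.randn(batch, n_kv_heads, seq, head_dim)
v = torch.randn(batch, n_kv_heads, seq, head_dim)

# Repeat each KV head for its group of query heads before attention.
groups = n_heads // n_kv_heads  # 2 query heads per KV head
k = k.repeat_interleave(groups, dim=1)
v = v.repeat_interleave(groups, dim=1)

out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
print(out.shape)  # torch.Size([1, 16, 8, 64])
```

Only `k` and `v` are cached during generation, so the cache holds 8 heads' worth of state instead of 16.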
## Quick Start

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

repo_id = "TensorMind/TensorMind"
tokenizer = AutoTokenizer.from_pretrained(repo_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    trust_remote_code=True,
    torch_dtype=torch.float16 if torch.cuda.is_available() else torch.float32,
)

prompt = "请用三句话介绍一下你自己。"  # "Introduce yourself in three sentences."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
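The `do_sample=True` and `temperature=0.7` arguments above control how each next token is drawn from the model's logits. A self-contained sketch of temperature scaling plus nucleus (top-p) filtering in plain PyTorch, for intuition only, not the exact `transformers` implementation:

```python
import torch

def sample_next_token(logits, temperature=0.7, top_p=0.9):
    # Temperature < 1 sharpens the distribution; > 1 flattens it.
    probs = torch.softmax(logits / temperature, dim=-1)
    # Nucleus filtering: keep the smallest set of tokens whose
    # cumulative probability reaches top_p, then renormalize.
    sorted_probs, sorted_idx = torch.sort(probs, descending=True)
    cum = torch.cumsum(sorted_probs, dim=-1)
    keep = cum - sorted_probs < top_p  # always keeps the top token
    sorted_probs[~keep] = 0.0
    sorted_probs /= sorted_probs.sum()
    return sorted_idx[torch.multinomial(sorted_probs, 1)]

# Toy 4-token vocabulary: only the high-probability head survives top-p.
logits = torch.tensor([2.0, 1.0, 0.5, -1.0])
token = sample_next_token(logits)
```

With these toy logits the two lowest-probability tokens fall outside the nucleus, so sampling only ever returns index 0 or 1.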
## Benchmark Snapshot

Evaluation date: 2026-03-07 00:40 (UTC+8); all tasks run zero-shot (n-shot = 0).
| Model | Params | C-Eval | CMMLU | A-CLUE | TMMLU+ | AGIEval |
|---|---|---|---|---|---|---|
| TensorMind | 0.5B | 27.27 | 25.26 | 25.43 | 24.96 | 33.56 |
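The scores above are self-reported. A zero-shot run with EleutherAI's lm-evaluation-harness should approximate this setup; the task names below come from the harness and the exact task selection for this model is an assumption:

```shell
pip install lm-eval
lm_eval --model hf \
  --model_args pretrained=TensorMind/TensorMind,trust_remote_code=True \
  --tasks ceval-valid,cmmlu \
  --num_fewshot 0 \
  --batch_size 8
```

`trust_remote_code=True` is needed because the architecture is a custom `TensorMindForCausalLM` class.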
## Intended Use
- Lightweight chat and text generation
- Local experimentation and teaching
- Baseline model for research and fine-tuning
## Limitations
- This is a small model and can produce factual errors.
- Benchmark numbers above are from multiple-choice style evaluations and do not fully represent open-ended generation quality.
- Outputs may contain bias or unsafe content; apply filtering for production use.
## License
MIT License.

