---
license: mit
library_name: transformers
pipeline_tag: text-generation
tags:
- tensormind
- causal-lm
- text-generation
- chinese
- custom-code
language:
- zh
- en
model-index:
- name: TensorMind
  results:
  - task:
      type: text-generation
      name: Chinese Multiple-Choice Evaluation
    dataset:
      type: custom
      name: C-Eval
    metrics:
    - type: accuracy
      value: 27.27
      name: C-Eval (0-shot)
  - task:
      type: text-generation
      name: Chinese Multiple-Choice Evaluation
    dataset:
      type: custom
      name: CMMLU
    metrics:
    - type: accuracy
      value: 25.26
      name: CMMLU (0-shot)
  - task:
      type: text-generation
      name: Chinese Multiple-Choice Evaluation
    dataset:
      type: custom
      name: A-CLUE
    metrics:
    - type: accuracy
      value: 25.43
      name: A-CLUE (0-shot)
  - task:
      type: text-generation
      name: Chinese Multiple-Choice Evaluation
    dataset:
      type: custom
      name: TMMLU+
    metrics:
    - type: accuracy
      value: 24.96
      name: TMMLU+ (0-shot)
---

# TensorMind (0.5B)

TensorMind is a 536.9M-parameter causal language model for lightweight Chinese and English text generation.

## Model Details

- Architecture: decoder-only Transformer (`TensorMindForCausalLM`)
- Layers: 32
- Hidden size: 1024
- Attention heads / KV heads: 16 / 8 (grouped-query attention, GQA)
- Context length: 32,768 tokens
- Vocabulary size: 32,768
- Positional encoding: RoPE
- Activation: SiLU
- Parameters: 536,941,568 (~0.5B)
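
With 16 query heads over a hidden size of 1024, each head is 64-dimensional, and the 8 KV heads make the key/value projections half the width of the query projection. A quick sketch of the per-layer attention shapes implied by the numbers above (variable names are illustrative, not the model's actual module names):

```python
hidden_size = 1024
num_heads = 16     # query heads
num_kv_heads = 8   # GQA: fewer KV heads, shared across groups of query heads

head_dim = hidden_size // num_heads               # 64
q_proj_shape = (hidden_size, num_heads * head_dim)      # (1024, 1024)
kv_proj_shape = (hidden_size, num_kv_heads * head_dim)  # (1024, 512)

# Each group of num_heads // num_kv_heads = 2 query heads shares one KV head,
# shrinking the KV cache by 2x relative to full multi-head attention.
group_size = num_heads // num_kv_heads
print(head_dim, kv_proj_shape, group_size)
```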
| |
|
| | ## Quick Start |
| |
|
| | ```python |
| | import torch |
| | from transformers import AutoTokenizer, AutoModelForCausalLM |
| | |
| | repo_id = "TensorMind/TensorMind" |
| | tokenizer = AutoTokenizer.from_pretrained(repo_id, trust_remote_code=True) |
| | model = AutoModelForCausalLM.from_pretrained( |
| | repo_id, |
| | trust_remote_code=True, |
| | torch_dtype=torch.float16 if torch.cuda.is_available() else torch.float32, |
| | ) |
| | |
| | prompt = "请用三句话介绍一下你自己。" |
| | inputs = tokenizer(prompt, return_tensors="pt").to(model.device) |
| | outputs = model.generate(**inputs, max_new_tokens=128, do_sample=True, temperature=0.7) |
| | print(tokenizer.decode(outputs[0], skip_special_tokens=True)) |
| | ``` |
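
The `temperature=0.7` setting sharpens the sampling distribution, making high-probability tokens even more likely. A minimal, model-free illustration of temperature scaling on toy logits:

```python
import math

def softmax(logits, temperature=1.0):
    """Convert logits to probabilities, dividing by temperature first."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]
p_default = softmax(logits, temperature=1.0)
p_sharp = softmax(logits, temperature=0.7)

# Temperature < 1 concentrates probability mass on the top logit.
assert p_sharp[0] > p_default[0]
```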
| |
|
| | ## Benchmark Snapshot |
| |
|
| | Evaluation time: 2026-03-07 00:40 (UTC+8), zero-shot (`n-shot=0`). |
| |
|
| | | Model | Params | C-Eval | CMMLU | A-CLUE | TMMLU+ | AGIEval | |
| | |---|---:|---:|---:|---:|---:|---:| |
| | | TensorMind | 0.5B | 27.27 | 25.26 | 25.43 | 24.96 | 33.56 | |
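
Zero-shot multiple-choice scores like these are typically computed by scoring each answer option's log-likelihood under the model and picking the best one, rather than by free-form generation. A toy sketch of that scoring step (hypothetical function, not the actual evaluation harness):

```python
def pick_answer(option_logprobs):
    """Pick the option whose continuation has the highest mean token
    log-probability (length-normalized so longer options aren't penalized)."""
    scores = {
        label: sum(lps) / len(lps)
        for label, lps in option_logprobs.items()
    }
    return max(scores, key=scores.get)

# Toy per-token log-probs for options A-D of one question.
example = {
    "A": [-2.1, -1.9],        # mean -2.0
    "B": [-0.8, -1.0, -1.2],  # mean -1.0  <- best
    "C": [-3.0],              # mean -3.0
    "D": [-2.5, -2.4],        # mean -2.45
}
print(pick_answer(example))  # "B"
```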
| |
|
| |
|
| |  |
| |
|
| |
|
| |  |
| |
|

## Intended Use

- Lightweight chat and text generation
- Local experimentation and teaching
- Baseline model for research and fine-tuning

## Limitations

- This is a small model and can produce factual errors.
- The benchmark numbers above come from multiple-choice evaluations and do not fully reflect open-ended generation quality.
- Outputs may contain bias or unsafe content; apply filtering before production use.

## License

MIT License.