---
license: mit
library_name: transformers
pipeline_tag: text-generation
tags:
- tensormind
- causal-lm
- text-generation
- chinese
- custom-code
language:
- zh
- en
model-index:
- name: TensorMind
  results:
  - task:
      type: text-generation
      name: Chinese Multiple-Choice Evaluation
    dataset:
      type: custom
      name: C-Eval
    metrics:
    - type: accuracy
      value: 27.27
      name: C-Eval (0-shot)
  - task:
      type: text-generation
      name: Chinese Multiple-Choice Evaluation
    dataset:
      type: custom
      name: CMMLU
    metrics:
    - type: accuracy
      value: 25.26
      name: CMMLU (0-shot)
  - task:
      type: text-generation
      name: Chinese Multiple-Choice Evaluation
    dataset:
      type: custom
      name: A-CLUE
    metrics:
    - type: accuracy
      value: 25.43
      name: A-CLUE (0-shot)
  - task:
      type: text-generation
      name: Chinese Multiple-Choice Evaluation
    dataset:
      type: custom
      name: TMMLU+
    metrics:
    - type: accuracy
      value: 24.96
      name: TMMLU+ (0-shot)
---

# TensorMind (0.5B)

TensorMind is a 536.9M-parameter causal language model for lightweight Chinese and English text generation.

## Model Details

- Architecture: decoder-only Transformer (`TensorMindForCausalLM`)
- Layers: 32
- Hidden size: 1024
- Attention heads / KV heads: 16 / 8 (grouped-query attention, GQA)
- Context length: 32,768 tokens
- Vocabulary size: 32,768
- Positional encoding: RoPE
- Activation: SiLU
- Parameters: 536,941,568 (~0.5B)
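
With 16 query heads over a hidden size of 1024, each head is 64-dimensional, and the 8 KV heads make the key/value projections half the width of the query projection. A quick sketch of the per-layer attention shapes implied by the numbers above (variable names are illustrative, not the model's actual module names):

```python
hidden_size = 1024
num_heads = 16     # query heads
num_kv_heads = 8   # GQA: fewer KV heads, shared across groups of query heads

head_dim = hidden_size // num_heads               # 64
q_proj_shape = (hidden_size, num_heads * head_dim)      # (1024, 1024)
kv_proj_shape = (hidden_size, num_kv_heads * head_dim)  # (1024, 512)

# Each group of num_heads // num_kv_heads = 2 query heads shares one KV head,
# shrinking the KV cache by 2x relative to full multi-head attention.
group_size = num_heads // num_kv_heads
print(head_dim, kv_proj_shape, group_size)
```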
| |
|
| | ## Quick Start |
| |
|
| | ```python |
| | import torch |
| | from transformers import AutoTokenizer, AutoModelForCausalLM |
| | |
| | repo_id = "TensorMind/TensorMind" |
| | tokenizer = AutoTokenizer.from_pretrained(repo_id, trust_remote_code=True) |
| | model = AutoModelForCausalLM.from_pretrained( |
| | repo_id, |
| | trust_remote_code=True, |
| | torch_dtype=torch.float16 if torch.cuda.is_available() else torch.float32, |
| | ) |
| | |
| | prompt = "请用三句话介绍一下你自己。" |
| | inputs = tokenizer(prompt, return_tensors="pt").to(model.device) |
| | outputs = model.generate(**inputs, max_new_tokens=128, do_sample=True, temperature=0.7) |
| | print(tokenizer.decode(outputs[0], skip_special_tokens=True)) |
| | ``` |
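
The `temperature=0.7` setting sharpens the sampling distribution, making high-probability tokens even more likely. A minimal, model-free illustration of temperature scaling on toy logits:

```python
import math

def softmax(logits, temperature=1.0):
    """Convert logits to probabilities, dividing by temperature first."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]
p_default = softmax(logits, temperature=1.0)
p_sharp = softmax(logits, temperature=0.7)

# Temperature < 1 concentrates probability mass on the top logit.
assert p_sharp[0] > p_default[0]
```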
| |
|
| | ## Benchmark Snapshot |
| |
|
| | Evaluation time: 2026-03-07 00:40 (UTC+8), zero-shot (`n-shot=0`). |
| |
|
| | | Model | Params | C-Eval | CMMLU | A-CLUE | TMMLU+ | AGIEval | |
| | |---|---:|---:|---:|---:|---:|---:| |
| | | TensorMind | 0.5B | 27.27 | 25.26 | 25.43 | 24.96 | 33.56 | |
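
Zero-shot multiple-choice scores like these are typically computed by scoring each answer option's log-likelihood under the model and picking the best one, rather than by free-form generation. A toy sketch of that scoring step (hypothetical function, not the actual evaluation harness):

```python
def pick_answer(option_logprobs):
    """Pick the option whose continuation has the highest mean token
    log-probability (length-normalized so longer options aren't penalized)."""
    scores = {
        label: sum(lps) / len(lps)
        for label, lps in option_logprobs.items()
    }
    return max(scores, key=scores.get)

# Toy per-token log-probs for options A-D of one question.
example = {
    "A": [-2.1, -1.9],        # mean -2.0
    "B": [-0.8, -1.0, -1.2],  # mean -1.0  <- best
    "C": [-3.0],              # mean -3.0
    "D": [-2.5, -2.4],        # mean -2.45
}
print(pick_answer(example))  # "B"
```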
| |
|
| |
|
| |  |
| |
|
| |
|
| |  |
| |
|

## Intended Use

- Lightweight chat and text generation
- Local experimentation and teaching
- Baseline model for research and fine-tuning

## Limitations

- This is a small model and can produce factual errors.
- The benchmark numbers above come from multiple-choice evaluations and do not fully reflect open-ended generation quality.
- Outputs may contain bias or unsafe content; apply filtering before production use.

## License

MIT License.