---
license: apache-2.0
language:
- zh
library_name: transformers
tags:
- snn
- spiking-neural-network
- text-generation
- neuromorphic
pipeline_tag: text-generation
---

# NeuronSpark-0.9B

## Introduction

**NeuronSpark-0.9B** is a **0.87-billion-parameter language model built entirely on Spiking Neural Networks (SNNs)**. Unlike conventional Transformer-based LLMs that rely on attention mechanisms, NeuronSpark replaces the entire computation backbone with biologically inspired spiking neurons, achieving language modeling through membrane potential dynamics, surrogate gradient training, and adaptive computation (PonderNet).

This is the **pretrained base model** (85,000 steps on a small subset of the Seq-Monkey corpus).

> **Note on training data**: Due to limited compute resources (a single DGX Spark), this model was trained for only **~85K steps on a small fraction of the full Seq-Monkey 10B-token corpus**. Despite the minimal training data, the model demonstrates emergent language capabilities, validating the architectural viability of pure SNN language models. We plan to continue scaling with more data and compute in future work.

For the instruction-tuned chat version, see [NeuronSpark-0.9B-Chat](https://huggingface.co/Brain2nd/NeuronSpark-0.9B-Chat).
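
Surrogate gradient training is what makes backpropagation through spikes possible: the forward pass uses a hard Heaviside step (fire / no fire) whose derivative is zero almost everywhere, so the backward pass substitutes a smooth stand-in. Below is a minimal sketch of the idea, assuming a sigmoid-derivative surrogate with sharpness `alpha = 4.0`; both are illustrative choices, not necessarily what NeuronSpark or spikingjelly use.

```python
import torch

class SpikeFn(torch.autograd.Function):
    """Heaviside spike in the forward pass; smooth surrogate gradient backward."""

    @staticmethod
    def forward(ctx, v_minus_vth):
        ctx.save_for_backward(v_minus_vth)
        # Hard threshold: emit a 0/1 spike when the membrane potential reaches V_th.
        return (v_minus_vth >= 0).to(v_minus_vth.dtype)

    @staticmethod
    def backward(ctx, grad_output):
        (x,) = ctx.saved_tensors
        # Replace the Heaviside derivative (zero almost everywhere) with the
        # derivative of a sigmoid; alpha controls how sharply it approximates the step.
        alpha = 4.0
        s = torch.sigmoid(alpha * x)
        return grad_output * alpha * s * (1.0 - s)

spike = SpikeFn.apply  # usable like any differentiable op
```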

## Model Details

| Attribute | Value |
|-----------|-------|
| Parameters | 874M |
| Architecture | SNN Hidden State Space Model |
| Hidden Dimension (D) | 896 |
| Layers | 20 |
| SNN Timesteps (K) | 16 (PonderNet adaptive) |
| State Expansion (N) | 8 |
| FFN Dimension | 2688 |
| Vocabulary Size | 6144 (custom BPE) |
| Context Length | 512 tokens |
| Training Data | Seq-Monkey (small subset, Chinese) |
| Training Tokens | ~1.4B (of ~10B available) |
| Precision | bfloat16 |
| License | Apache 2.0 |

## Architecture Highlights

- **Pure SNN**: No attention, no standard MLP; all computation runs through PLIF (Parametric Leaky Integrate-and-Fire) neurons (see the sketch after this list)
- **Membrane Potential Leakage Activation**: PLIFNode outputs the leak current `(1-β)·V_post`, naturally emphasizing fast-responding neurons over slow-memory neurons
- **Selective State Space**: Hidden neurons with input-dependent dynamic `β(t)`, `α(t)`, and `V_th(t)`, analogous to selective state space models (Mamba)
- **PonderNet Adaptive K**: Each token dynamically decides how many SNN timesteps (1 to K) to use, weighted by a geometric distribution
- **Triton Fused Kernels**: Custom PLIF forward/backward kernels, with a single-pass sequential scan replacing the 3-phase approach
- **Pre-LN Residual Stream**: Continuous residual flow with RMSNorm, matching the Qwen3/LLaMA architecture pattern
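
To make the first bullets concrete, here is a toy, self-contained sketch of a PLIF-style layer with input-dependent dynamics, the `(1-β)·V_post` leak-current readout, and geometric weighting over timesteps. The projections, gating functions, and `halt_lambda = 0.2` are illustrative assumptions; the actual NeuronSpark layer (selective state expansion, fused Triton scan, learned halting) is more involved.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyPLIFBlock(nn.Module):
    """Toy PLIF-style layer: input-dependent beta(t)/alpha(t)/V_th(t), leak-current
    readout (1-beta)*V_post, and PonderNet-style geometric weighting over K steps.
    Illustrative only; the real model uses fused Triton kernels."""

    def __init__(self, d: int, k_max: int = 16, halt_lambda: float = 0.2):
        super().__init__()
        self.k_max = k_max
        self.halt_lambda = halt_lambda      # lambda of the geometric prior
        self.to_beta = nn.Linear(d, d)      # membrane decay beta(t)
        self.to_alpha = nn.Linear(d, d)     # input gain alpha(t)
        self.to_vth = nn.Linear(d, d)       # firing threshold V_th(t)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        beta = torch.sigmoid(self.to_beta(x))   # decay in (0, 1)
        alpha = F.softplus(self.to_alpha(x))    # positive input gain
        v_th = F.softplus(self.to_vth(x))       # positive threshold

        v = torch.zeros_like(x)                 # membrane potential
        out = torch.zeros_like(x)
        remaining = 1.0                         # unallocated probability mass
        for k in range(self.k_max):
            v = beta * v + alpha * x            # leaky integration
            spk = (v >= v_th).to(x.dtype)       # fire (surrogate gradient in training)
            v = v - spk * v_th                  # soft reset after a spike
            leak = (1.0 - beta) * v             # leak-current readout
            # Geometric weight p_k = lambda * (1 - lambda)^k; the last step absorbs
            # the leftover mass so the weights sum to 1.
            w = self.halt_lambda * remaining if k < self.k_max - 1 else remaining
            remaining *= (1.0 - self.halt_lambda)
            out = out + w * leak
        return out
```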

## Quickstart

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "Brain2nd/NeuronSpark-0.9B",
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained("Brain2nd/NeuronSpark-0.9B")

# Text completion; the prompt means "The development of artificial intelligence"
text = f"{tokenizer.bos_token}人工智能的发展"
input_ids = tokenizer(text, return_tensors="pt")["input_ids"]
output_ids = model.generate(
    input_ids,
    max_new_tokens=128,
    do_sample=True,  # required for temperature/top_k to take effect
    temperature=0.8,
    top_k=50,
    eos_token_id=tokenizer.eos_token_id,
)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```

**Example Output:**

```
人工智能的发展，为人类的未来发展提供了新的机遇。在未来，人工智能将是未来人工智能发展的重要方向。
```

(English: "The development of artificial intelligence has provided new opportunities for humanity's future development. In the future, artificial intelligence will be an important direction for the future development of artificial intelligence.")

## Requirements

```bash
pip install torch transformers spikingjelly safetensors

# For Triton kernels (GPU):
pip install triton
```
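
A quick way to check whether the Triton path can be used on your machine (a hypothetical helper, not part of the repo; it only verifies that `triton` imports and a CUDA device is visible):

```python
import torch

try:
    import triton  # noqa: F401
    triton_ok = torch.cuda.is_available()
except ImportError:
    triton_ok = False
print("Triton fused kernels usable:", triton_ok)
```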

## Training

Trained on a single NVIDIA DGX Spark (GB10, 128GB unified memory) with 4-GPU DDP.

Due to compute constraints, training used only a small subset of the full corpus (~85K steps, ~1.4B tokens of ~10B available). Even with this limited data budget, the model acquires basic language generation ability, demonstrating the architectural viability of pure SNN language modeling.

```bash
torchrun --nproc_per_node=4 train_ddp.py \
    --D 896 --D_ff 2688 --K 16 --num_layers 20 \
    --batch_size 8 --accumulation_steps 8 \
    --learning_rate 2e-4 --warmup_iters 1000
```
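
As a sanity check on the token budget, assuming the ~85K step count refers to per-GPU micro-batches (i.e., before gradient accumulation), the numbers above are self-consistent:

```python
steps, batch_per_gpu, gpus, ctx = 85_000, 8, 4, 512
tokens = steps * batch_per_gpu * gpus * ctx
print(f"{tokens / 1e9:.2f}B tokens")  # ~1.39B, matching the reported ~1.4B
```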

## Citation

```bibtex
@misc{neuronspark2025,
  title={NeuronSpark: A Spiking Neural Network Language Model with Selective State Space Dynamics},
  author={Zhengzheng Tang},
  year={2025},
  url={https://github.com/Brain2nd/NeuronSpark}
}
```

## Contact

- **Author**: Zhengzheng Tang
- **Email**: zztangbu@bu.edu
- **GitHub**: [Brain2nd/NeuronSpark](https://github.com/Brain2nd/NeuronSpark)