Tags: Text Generation · Transformers · Safetensors · Chinese · neuronspark · snn · spiking-neural-network · neuromorphic · conversational · custom_code
Instructions for using Brain2nd/NeuronSpark-0.9B with libraries, inference providers, notebooks, and local apps. Follow the sections below to get started.
- Libraries
- Transformers
How to use Brain2nd/NeuronSpark-0.9B with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="Brain2nd/NeuronSpark-0.9B", trust_remote_code=True)
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)
```

```python
# Load model directly
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("Brain2nd/NeuronSpark-0.9B", trust_remote_code=True, dtype="auto")
```
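For token-level control, here is a fuller generation sketch. It assumes the repository ships a tokenizer with a chat template; the generation settings are illustrative, not the model's recommended values.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Brain2nd/NeuronSpark-0.9B"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True, dtype="auto")

messages = [{"role": "user", "content": "Who are you?"}]
# Assumes the repo provides a chat template for the tokenizer.
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=128)
# Strip the prompt tokens and decode only the newly generated text.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```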
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use Brain2nd/NeuronSpark-0.9B with vLLM:
Install from pip and serve the model:

```bash
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "Brain2nd/NeuronSpark-0.9B"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "Brain2nd/NeuronSpark-0.9B",
    "messages": [
      { "role": "user", "content": "What is the capital of France?" }
    ]
  }'
```
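Because the server speaks the OpenAI API, the official `openai` Python client works as well as curl. A minimal sketch follows; `api_key` is a placeholder unless the server was started with one. The same snippet works against the SGLang server in the later section by switching the base URL to port 30000.

```python
from openai import OpenAI

# Point the client at the local vLLM server (OpenAI-compatible).
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
response = client.chat.completions.create(
    model="Brain2nd/NeuronSpark-0.9B",
    messages=[{"role": "user", "content": "What is the capital of France?"}],
)
print(response.choices[0].message.content)
```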
Use Docker

```bash
docker model run hf.co/Brain2nd/NeuronSpark-0.9B
```
- SGLang
How to use Brain2nd/NeuronSpark-0.9B with SGLang:
Install from pip and serve the model:

```bash
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
  --model-path "Brain2nd/NeuronSpark-0.9B" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "Brain2nd/NeuronSpark-0.9B",
    "messages": [
      { "role": "user", "content": "What is the capital of France?" }
    ]
  }'
```
Use Docker images

```bash
# Run the SGLang server in Docker:
docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=<secret>" \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
    --model-path "Brain2nd/NeuronSpark-0.9B" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "Brain2nd/NeuronSpark-0.9B",
    "messages": [
      { "role": "user", "content": "What is the capital of France?" }
    ]
  }'
```

- Docker Model Runner
How to use Brain2nd/NeuronSpark-0.9B with Docker Model Runner:
```bash
docker model run hf.co/Brain2nd/NeuronSpark-0.9B
```
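Docker Model Runner also exposes an OpenAI-compatible endpoint on the host. The sketch below uses the `openai` client; the port (12434), path (`/engines/v1`), and model-name format are assumptions based on Docker's documented defaults and may differ by Docker version, so check your installation's docs.

```python
from openai import OpenAI

# Assumed defaults: TCP host access enabled, port 12434, /engines/v1 path.
client = OpenAI(base_url="http://localhost:12434/engines/v1", api_key="docker")
response = client.chat.completions.create(
    # Assumed: the same model reference used in `docker model run`.
    model="hf.co/Brain2nd/NeuronSpark-0.9B",
    messages=[{"role": "user", "content": "What is the capital of France?"}],
)
print(response.choices[0].message.content)
```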
| """ | |
| FP16 二进制编码/解码 — 模型边界操作(无可训练参数)。 | |
| IEEE 754 float16 位布局(K=16 时间步): | |
| 时间步: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 | |
| 位: sign E4 E3 E2 E1 E0 M9 M8 M7 M6 M5 M4 M3 M2 M1 M0 | |
| 含义: 符号 ←── 指数(bias=15) ──→ ←────────── 尾数(隐含 1.xxx) ──────────→ | |
| 编码: embedding → IEEE 754 float16 位提取 → 16 帧二值 spike(detach,固定预处理) | |
| 解码: 16 帧二值 spike → IEEE 754 位重建 → 连续值(可微分,梯度通过 surrogate grad 传播) | |
| """ | |
| import torch | |
| from torch import Tensor | |
def fp16_encode(emb: Tensor, K: int = 16) -> Tensor:
    """FP16 binary encoding (model-boundary operation, fixed preprocessing).

    Converts continuous embeddings into IEEE 754 float16 bit patterns used as
    the SNN's spike input.

    Args:
        emb: (batch, seq_len, D) continuous embeddings
        K: number of time steps (must be 16, one per float16 bit)

    Returns:
        spike_seq: (seq_len*K, batch, D) binary {0, 1}, detached
    """
    assert K == 16, "float16 has exactly 16 bits"
    batch, seq_len, D = emb.shape
    # Cast to float16 to obtain the IEEE 754 bit pattern.
    # Clamp first so overflow cannot produce Inf (float16 max is 65504).
    emb_fp16 = emb.float().clamp(-65504.0, 65504.0).half()
    bits_int = emb_fp16.view(torch.int16)  # (batch, seq_len, D)
    # Extract the 16 bits, MSB first: sign, exponent, mantissa.
    shifts = torch.arange(15, -1, -1, device=emb.device)  # [15, 14, ..., 0]
    # bits_int: (batch, seq_len, D) → unsqueeze → (batch, seq_len, 1, D)
    # shifts:   (K,) → view → (1, 1, K, 1)
    bits = (bits_int.unsqueeze(2) >> shifts.view(1, 1, K, 1)) & 1  # (batch, seq_len, K, D)
    # Cast to the compute dtype and detach (encoding takes no gradient).
    bits = bits.to(emb.dtype).detach()
    # Reshape → (seq_len*K, batch, D).
    return bits.reshape(batch, seq_len * K, D).permute(1, 0, 2).contiguous()
def fp16_decode(spikes: Tensor, seq_len: int, K: int = 16) -> Tensor:
    """Exact FP16 bit decoding: rebuild float16 values from 16 binary spikes.

    Exact inverse of fp16_encode. Fully differentiable: gradients flow through
    the IEEE 754 reconstruction formula to every spike output, and from there
    into the SNN via the surrogate gradient.

    IEEE 754 float16 reconstruction:
        Normal    (exp > 0): (-1)^sign * 2^(exp - 15) * (1 + mant_frac)
        Subnormal (exp = 0): (-1)^sign * 2^(-14) * mant_frac
    where mant_frac = Σ mant_bit_i * 2^{-(i+1)}, i = 0..9.

    Args:
        spikes: (seq_len*K, batch, D) binary {0, 1} (spikes of the output neurons)
        seq_len: token sequence length
        K: number of time steps (= 16)

    Returns:
        decoded: (batch, seq_len, D) continuous values
    """
    batch, D = spikes.shape[1], spikes.shape[2]
    # (seq_len*K, batch, D) → (batch, seq_len, K, D)
    s = spikes.permute(1, 0, 2).reshape(batch, seq_len, K, D)

    # ---- Sign: bit 0 ----
    sign = 1.0 - 2.0 * s[:, :, 0, :]  # +1 or -1

    # ---- Exponent: bits 1-5, weighted sum → integer in [0, 31] ----
    exp_weights = torch.tensor(
        [16.0, 8.0, 4.0, 2.0, 1.0],
        device=spikes.device, dtype=spikes.dtype,
    )
    exp_val = (s[:, :, 1:6, :] * exp_weights.view(1, 1, 5, 1)).sum(dim=2)

    # ---- Mantissa fraction: bits 6-15, weighted sum → [0, 1) ----
    mant_weights = torch.tensor(
        [2.0 ** (-i) for i in range(1, 11)],
        device=spikes.device, dtype=spikes.dtype,
    )
    mant_frac = (s[:, :, 6:, :] * mant_weights.view(1, 1, 10, 1)).sum(dim=2)

    # ---- IEEE 754 reconstruction ----
    # Normal:    (-1)^s * 2^(exp - 15) * (1 + mant_frac)
    # Subnormal: (-1)^s * 2^(-14) * mant_frac
    is_normal = exp_val > 0
    normal_val = sign * torch.exp2(exp_val - 15.0) * (1.0 + mant_frac)
    subnormal_val = sign * (2.0 ** -14) * mant_frac
    return torch.where(is_normal, normal_val, subnormal_val)
```
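As a sanity check on the pair above, here is a minimal round-trip sketch (assuming `fp16_encode` and `fp16_decode` are in scope as defined): encoding any float16-representable value and decoding the 16 resulting spike frames reproduces it exactly. The comments walk through 1.5 as a worked example of the bit layout.

```python
import torch

# Round trip: encode → decode should be exact for float16-representable values.
emb = torch.tensor([[[1.5, -0.375, 65504.0, 6.1e-5]]])  # (batch=1, seq_len=1, D=4)
spikes = fp16_encode(emb)                 # (1*16, 1, 4), binary {0, 1}
decoded = fp16_decode(spikes, seq_len=1)  # (1, 1, 4)
# Compare against the values after float16 rounding, which encode applies.
assert torch.allclose(decoded, emb.half().float())

# Worked example: 1.5 → bits 0 01111 1000000000
#   sign = 0, exp = 15, mant_frac = 0.5
#   (+1) * 2^(15 - 15) * (1 + 0.5) = 1.5
bits_15 = spikes[:, 0, 0]  # the 16 frames for the first value (1.5)
print(bits_15.int().tolist())  # [0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
```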
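The differentiability claim in `fp16_decode`'s docstring can also be checked directly. This sketch feeds soft values in [0, 1] through the decoder (standing in for spikes; in the model, binarization happens inside the neurons, whose backward pass is the surrogate gradient) and confirms that gradients reach the input.

```python
import torch

# Differentiability check: gradients flow through fp16_decode to its inputs.
soft_spikes = torch.rand(16, 2, 8, requires_grad=True)  # (seq_len*K, batch, D)
out = fp16_decode(soft_spikes, seq_len=1)               # (batch=2, seq_len=1, D=8)
out.sum().backward()
assert soft_spikes.grad is not None
assert torch.isfinite(soft_spikes.grad).all()
```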