🛡️ Meet Spartacus-1B: Shattering the Memory Wall with True O(1) Inference!
NoesisLab/Spartacus-1B-Instruct
NoesisLab/ChatSpartacus
At NoesisLab, we've entirely ripped out Softmax Attention and replaced it with Causal Monoid State Compression.
Say hello to Spartacus-1B-Instruct (1.3B) 🛡️.
Instead of maintaining a massive, ever-growing list of past tokens, Spartacus compresses its entire causal history into a fixed-size state matrix per head. The result?
⚡ True O(1) Inference: Memory footprint and generation time per token stay constant with respect to context length, whether you are on token 10 or token 100,000.
🧠 Explicit Causality: We threw away RoPE and attention masks. The model learns when to forget using dynamic, content-aware vector decay.
🔥 Blazing Fast Training: Full hardware utilization via our custom Triton-accelerated JIT parallel prefix scan.
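To make the idea concrete, here is a minimal NumPy sketch of a fixed-size state with vector decay, plus the associative (monoid) combine that lets a prefix scan reproduce the same states in log-depth rounds. This is an illustration under assumptions, not the Spartacus implementation: the update rule `S_t = diag(a_t) S_{t-1} + k_t v_t^T`, the function names, and the shapes are all hypothetical, and plain NumPy stands in for the Triton kernel. In a real model the decay vector `a_t` would be computed from the current token; here it is simply passed in.

```python
import numpy as np

def step(state, decay, k, v):
    """One recurrent update (assumed form): S_t = diag(decay_t) @ S_{t-1} + outer(k_t, v_t).

    The state keeps shape (d_k, d_v) no matter how many tokens came before.
    """
    return decay[:, None] * state + np.outer(k, v)

def combine(left, right):
    """Associative combine on (decay, state) pairs; `left` is earlier in time.

    The identity element is (ones(d_k), zeros((d_k, d_v))), which makes this a monoid.
    """
    a1, s1 = left
    a2, s2 = right
    return a1 * a2, a2[:, None] * s1 + s2

def sequential_states(decays, keys, values):
    """Inference-style loop: constant memory and constant work per token."""
    d_k, d_v = keys.shape[1], values.shape[1]
    state = np.zeros((d_k, d_v))
    out = []
    for a, k, v in zip(decays, keys, values):
        state = step(state, a, k, v)
        out.append(state)
    return np.stack(out)

def scan_states(decays, keys, values):
    """Training-style inclusive prefix scan by recursive doubling.

    Every round applies `combine` to all positions at once, so the T states
    come out after about log2(T) rounds instead of T sequential steps.
    """
    T = len(decays)
    a = decays.copy()                           # (T, d_k)
    s = keys[:, :, None] * values[:, None, :]   # (T, d_k, d_v), one outer product per token
    shift = 1
    while shift < T:
        new_a, new_s = a.copy(), s.copy()
        # Position t absorbs the partial result ending at position t - shift.
        new_a[shift:] = a[:-shift] * a[shift:]
        new_s[shift:] = a[shift:, :, None] * s[:-shift] + s[shift:]
        a, s = new_a, new_s
        shift *= 2
    return s
```

Under these assumptions, a per-token readout would be something like `q_t @ S_t`, which touches only the fixed-size state. Because `combine` is associative, the token-by-token loop and the scan yield the same states, which is what allows recurrent decoding and parallel training to share one set of weights.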
Zero-Shot Benchmarks that Hit Hard:
O(1) architectures usually sacrifice zero-shot accuracy. Not Spartacus. It is punching way above its weight class, beating established sub-quadratic models (like Mamba-1.4B and RWKV-6-1.6B):
ARC-Challenge: 0.3063 (vs. Mamba 0.284)
ARC-Easy: 0.5518
PIQA: 0.6915