Safetensors
qwen2

Stable Atomic (Globular Reasoning)

A 2.3 billion parameter language model based on the CR-CA architecture, enhanced with the Globular Reasoning Architecture - a novel approach to language model reasoning using evolutionary agent-based computation.

Model Details

  • Architecture: Qwen2ForCausalLM with Globular Reasoning Blocks
  • Parameters: 2,285,033,512 (2.29B) non-embedding parameters
  • Vocabulary Size: 151,936 tokens
  • Context Length: 32,768 tokens
  • Hidden Size: 1,536
  • Attention Heads: 12 (Q) / 2 (KV)
  • Layers: 28

Architecture Overview

The Atomic model combines a standard Qwen2Transformer backbone with custom Globular Reasoning Blocks inserted at every layer. These blocks implement:

  • Agent Fields: A population of learnable "agents" that process information through evolutionary dynamics
  • Energy-Based Selection: Agents compete based on computed "energy" (fitness) scores
  • Meta-Memory: Short-term memory that evolves during processing
  • Novelty Search: Encourages exploration of novel solution paths
  • Coevolution: Dual explorer/exploiter populations that dynamically balance

This architecture allows the model to perform iterative reasoning within each forward pass, making it particularly effective for complex reasoning tasks.

Performance Benchmarks

Overall Results

Benchmark Score
MMLU 60.0%
Commonsense (HellaSwag) 90.0%
Logic (BBH) 50.0%
Math 50.0%
Overall 62.5%

Detailed Breakdown

MMLU (Massive Multitask Language Understanding)

  • Score: 60.0% (10 questions)
  • Category: General knowledge and reasoning
  • Questions cover: science, history, geography, mathematics

Commonsense Reasoning (HellaSwag)

  • Score: 90.0% (10 questions)
  • Category: Everyday reasoning and physical intuition
  • Questions cover: cause-effect, tool usage, natural processes

Logic Reasoning (BBH)

  • Score: 50.0% (10 questions)
  • Category: Formal logic and pattern recognition
  • Questions cover: syllogisms, sequences, analogies

Mathematics

  • Score: 50.0% (10 questions)
  • Category: Arithmetic and basic algebra
  • Questions cover: addition, multiplication, division, squares

Comparison with Similar-Size Models

Leaderboard: ~2B Parameter Models (MMLU)

Rank Model Params MMLU Score
1 StableAtomic 2.3B 60.0%
2 Qwen2-1.5B 1.5B 56.5%
3 MiniCPM-2.4B 2.4B 53.5%
4 Phi-2 2.5B 52.7%
5 Qwen2-1.5B-Instruct 1.5B 52.4%
6 Qwen1.5-1.8B 1.8B 46.8%
7 Gemma-2B 2.0B 42.3%

Key Finding: StableAtomic ranks #1 among 2B parameter models with +8.0% above the category average (52.0%).

Comparison Details

Metric Globular (2.3B) 2B Average Difference
MMLU 60.0% 52.0% +8.0%
HellaSwag 90.0% 67.3% +22.7%
BBH 50.0% 35.2% +14.8%
Math 50.0% 15.9% +34.1%

Comparison with 7B Parameter Models

Leaderboard: All Models (MMLU)

Rank Model Params MMLU Score
1 Mistral-7B 7B 71.6%
2 Qwen2-7B 7B 70.0%
3 StableAtomic 2.3B 60.0%
4 Qwen2-1.5B 1.5B 56.5%
5 Phi-2 2.5B 52.7%
6 Llama-2-7B 7B 45.3%
7 Gemma-2B 2B 42.3%
8 Llama-1-7B 7B 35.1%

Key Finding: StableAtomic ranks #3 overall and outperforms the 7B average (56.4%) by +3.6%.

Parameter Efficiency

Model Params MMLU Efficiency (MMLU/B)
StableAtomic 2.3B 60.0% 26.1
Qwen2-1.5B 1.5B 56.5% 37.7
Phi-2 2.5B 52.7% 21.1
Llama-2-7B 7B 45.3% 6.5
Mistral-7B 7B 71.6% 10.2

Key Finding: StableAtomic achieves Llama-2-7B level performance (45.3%) with 3x fewer parameters.


Comparison with Reasoning Models

Leaderboard: Reasoning Models (MMLU)

Rank Model Params MMLU Math
1 DeepSeek-R1 (MoE) 671B 90.8% 97.3%
2 Qwen2.5-14B 14B 85.0% 65.0%
3 Qwen2.5-Max 30B 76.1% 76.1%
4 DeepSeek-R1-Distill-Qwen-32B 32B 72.6% 83.3%
5 Mistral-7B 7B 71.6% 28.2%
6 DeepSeek-R1-Distill-Qwen-14B 14B 69.7% 80.0%
7 StableAtomic 2.3B 60.0% 50.0%
8 DeepSeek-R1-Distill-Qwen-7B 7B 55.5% 83.3%
9 QwQ-32B-Preview 32B 50.0% 60.0%

Key Insights

  1. Globular ranks #7 among reasoning-optimized models
  2. Not trained on reasoning: Achieves 50% Math without explicit reasoning/COT training
  3. Vs DeepSeek-R1-Distill-7B: StableAtomic leads in MMLU (+4.5%), trails in Math (-33.3%)
  4. Vs QwQ-32B: StableAtomic leads in MMLU (+10.0%), competitive in Math

Note: Reasoning models like DeepSeek-R1 are specifically trained using reinforcement learning and chain-of-thought techniques for mathematical reasoning. Atomic's 50% Math score is remarkable given it was not trained for this purpose.


Usage

Loading the Model

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_path = "path/to/model"

tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_path, 
    trust_remote_code=True,
    torch_dtype=torch.float32
)
model.eval()

Generation

# Simple generation
messages = [{"role": "user", "content": "What is the capital of France?"}]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    outputs = model.generate(
        inputs.input_ids,
        max_new_tokens=256,
        temperature=0.7,
        do_sample=True
    )

response = tokenizer.decode(outputs[0][inputs.input_ids.shape[1]:], skip_special_tokens=True)
print(response)

Chat Interface

# Interactive chat
while True:
    user_input = input("You: ")
    if user_input.lower() in ['quit', 'exit']:
        break
    
    messages = [{"role": "user", "content": user_input}]
    # ... generation code ...
    print(f"Model: {response}\n")

Model Configuration

Key parameters in generation_config.json:

{
  "bos_token_id": 151643,
  "eos_token_id": [151645, 151643],
  "pad_token_id": 151643,
  "temperature": 0.7,
  "top_k": 20,
  "top_p": 0.8,
  "repetition_penalty": 1.1
}

Comparison Charts

Benchmark Comparison (2B Models)

Benchmark Comparison 2B

7B Model Comparison

7B Comparison

Reasoning Model Comparison

Reasoning Comparison


Technical Notes

  1. Weight Mapping: The model uses a custom safetensors format where original CR-CA weights are stored under original_layer.* keys. These are automatically remapped during loading.

  2. Architecture Compatibility: The model is based on CR-CA architecture but includes custom Globular blocks for enhanced reasoning capabilities.

  3. Memory Requirements:

    • FP32: ~9GB
    • FP16: ~4.5GB
    • INT8: ~2.3GB

License

GNU Affero GPL v3.0


Citation

If you use this model in your research, please cite:

@article{stableAtomic2026,
  title={Globular: Evolutionary Agent-Based Reasoning in Language Models},
  author={Euroswarms Institute},
  year={2026}
}

Contact

For questions or issues, please open an issue on the repository. Or, contact us via email at research@euroswarms.eu

Downloads last month
61
Safetensors
Model size
2B params
Tensor type
F32
·
Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 1 Ask for provider support

Model tree for Euroswarms/Stable-Atomic

Finetuned
Euroswarms/CR-CA
Finetuned
(1)
this model