---
license: other
license_name: msai-sovereign
license_link: LICENSE
language:
- en
- cy
- gd
- ga
pipeline_tag: text-generation
tags:
- mother-core
- msai
- sovereign-ai
- united-kingdom
- causal-lm
library_name: transformers
---
# MOTHER CORE V2 — chunk 600 (W2.8 cutover base)
**Sovereign UK AI built from scratch by [MediaStream AI Limited (MSAI)](https://mediastreamai.com).**
This is **MOTHER CORE BASE** — the frozen foundation checkpoint at chunk 600 of the W2.7 → W2.8 training programme. All downstream MOTHER models (DEFENCE, ROBOTICS, LLM, CODE) build on this base.
- **Founder, CEO and Lead AI Architect:** Christopher Kenna
- **Parameters:** 6.88B (source checkpoint in FP32; the weights in this repository are BF16)
- **Architecture:** 48 layers, dim 3072, 24 heads, 6 KV heads (GQA 4:1), RoPE θ=10000, RMS norm, tied embeddings
- **Context:** 4096 tokens
- **Training:** From-scratch sovereign UK build — no fine-tuning of external models
- **Source SHA256:** `0b1ef35ec60af4a7ad0648498de8526cb775a19501dda94dfbda1713e0475b60`
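
The architecture figures above imply a few derived quantities that are useful when sizing inference hardware. The following is a minimal sketch of that arithmetic, not an official specification. It assumes the head width equals `dim / heads`, and that the KV cache stores one BF16 key vector and one BF16 value vector per KV head, per layer, per token; neither assumption is stated in this card.

```python
# Figures taken from the architecture list above.
N_LAYERS = 48
DIM = 3072
N_HEADS = 24
N_KV_HEADS = 6
CONTEXT = 4096
BYTES_PER_VALUE = 2  # BF16 uses 2 bytes per value

# Assumption: each attention head has width dim / heads.
head_dim = DIM // N_HEADS

# Grouped-query attention: number of query heads sharing one KV head.
gqa_ratio = N_HEADS // N_KV_HEADS

# Assumption: a standard KV cache holding one key and one value vector
# per KV head, per layer, per token.
kv_bytes_per_token = 2 * N_LAYERS * N_KV_HEADS * head_dim * BYTES_PER_VALUE
kv_bytes_full_context = kv_bytes_per_token * CONTEXT

print(f"head_dim = {head_dim}")                       # 128
print(f"GQA ratio = {gqa_ratio}:1")                   # 4:1
print(f"KV cache per token = {kv_bytes_per_token / 1024:.0f} KiB")           # 144 KiB
print(f"KV cache at {CONTEXT} tokens = {kv_bytes_full_context / 2**20:.0f} MiB")  # 576 MiB
```

Under these assumptions, a single sequence at the full 4096-token context needs roughly 576 MiB of KV cache on top of the BF16 weights. With 24 KV heads instead of 6 the same cache would be four times larger, which is the saving the 4:1 GQA ratio provides.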
## Training journey
| Milestone | Eval (105-question harness) |
|---|---|
| Chunk 450 (initial W2.7 baseline) | 47/105 (45%) |
| Chunk 506 (post LR-fix rollback) | 44/105 (42%) |
| Chunk 550 (recovery, LR-capped) | 46/105 (44%) |
| **Chunk 600 (BASE freeze)** | **49/105 (47%)** |
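
The percentages in the table are the raw scores divided by 105 and rounded to the nearest whole number. A short sketch that recomputes them is shown here; the dictionary name `scores` is illustrative and is not part of any MSAI tooling.

```python
# Raw scores copied from the table above (questions answered correctly).
TOTAL_QUESTIONS = 105
scores = {
    "Chunk 450": 47,
    "Chunk 506": 44,
    "Chunk 550": 46,
    "Chunk 600": 49,
}

# Recompute each percentage and the net change across the programme.
percentages = {name: round(100 * s / TOTAL_QUESTIONS) for name, s in scores.items()}
net_gain = scores["Chunk 600"] - scores["Chunk 450"]

for name, pct in percentages.items():
    print(f"{name}: {scores[name]}/{TOTAL_QUESTIONS} ({pct}%)")
print(f"Net change, chunk 450 to chunk 600: {net_gain:+d} questions")
```

On these figures the BASE freeze sits two questions above the initial W2.7 baseline, after a three-question dip at chunk 506.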
## Scope
**MOTHER CORE handles:** math, science, reasoning, chain-of-thought, UK knowledge, MOTHER identity, tool calling (agents, RAG, memory, workflows), multilingual responses (English, Welsh, Irish, Scottish Gaelic), safety refusals.
**MOTHER CORE does NOT handle (separate sister models):**
- **MOTHER CODE** — software engineering, code generation
- **MOTHER LLM** — long-form creative writing, instruction-tuned content
- **MOTHER DEFENCE** — defence reasoning and strategy (W3 programme, builds on this BASE)
- **MOTHER ROBOTICS** — humanoid robot embodiment (W4 programme, builds on this BASE)
## Usage
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the tokenizer and the BF16 weights from the Hub.
tok = AutoTokenizer.from_pretrained("MediaStreamAI/MOTHER_CORE_V2")
model = AutoModelForCausalLM.from_pretrained(
    "MediaStreamAI/MOTHER_CORE_V2",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# The prompt must use the exact Question/Answer wrapper, including whitespace.
prompt = "Question:\n\nWhat is the capital of Wales?\n\nAnswer:"
inputs = tok(prompt, return_tensors="pt", add_special_tokens=True).to(model.device)

# Greedy decoding with the recommended repetition controls.
out = model.generate(
    **inputs,
    max_new_tokens=200,
    do_sample=False,
    repetition_penalty=1.3,
    no_repeat_ngram_size=4,
    pad_token_id=tok.pad_token_id,
)
print(tok.decode(out[0], skip_special_tokens=True))
```
**Critical inference rules:**
- Prompt wrap: `"Question:\n\n{q}\n\nAnswer:"` (exact whitespace)
- BOS token ID: 1 (required; keep `add_bos_token=True` so the tokenizer prepends it)
- EOS token ID: 2
- PAD token ID: 0
- **Use greedy decoding only.** Sampling produces gibberish.
- Repetition penalty: 1.3, frequency-scaled
- No-repeat n-gram size: 4
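
The rules above are easy to get subtly wrong, for example by dropping a newline from the wrapper. The sketch below collects them into reusable pieces. The names `build_prompt`, `GENERATION_KWARGS` and `starts_with_bos` are hypothetical helpers introduced for illustration and are not part of the MOTHER CORE release. Stripping surrounding whitespace from the question is a design choice made here, not a documented requirement. The dictionary uses the standard `transformers` `repetition_penalty`, which applies a flat penalty; the frequency-scaled variant mentioned above is not reproduced.

```python
def build_prompt(question: str) -> str:
    """Wrap a question in the Question/Answer template with exact whitespace."""
    # Assumption: stray leading/trailing whitespace in the question is unwanted.
    return f"Question:\n\n{question.strip()}\n\nAnswer:"


# Decoding settings mirroring the rules above; pass as model.generate(**inputs, **GENERATION_KWARGS).
GENERATION_KWARGS = {
    "do_sample": False,          # greedy decoding only
    "repetition_penalty": 1.3,   # flat penalty, not frequency-scaled
    "no_repeat_ngram_size": 4,
    "pad_token_id": 0,
    "eos_token_id": 2,
}


def starts_with_bos(input_ids: list, bos_id: int = 1) -> bool:
    """Return True if the first token ID of an encoded prompt is the BOS token."""
    return len(input_ids) > 0 and input_ids[0] == bos_id


print(repr(build_prompt("  What is the capital of Wales? ")))
```

A quick sanity check before generation is to call `starts_with_bos(inputs["input_ids"][0].tolist())` on the tokenizer output; if it returns `False`, the tokenizer is not prepending the BOS token.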
## Programme context
- **W2.7 (complete)** — Core capability training: math, science, reasoning, identity, UK knowledge, multilingual, agent tool-calling, RAG, chat, memory, workflows
- **W2.8 (in progress)** — Document routing, argument validation, agent verifier loops, multi-step orchestration
- **W3** — MOTHER DEFENCE (defence reasoning and strategy)
- **W4** — MOTHER ROBOTICS (embodied awareness for humanoid platforms)
UK sovereign infrastructure is located in Manchester (HQ), Dundee (flagship data centre) and Durham. Phase 2 expansion in H2 2026 extends to Düsseldorf, South Africa and Jamaica.
## License
Released under the MSAI Sovereign License; see the LICENSE file. The model was built in the UK and is not derived from any externally licensed pre-trained model.
## Contact
MediaStream AI Limited
West Tower, 371 Deansgate, Manchester M15 4UR, United Kingdom
[mediastreamai.com](https://mediastreamai.com)