How to use from the llama-cpp-python library
# !pip install llama-cpp-python

from llama_cpp import Llama

llm = Llama.from_pretrained(
	repo_id="EphAsad/Midas-FableAgent-8B",
	filename="",
)
llm.create_chat_completion(
	messages = [
		{
			"role": "user",
			"content": "What is the capital of France?"
		}
	]
)

Midas-FableAgent

Plan. Act. Observe. Complete.

An agentic specialisation of Atem-8B, sequentially fine-tuned for multi-step task execution, structured action emission, and observation-grounded iteration. Training data includes Fable agent traces.


Overview

Midas-FableAgent is a sequential fine-tune of Atem-8B toward agentic task execution. Where Atem-8B is a general-purpose reasoning model, Midas-FableAgent is trained to operate in execution loops: receiving a task, reasoning about the current state, emitting structured actions, observing results, and iterating until completion.

Training used two complementary data streams:

  • Stream A — 10,000 multi-turn agentic execution trajectories from OpenThoughts-Agent-v1-SFT. Each trajectory is a full ReAct-style loop: task → JSON action → environment observation → JSON action → ... → task_complete: true. The model trains on every assistant turn in every trajectory, grounding its actions in real terminal output.

  • Stream B — 4,665 planning and CoT reasoning examples from kelexine/fable-5-sft-traces. Single or multi-turn examples with full <think> traces, covering high-level task decomposition before any execution loop begins.

Together these streams teach the model to plan before acting and execute through observation — the two capabilities that define reliable agentic behaviour.
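
The execution loop described above can be sketched as a small driver. This is a minimal illustration, not the author's harness: generate and execute are hypothetical callables (one wraps the model, the other runs commands and returns terminal output), and the JSON shape follows the Output Format section.

```python
import json
import re
from typing import Callable

THINK_RE = re.compile(r"<think>.*?</think>", re.DOTALL)

def run_agent_loop(
    task: str,
    generate: Callable[[list[dict]], str],  # hypothetical: messages -> raw model text
    execute: Callable[[list[dict]], str],   # hypothetical: commands -> terminal output
    max_steps: int = 10,
) -> list[dict]:
    """Plan/act/observe until the model sets task_complete or max_steps is hit."""
    messages = [{"role": "user", "content": task}]
    actions = []
    for _ in range(max_steps):
        raw = generate(messages)
        # Drop the optional reasoning trace, then parse the JSON action block.
        action = json.loads(THINK_RE.sub("", raw).strip())
        actions.append(action)
        messages.append({"role": "assistant", "content": raw})
        if action.get("task_complete"):
            break
        # Feed the environment observation back as the next user turn.
        observation = execute(action.get("commands", []))
        messages.append({"role": "user", "content": observation})
    return actions
```

The max_steps cap is an added safeguard so a model that never emits task_complete: true cannot loop forever.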

Design note: This is v2 of Midas-FableAgent. The primary known limitation is that approximately 69% of training examples were removed post-formatting because response tokens fell outside the context window after truncation — effectively training on ~4,100 examples rather than the full 14,665. A v3 using trajectory splitting (each assistant turn as an independent training example) is planned and will substantially increase effective training data. The current model demonstrates correct agentic format and reasoning patterns; it is undertrained relative to what the data should deliver.
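
The all-masked removal mentioned in the design note can be illustrated with a short filter. This sketch assumes two things the card does not state: that masked (non-response) tokens carry the label -100, and that truncation keeps the first max_seq_length tokens.

```python
IGNORE_INDEX = -100  # assumed label value for masked (non-response) tokens

def is_all_masked(labels: list[int], max_seq_length: int) -> bool:
    """True if no trainable label survives right-truncation to max_seq_length."""
    return all(label == IGNORE_INDEX for label in labels[:max_seq_length])

def drop_all_masked(examples: list[list[int]], max_seq_length: int) -> list[list[int]]:
    """Truncate each label sequence and keep only those with a trainable token."""
    return [
        labels[:max_seq_length]
        for labels in examples
        if not is_all_masked(labels, max_seq_length)
    ]
```

An example whose prompt alone fills the window contributes no gradient, which is why such rows are dropped rather than trained on.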


Atem Ecosystem

Midas-FableAgent is a task-specialised derivative of the Atem series, not a numbered Atem release.

Model | Type | Capability
Atem-0.6B | Qwen3 SFT | Compact reasoning
Atem-1.7B | Qwen3 SFT | Efficient reasoning
Atem-4B | Qwen3 SFT | Balanced reasoning
Atem-8B | Qwen3 SFT | General-purpose reasoning
Atem-14B | Qwen3 SFT | High-capability reasoning
Midas-FableAgent | Atem-8B → Agentic SFT | Multi-step task execution

Model Details

Property | Value
Base model | EphAsad/Atem-8B
Training method | Sequential LoRA SFT (attention-only targets)
LoRA config | r=32, alpha=64, dropout=0.05
Target modules | q_proj, k_proj, v_proj, o_proj (no MLP)
Parameters | ~8.22B
Trainable parameters | 30,670,848 (0.37%)
Effective training examples | ~4,121 (post all-masked removal)
Training steps | 130
Epochs | 2
Final val loss | 0.4525
Final train loss | 0.8590
Learning rate | 4e-5 (cosine schedule)
Effective batch size | 64 (4 × 16 grad accum)
Hardware | NVIDIA A100-SXM4-80GB
Max sequence length | 12,288 tokens
Precision | bfloat16
License | Apache 2.0
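
As a sanity check, the step count follows from the table's other values. The arithmetic below assumes the final partial batch of each epoch counts as one optimiser step; the card does not say how the trainer rounds.

```python
import math

effective_examples = 4121  # "~4,121" from the table above
batch_size = 64            # 4 per device x 16 gradient accumulation steps
epochs = 2

# Assumption: a partial final batch still produces one optimiser step.
steps_per_epoch = math.ceil(effective_examples / batch_size)
total_steps = steps_per_epoch * epochs
print(steps_per_epoch, total_steps)  # 65 130
```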

Why attention-only LoRA: Midas-FableAgent is sequentially trained on top of Atem-8B, not a raw base. Skipping MLP projections and using a lower rank (r=32 vs Atem-8B's training rank) and lower LR (4e-5 vs 1e-4) are deliberate forgetting-prevention measures. The goal is to shift the model's output distribution toward agentic formats without eroding the general reasoning capability established during Atem-8B's training.
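
The trainable parameter count can be reproduced from the LoRA rank and the attention projection shapes. The shapes below are assumed from the Qwen3-8B architecture (hidden size 4096, 36 layers, 32 query heads, 8 key/value heads, head dimension 128) and are not stated in this card.

```python
# Assumed Qwen3-8B attention shapes (not stated in this card).
HIDDEN = 4096
NUM_LAYERS = 36
Q_OUT = 32 * 128   # 32 query heads x head_dim 128
KV_OUT = 8 * 128   # 8 key/value heads x head_dim 128
RANK = 32          # LoRA r from the Model Details table

def lora_params(in_features: int, out_features: int, rank: int) -> int:
    """LoRA adds A (rank x in_features) and B (out_features x rank)."""
    return rank * (in_features + out_features)

per_layer = (
    lora_params(HIDDEN, Q_OUT, RANK)          # q_proj
    + 2 * lora_params(HIDDEN, KV_OUT, RANK)   # k_proj and v_proj
    + lora_params(Q_OUT, HIDDEN, RANK)        # o_proj
)
total = per_layer * NUM_LAYERS
print(total)  # 30670848
```

Under these assumptions the result matches the 30,670,848 figure in the table, about 0.37% of ~8.22B parameters.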


Output Format

Midas-FableAgent produces two output formats depending on the task type.

Agentic execution (Stream A format)

When operating as an execution agent — given a task and environment state — the model responds with a JSON action block, optionally preceded by a <think> reasoning trace:

<think>
[Reasoning about current state, what commands are needed, potential failure modes]
</think>
{
  "analysis": "Current state assessment grounded in the provided terminal output.",
  "plan": "Concrete sequence of steps to advance toward task completion.",
  "commands": [
    {"keystrokes": "find . -type f -size +100M\n", "duration": 0.5},
    {"keystrokes": "sort -rh\n", "duration": 0.1}
  ],
  "task_complete": false
}

On completion:

{
  "analysis": "Task verified complete. All required outputs confirmed.",
  "plan": "No further steps needed.",
  "commands": [],
  "task_complete": true
}
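
A consumer of this format needs to separate the optional reasoning trace from the JSON action and check the fields before running anything. A minimal sketch follows; parse_action is a hypothetical helper name, and the checks cover only the fields shown in the examples above.

```python
import json
import re

REQUIRED_KEYS = {"analysis", "plan", "commands", "task_complete"}
THINK_BLOCK = re.compile(r"<think>(.*?)</think>", re.DOTALL)

def parse_action(raw: str) -> tuple[str, dict]:
    """Split a response into (think_trace, action) and validate the action."""
    match = THINK_BLOCK.search(raw)
    think = match.group(1).strip() if match else ""
    body = THINK_BLOCK.sub("", raw, count=1).strip()
    action = json.loads(body)  # raises json.JSONDecodeError on malformed output
    if not isinstance(action, dict):
        raise ValueError("action must be a JSON object")
    missing = REQUIRED_KEYS - set(action)
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    if not isinstance(action["task_complete"], bool):
        raise ValueError("task_complete must be a boolean")
    for cmd in action["commands"]:
        if not isinstance(cmd.get("keystrokes"), str):
            raise ValueError("each command needs a string 'keystrokes' field")
    return think, action
```

Rejecting malformed actions up front keeps a bad generation from reaching the terminal.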

Planning / CoT (Stream B format)

When reasoning through open-ended planning problems without an execution context, the model produces a <think> trace followed by structured prose:

<think>
[Full reasoning trace — constraint identification, option analysis, decision rationale]
</think>
[Structured, actionable plan or analysis]

Training Data

Dataset | Count | Format | Focus
open-thoughts/OpenThoughts-Agent-v1-SFT | 10,000 (streamed) | Multi-turn trajectories | Agentic execution loops
kelexine/fable-5-sft-traces | 4,665 (full) | Single/multi-turn CoT | Planning and reasoning

Stream A processing: Conversations loaded from the conversations column. Role names normalised (human → user, gpt → assistant). Structural validation: each conversation must have at least one user and one assistant turn, must start with a user turn, and must end with an assistant turn. 100% yield: the OpenThoughts-Agent format is structurally clean.
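
The Stream A steps can be sketched as two small functions. The role mapping and validation rules come from the description above; the 'from' and 'value' key names are an assumption based on the common ShareGPT layout, not something this card confirms.

```python
ROLE_MAP = {"human": "user", "gpt": "assistant"}

def normalise(conversation: list[dict]) -> list[dict]:
    """Map turns with assumed 'from'/'value' keys to role/content messages."""
    return [
        {"role": ROLE_MAP.get(turn["from"], turn["from"]), "content": turn["value"]}
        for turn in conversation
    ]

def is_valid(messages: list[dict]) -> bool:
    """Needs a user and an assistant turn, starting with user, ending with assistant."""
    roles = [m["role"] for m in messages]
    return (
        "user" in roles
        and "assistant" in roles
        and roles[0] == "user"
        and roles[-1] == "assistant"
    )
```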

Stream B processing: Loaded directly from parquet (the messages column serialises as a numpy array of per-turn JSON strings, bypassing schema parsing). Assistant response reconstructed from the context (user prompt), thinking (CoT trace → injected as <think>...</think>), and response (final answer) columns, rather than from the noisy messages column which contained /model slash-command noise and <local-command-stdout> artefacts. 100% yield after column-based reconstruction.
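
The column-based reconstruction for Stream B might look like the following. The column names (context, thinking, response) are from the description above; build_example is a hypothetical name, and the exact whitespace around the think tags is an assumption.

```python
def build_example(row: dict) -> list[dict]:
    """Rebuild a user/assistant pair from the context, thinking and response columns."""
    assistant = (
        f"<think>\n{row['thinking'].strip()}\n</think>\n"  # CoT trace wrapped in think tags
        f"{row['response'].strip()}"                        # final answer follows the trace
    )
    return [
        {"role": "user", "content": row["context"].strip()},
        {"role": "assistant", "content": assistant},
    ]
```

Building from the clean columns sidesteps the slash-command noise and stdout artefacts in the messages column entirely.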

Loss curve (v2, MAX_SEQ_LENGTH=12288):

Step | Train Loss | Val Loss
50 | 0.8055 | 0.4942
100 | 0.7631 | 0.4558
130 (final) | 0.8196 | 0.4525

Validation loss descends monotonically throughout the run. Early stopping did not trigger — the model had not plateaued at the 2-epoch ceiling.


Evaluation

No standard benchmark evaluation (ARC, GSM8K, HellaSwag) was run for this release. Midas-FableAgent's capability is agentic rather than multiple-choice or mathematical, and lm-evaluation-harness metrics are not the appropriate measure. A qualitative evaluation was conducted using six agentic execution prompts (terminal tasks) and five planning prompts.

Observed strengths:

  • Correctly produces the JSON action format (analysis / plan / commands / task_complete) on all execution prompts
  • analysis fields are grounded in the provided context rather than hallucinated
  • task_complete: false consistently set on first-step responses where the task is not yet done
  • Observation-grounded reasoning: on service health check tasks, correctly reasoned to wait for command output before deciding next action
  • Planning traces show genuine constraint identification — the database migration example correctly identified concurrent connection limits, DDL blocking risk, and transfer bandwidth as distinct constraints before structuring the plan
  • <think> tags are present in all agentic outputs, even though the training data did not explicitly enforce them

Known limitations:

  • Empty or very short think blocks on simpler queries (model short-circuits reasoning on straightforward tasks)

Usage

Inference note

Qwen3's apply_chat_template with add_generation_prompt=True appends a <think> special token to prime the thinking mode. When decoding, use skip_special_tokens=False to preserve think tags in the output, then strip EOS/PAD tokens manually:

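raw = tokenizer.decode(generated, skip_special_tokens=False)
# Strip EOS and PAD (if defined) while leaving the think tags in place
for tok in filter(None, (tokenizer.eos_token, tokenizer.pad_token)):
    raw = raw.replace(tok, '')
raw = raw.strip()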

Transformers

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_name = "EphAsad/Midas-FableAgent"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="auto"
)

# Agentic execution — use a task-specific system prompt
AGENT_SYSTEM = (
    "You are an AI assistant tasked with solving command-line tasks in a "
    "Linux environment. Format your response as JSON with the structure: "
    "{\"analysis\": \"...\", \"plan\": \"...\", \"commands\": [{\"keystrokes\": \"...\", "
    "\"duration\": 0.1}], \"task_complete\": false}"
)

messages = [
    {"role": "system", "content": AGENT_SYSTEM},
    {"role": "user", "content": "Find all files larger than 100MB under /home and list them sorted by size.\n\nCurrent terminal state:\nroot@host:/home#"},
]

inputs = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_tensors="pt"
).to(model.device)

with torch.no_grad():
    output = model.generate(
        input_ids=inputs,
        max_new_tokens=900,
        temperature=0.2,
        do_sample=True,
        repetition_penalty=1.1,
    )

response = tokenizer.decode(
    output[0][inputs.shape[1]:],
    skip_special_tokens=False
).replace(tokenizer.eos_token, '').strip()
print(response)

Unsloth (faster inference)

from unsloth import FastLanguageModel
import torch

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="EphAsad/Midas-FableAgent",
    max_seq_length=12288,
    dtype=torch.bfloat16,
    load_in_4bit=False,
)
FastLanguageModel.for_inference(model)

# Planning / CoT mode — uses Midas-FableAgent default identity
messages = [
    {"role": "user", "content": "Plan a zero-downtime migration of a 200GB PostgreSQL database to AWS RDS."},
]

inputs = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_tensors="pt"
).to("cuda")

with torch.no_grad():
    output = model.generate(
        input_ids=inputs,
        max_new_tokens=1400,
        temperature=0.6,
        do_sample=True,
        repetition_penalty=1.1,
    )

response = tokenizer.decode(
    output[0][inputs.shape[1]:],
    skip_special_tokens=False
).replace(tokenizer.eos_token, '').strip()
print(response)

Ollama

# Recommended — best speed/quality balance
ollama run hf.co/EphAsad/Midas-FableAgent:Q4_K_M

# Higher quality
ollama run hf.co/EphAsad/Midas-FableAgent:Q5_K_M

# Near-lossless
ollama run hf.co/EphAsad/Midas-FableAgent:Q8_0

llama.cpp

llama-server -hf EphAsad/Midas-FableAgent:Q4_K_M
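
Once llama-server is running it can be queried over HTTP. The sketch below assumes the server's default address (127.0.0.1:8080) and its OpenAI-compatible /v1/chat/completions route; build_chat_request is a hypothetical helper, and the sampling values mirror the planning example above.

```python
import json
import urllib.request

def build_chat_request(
    prompt: str,
    base_url: str = "http://127.0.0.1:8080",  # assumed default llama-server address
) -> urllib.request.Request:
    """Build a POST request for the OpenAI-compatible chat endpoint."""
    payload = {
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.6,
        "max_tokens": 1400,
    }
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# With the server running:
# with urllib.request.urlopen(build_chat_request("Plan a backup rotation.")) as resp:
#     reply = json.load(resp)["choices"][0]["message"]["content"]
```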

Available Files

File | Size | Description
model-0000{1-4}-of-00004.safetensors | ~16.4 GB | Full bfloat16 weights (4 shards)
Midas-FableAgent.Q4_K_M.gguf | ~5.0 GB | 4-bit (recommended)
Midas-FableAgent.Q5_K_M.gguf | ~5.9 GB | 5-bit
Midas-FableAgent.Q8_0.gguf | ~8.7 GB | 8-bit (near-lossless)

System Prompt

Midas-FableAgent's identity is baked into the chat template and activates without an explicit system message. For agentic execution tasks, override the system prompt with a task-specific instruction that specifies the JSON output format (see usage examples above). To use the default identity directly:

You are Midas-FableAgent, an advanced agentic reasoning assistant built on
the Atem foundation. You excel at multi-step task execution — decomposing
complex goals into concrete actions, reasoning carefully about observations,
and iterating reliably toward task completion. You produce structured,
actionable outputs and maintain clear reasoning traces throughout execution.

Roadmap

Version | Status | Change
v1 (MAX_SEQ=8192) | ✅ Released | Initial training run: 128 steps, ~4,084 effective examples
v2 (MAX_SEQ=12288) | This model | Increased context: 130 steps, ~4,121 effective examples
v3 (trajectory splitting) | 🔄 Planned | Each assistant turn as an independent training example; eliminates all-masked removal, ~3× effective data
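
One plausible reading of trajectory splitting is sketched below: each assistant turn becomes its own example, made of the conversation prefix up to and including that turn. This is an illustration of the idea only, since the v3 implementation is not published here, and split_trajectory is a hypothetical name.

```python
def split_trajectory(messages: list[dict]) -> list[list[dict]]:
    """Return one example per assistant turn: the prefix ending at that turn."""
    return [
        messages[: i + 1]
        for i, message in enumerate(messages)
        if message["role"] == "assistant"
    ]
```

In a setup like this, loss would typically be computed only on the final assistant turn of each example, with over-long context trimmed from the front so the target turn always stays inside the window. That trimming policy is an assumption, not something the roadmap specifies.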

Citation

@misc{midas_fableagent_2026,
  author       = {Asad, Zain},
  title        = {Midas-FableAgent: Sequential Agentic SFT on Atem-8B},
  year         = {2026},
  publisher    = {HuggingFace},
  howpublished = {\url{https://huggingface.co/EphAsad/Midas-FableAgent}},
}

License

Released under the Apache 2.0 License, consistent with the base model chain (Midas-FableAgent → Atem-8B → Qwen3-8B).


Built independently by EphAsad
