Midas-FableAgent
Plan. Act. Observe. Complete.
An agentic specialisation of Atem-8B, sequentially fine-tuned for multi-step task execution, structured action emission, and observation-grounded iteration. Uses Fable agent traces.
Overview
Midas-FableAgent is a sequential fine-tune of Atem-8B toward agentic task execution. Where Atem-8B is a general-purpose reasoning model, Midas-FableAgent is trained to operate in execution loops: receiving a task, reasoning about the current state, emitting structured actions, observing results, and iterating until completion.
Training used two complementary data streams:
- Stream A: 10,000 multi-turn agentic execution trajectories from OpenThoughts-Agent-v1-SFT. Each trajectory is a full ReAct-style loop: task → JSON action → environment observation → JSON action → ... → task_complete: true. The model trains on every assistant turn in every trajectory, grounding its actions in real terminal output.
- Stream B: 4,665 planning and CoT reasoning examples from kelexine/fable-5-sft-traces. Single or multi-turn examples with full <think> traces, covering high-level task decomposition before any execution loop begins.
Together these streams teach the model to plan before acting and execute through observation — the two capabilities that define reliable agentic behaviour.
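The execution loop described above can be sketched in a few lines. This is a minimal illustration of the control flow, not the harness used to produce the training data: `propose_action` and `run_commands` are hypothetical callables standing in for a model call and a terminal environment, and `max_steps` is an assumed safety cap.

```python
import json
from typing import Callable, Dict, List


def run_agent_loop(
    task: str,
    propose_action: Callable[[List[Dict[str, str]]], dict],  # hypothetical model wrapper
    run_commands: Callable[[List[dict]], str],                # hypothetical terminal wrapper
    max_steps: int = 20,                                      # assumed safety cap
) -> List[dict]:
    """Iterate task -> action -> observation until task_complete is true."""
    messages = [{"role": "user", "content": task}]
    actions = []
    for _ in range(max_steps):
        action = propose_action(messages)
        actions.append(action)
        if action.get("task_complete"):
            break
        # Feed the environment output back to the model as the next user turn.
        observation = run_commands(action.get("commands", []))
        messages.append({"role": "assistant", "content": json.dumps(action)})
        messages.append({"role": "user", "content": observation})
    return actions
```

The loop stops either when the model sets `task_complete` or when the step cap is reached, so a model that never declares completion cannot run indefinitely.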
Design note: This is v2 of Midas-FableAgent. The primary known limitation is that approximately 69% of training examples were removed post-formatting because response tokens fell outside the context window after truncation — effectively training on ~4,100 examples rather than the full 14,665. A v3 using trajectory splitting (each assistant turn as an independent training example) is planned and will substantially increase effective training data. The current model demonstrates correct agentic format and reasoning patterns; it is undertrained relative to what the data should deliver.
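To make the planned trajectory splitting concrete, the sketch below turns one multi-turn conversation into one example per assistant turn, each carrying the preceding turns as context. This is a guess at how such a step could look, not the author's v3 implementation; `split_trajectory` and the `context`/`target` keys are hypothetical names.

```python
from typing import Dict, List

Message = Dict[str, str]


def split_trajectory(messages: List[Message]) -> List[dict]:
    """Return one training example per assistant turn."""
    examples = []
    for index, message in enumerate(messages):
        if message["role"] != "assistant":
            continue
        examples.append({
            "context": messages[:index],   # every turn before this one
            "target": message["content"],  # the assistant turn to learn
        })
    return examples
```

Because each example ends at its own target, the context could in principle be trimmed from the front while the response tokens stay inside the window, which is one plausible way to avoid all-masked examples.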
Atem Ecosystem
Midas-FableAgent is a task-specialised derivative of the Atem series, not a numbered Atem release.
| Model | Type | Capability |
|---|---|---|
| Atem-0.6B | Qwen3 SFT | Compact reasoning |
| Atem-1.7B | Qwen3 SFT | Efficient reasoning |
| Atem-4B | Qwen3 SFT | Balanced reasoning |
| Atem-8B | Qwen3 SFT | General-purpose reasoning |
| Atem-14B | Qwen3 SFT | High-capability reasoning |
| Midas-FableAgent | Atem-8B → Agentic SFT | Multi-step task execution |
Model Details
| Property | Value |
|---|---|
| Base model | EphAsad/Atem-8B |
| Training method | Sequential LoRA SFT — attention-only targets |
| LoRA config | r=32, alpha=64, dropout=0.05 |
| Target modules | q_proj, k_proj, v_proj, o_proj (no MLP) |
| Parameters | ~8.22B |
| Trainable parameters | 30,670,848 (0.37%) |
| Effective training examples | ~4,121 (post all-masked removal) |
| Training steps | 130 |
| Epochs | 2 |
| Final val loss | 0.4525 |
| Final train loss | 0.8590 |
| Learning rate | 4e-5 (cosine schedule) |
| Effective batch size | 64 (4 × 16 grad accum) |
| Hardware | NVIDIA A100-SXM4-80GB |
| Max sequence length | 12,288 tokens |
| Precision | bfloat16 |
| License | Apache 2.0 |
Why attention-only LoRA: Midas-FableAgent is sequentially trained on top of Atem-8B, not a raw base. Skipping MLP projections and using a lower rank (r=32 vs Atem-8B's training rank) and lower LR (4e-5 vs 1e-4) are deliberate forgetting-prevention measures. The goal is to shift the model's output distribution toward agentic formats without eroding the general reasoning capability established during Atem-8B's training.
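The trainable parameter count in the table can be reproduced by hand. A LoRA adapter on a linear layer with `n_in` inputs and `n_out` outputs adds `r * (n_in + n_out)` weights. The dimensions below are the published Qwen3-8B configuration (hidden size 4096, 36 layers, 32 query heads, 8 key/value heads, head dimension 128) and are assumed to carry over unchanged to Atem-8B.

```python
# LoRA adds two low-rank matrices per target module: A (r x n_in) and B (n_out x r).
RANK = 32
HIDDEN = 4096        # assumed: Qwen3-8B hidden_size
NUM_LAYERS = 36      # assumed: Qwen3-8B num_hidden_layers
Q_OUT = 32 * 128     # assumed: 32 query heads x head_dim 128
KV_OUT = 8 * 128     # assumed: 8 key/value heads x head_dim 128

# (in_features, out_features) for each attention projection.
TARGETS = {
    "q_proj": (HIDDEN, Q_OUT),
    "k_proj": (HIDDEN, KV_OUT),
    "v_proj": (HIDDEN, KV_OUT),
    "o_proj": (Q_OUT, HIDDEN),
}


def lora_params(rank, shapes, layers):
    """Total adapter weights across all layers for the given module shapes."""
    per_layer = sum(rank * (n_in + n_out) for n_in, n_out in shapes.values())
    return per_layer * layers


trainable = lora_params(RANK, TARGETS, NUM_LAYERS)
print(f"{trainable:,}")  # 30,670,848
```

Against roughly 8.22B total parameters this comes to about 0.37%, in line with the table.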
Output Format
Midas-FableAgent produces two output formats depending on the task type.
Agentic execution (Stream A format)
When operating as an execution agent — given a task and environment state — the model responds with a JSON action block, optionally preceded by a <think> reasoning trace:
<think>
[Reasoning about current state, what commands are needed, potential failure modes]
</think>
{
  "analysis": "Current state assessment grounded in the provided terminal output.",
  "plan": "Concrete sequence of steps to advance toward task completion.",
  "commands": [
    {"keystrokes": "find . -type f -size +100M\n", "duration": 0.5},
    {"keystrokes": "sort -rh\n", "duration": 0.1}
  ],
  "task_complete": false
}
On completion:
{
  "analysis": "Task verified complete. All required outputs confirmed.",
  "plan": "No further steps needed.",
  "commands": [],
  "task_complete": true
}
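A consumer of this format has to separate the optional reasoning trace from the JSON action. The helper below is one way to do that with the standard library. `parse_agent_response` is a hypothetical name, and the sketch assumes the JSON object is the only content after the closing think tag. It tolerates a missing opening tag, since the opening tag may already be part of the prompt.

```python
import json
from typing import Optional, Tuple

REQUIRED_FIELDS = {"analysis", "plan", "commands", "task_complete"}


def parse_agent_response(text: str) -> Tuple[Optional[str], dict]:
    """Split a response into (think trace or None, action dict)."""
    head, closing_tag, tail = text.partition("</think>")
    if closing_tag:
        thinking = head.replace("<think>", "", 1).strip()
        body = tail
    else:
        thinking, body = None, text
    action = json.loads(body.strip())
    missing = REQUIRED_FIELDS - action.keys()
    if missing:
        raise ValueError(f"action is missing fields: {sorted(missing)}")
    return thinking, action
```

`json.loads` raises `json.JSONDecodeError` on malformed output, so a caller can catch that and re-prompt rather than executing a partial action.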
Planning / CoT (Stream B format)
When reasoning through open-ended planning problems without an execution context, the model produces a <think> trace followed by structured prose:
<think>
[Full reasoning trace — constraint identification, option analysis, decision rationale]
</think>
[Structured, actionable plan or analysis]
Training Data
| Dataset | Count | Format | Focus |
|---|---|---|---|
| open-thoughts/OpenThoughts-Agent-v1-SFT | 10,000 (streamed) | Multi-turn trajectories | Agentic execution loops |
| kelexine/fable-5-sft-traces | 4,665 (full) | Single/multi-turn CoT | Planning and reasoning |
Stream A processing: Conversations loaded from the conversations column. Role names normalised (human → user, gpt → assistant). Structural validation: must have at least one user and one assistant turn, must start with a user turn and end with an assistant turn. 100% yield — the OpenThoughts-Agent format is structurally clean.
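The Stream A normalisation and validation rules can be expressed as a short function. This is a sketch of the rules as stated, not the actual preprocessing script; it assumes each turn is a dict with `role` and `content` keys, which may differ from the dataset's real field names.

```python
from typing import Dict, List, Optional

ROLE_MAP = {"human": "user", "gpt": "assistant"}


def normalise_conversation(turns: List[Dict[str, str]]) -> Optional[List[Dict[str, str]]]:
    """Rename roles, then return the conversation, or None if validation fails."""
    messages = [
        {"role": ROLE_MAP.get(turn["role"], turn["role"]), "content": turn["content"]}
        for turn in turns
    ]
    roles = [message["role"] for message in messages]
    valid = (
        "user" in roles
        and "assistant" in roles
        and roles[0] == "user"
        and roles[-1] == "assistant"
    )
    return messages if valid else None
```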
Stream B processing: Loaded directly from parquet (the messages column serialises as a numpy array of per-turn JSON strings, bypassing schema parsing). Assistant response reconstructed from the context (user prompt), thinking (CoT trace → injected as <think>...</think>), and response (final answer) columns, rather than from the noisy messages column which contained /model slash-command noise and <local-command-stdout> artefacts. 100% yield after column-based reconstruction.
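The column-based reconstruction for Stream B might look like the following. The column names `context`, `thinking`, and `response` come from the description above; the function name and the exact whitespace around the think tags are assumptions, and multi-turn rows are not handled here.

```python
from typing import Dict, List


def build_stream_b_example(row: dict) -> List[Dict[str, str]]:
    """Rebuild a clean two-turn chat from the context/thinking/response columns."""
    thinking = (row.get("thinking") or "").strip()
    answer = (row.get("response") or "").strip()
    # Inject the CoT trace ahead of the final answer (spacing is an assumption).
    assistant = f"<think>\n{thinking}\n</think>\n\n{answer}"
    return [
        {"role": "user", "content": row["context"].strip()},
        {"role": "assistant", "content": assistant},
    ]
```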
Loss curve (v2, MAX_SEQ_LENGTH=12288):
| Step | Train Loss | Val Loss |
|---|---|---|
| 50 | 0.8055 | 0.4942 |
| 100 | 0.7631 | 0.4558 |
| 130 (final) | 0.8196 | 0.4525 |
Validation loss descends monotonically throughout the run. Early stopping did not trigger — the model had not plateaued at the 2-epoch ceiling.
Evaluation
No standard benchmark evaluation (ARC, GSM8K, HellaSwag) was run for this release. Midas-FableAgent's capability is agentic rather than multiple-choice or mathematical, and lm-evaluation-harness metrics are not the appropriate measure. A qualitative evaluation was conducted using six agentic execution prompts (terminal tasks) and five planning prompts.
Observed strengths:
- Correctly produces the JSON action format (`analysis`/`plan`/`commands`/`task_complete`) on all execution prompts
- `analysis` fields are grounded in the provided context rather than hallucinated
- `task_complete: false` consistently set on first-step responses where the task is not yet done
- Observation-grounded reasoning: on service health check tasks, correctly reasoned to wait for command output before deciding next action
- Planning traces show genuine constraint identification: the database migration example correctly identified concurrent connection limits, DDL blocking risk, and transfer bandwidth as distinct constraints before structuring the plan
- `<think>` tags present in all agentic outputs despite not being explicitly enforced in the training data
Known limitations:
- Empty or very short think blocks on simpler queries (model short-circuits reasoning on straightforward tasks)
Usage
Inference note
Qwen3's apply_chat_template with add_generation_prompt=True appends a <think> special token to prime the thinking mode. When decoding, use skip_special_tokens=False to preserve think tags in the output, then strip EOS/PAD tokens manually:
raw = tokenizer.decode(generated, skip_special_tokens=False)
for token in filter(None, {tokenizer.eos_token, tokenizer.pad_token}):  # drop EOS/PAD, keep think tags
    raw = raw.replace(token, '')
raw = raw.strip()
Transformers
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_name = "EphAsad/Midas-FableAgent"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="auto"
)

# Agentic execution: use a task-specific system prompt
AGENT_SYSTEM = (
    "You are an AI assistant tasked with solving command-line tasks in a "
    "Linux environment. Format your response as JSON with the structure: "
    "{\"analysis\": \"...\", \"plan\": \"...\", \"commands\": [{\"keystrokes\": \"...\", "
    "\"duration\": 0.1}], \"task_complete\": false}"
)

messages = [
    {"role": "system", "content": AGENT_SYSTEM},
    {"role": "user", "content": "Find all files larger than 100MB under /home and list them sorted by size.\n\nCurrent terminal state:\nroot@host:/home#"},
]

inputs = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_tensors="pt"
).to(model.device)

with torch.no_grad():
    output = model.generate(
        input_ids=inputs,
        max_new_tokens=900,
        temperature=0.2,
        do_sample=True,
        repetition_penalty=1.1,
    )

response = tokenizer.decode(
    output[0][inputs.shape[1]:],
    skip_special_tokens=False
).replace(tokenizer.eos_token, '').strip()
print(response)
Unsloth (faster inference)
from unsloth import FastLanguageModel
import torch

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="EphAsad/Midas-FableAgent",
    max_seq_length=12288,
    dtype=torch.bfloat16,
    load_in_4bit=False,
)
FastLanguageModel.for_inference(model)

# Planning / CoT mode: uses Midas-FableAgent default identity
messages = [
    {"role": "user", "content": "Plan a zero-downtime migration of a 200GB PostgreSQL database to AWS RDS."},
]

inputs = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_tensors="pt"
).to("cuda")

with torch.no_grad():
    output = model.generate(
        input_ids=inputs,
        max_new_tokens=1400,
        temperature=0.6,
        do_sample=True,
        repetition_penalty=1.1,
    )

response = tokenizer.decode(
    output[0][inputs.shape[1]:],
    skip_special_tokens=False
).replace(tokenizer.eos_token, '').strip()
print(response)
Ollama
# Recommended — best speed/quality balance
ollama run hf.co/EphAsad/Midas-FableAgent:Q4_K_M
# Higher quality
ollama run hf.co/EphAsad/Midas-FableAgent:Q5_K_M
# Near-lossless
ollama run hf.co/EphAsad/Midas-FableAgent:Q8_0
llama.cpp
llama-server -hf EphAsad/Midas-FableAgent:Q4_K_M
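Once `llama-server` is running, it can be queried over its OpenAI-compatible chat endpoint. The client below is a minimal sketch using only the standard library. It assumes the server listens on its default address `http://localhost:8080`; the function names, the `max_tokens` value, and the temperature default (copied from the Transformers example) are illustrative choices rather than recommended settings.

```python
import json
import urllib.request

SERVER_URL = "http://localhost:8080/v1/chat/completions"  # assumed default llama-server address


def build_payload(task: str, system_prompt: str, temperature: float = 0.2) -> dict:
    """Assemble an OpenAI-style chat completion request body."""
    return {
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": task},
        ],
        "temperature": temperature,
        "max_tokens": 900,
    }


def chat(payload: dict, url: str = SERVER_URL) -> str:
    """POST the payload and return the assistant message text."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]
```

Calling `chat(build_payload(task, system_prompt))` with the JSON-format system prompt from the Transformers example should return a response in the agentic execution format.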
Available Files
| File | Size | Description |
|---|---|---|
| model-0000{1-4}-of-00004.safetensors | ~16.4 GB | Full bfloat16 weights (4 shards) |
| Midas-FableAgent.Q4_K_M.gguf | ~5.0 GB | 4-bit, recommended |
| Midas-FableAgent.Q5_K_M.gguf | ~5.9 GB | 5-bit |
| Midas-FableAgent.Q8_0.gguf | ~8.7 GB | 8-bit, near-lossless |
System Prompt
Midas-FableAgent's identity is baked into the chat template and activates without an explicit system message. For agentic execution tasks, override the system prompt with a task-specific instruction that specifies the JSON output format (see usage examples above). To use the default identity directly:
You are Midas-FableAgent, an advanced agentic reasoning assistant built on
the Atem foundation. You excel at multi-step task execution — decomposing
complex goals into concrete actions, reasoning carefully about observations,
and iterating reliably toward task completion. You produce structured,
actionable outputs and maintain clear reasoning traces throughout execution.
Roadmap
| Version | Status | Change |
|---|---|---|
| v1 (MAX_SEQ=8192) | ✅ Released | Initial training run — 128 steps, ~4,084 effective examples |
| v2 (MAX_SEQ=12288) | ✅ This model | Increased context — 130 steps, ~4,121 effective examples |
| v3 (trajectory splitting) | 🔄 Planned | Each assistant turn as independent training example — eliminates all-masked removal, ~3× effective data |
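The step counts in the roadmap follow from the effective example counts. The check below assumes one optimiser step per effective batch of 64, a final partial batch that still counts as a step, and 2 epochs for both versions. These are assumptions about the trainer's behaviour (and about v1 sharing v2's batch size and epoch count), chosen because they reproduce the reported numbers.

```python
import math


def optimizer_steps(examples: int, batch_size: int = 64, epochs: int = 2) -> int:
    """Steps per epoch (rounding the last partial batch up) times epochs."""
    return math.ceil(examples / batch_size) * epochs


print(optimizer_steps(4084))  # v1 -> 128
print(optimizer_steps(4121))  # v2 -> 130
```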
Citation
@misc{midas_fableagent_2026,
author = {Asad, Zain},
title = {Midas-FableAgent: Sequential Agentic SFT on Atem-8B},
year = {2026},
publisher = {HuggingFace},
howpublished = {\url{https://huggingface.co/EphAsad/Midas-FableAgent}},
}
License
Released under the Apache 2.0 License, consistent with the base model chain (Midas-FableAgent → Atem-8B → Qwen3-8B).
Built independently by EphAsad