Instructions to use EphAsad/Midas-FableAgent-8B with libraries and local apps.

Libraries

How to use EphAsad/Midas-FableAgent-8B with llama-cpp-python:

# !pip install llama-cpp-python

from llama_cpp import Llama

llm = Llama.from_pretrained(
	repo_id="EphAsad/Midas-FableAgent-8B",
	filename="Midas-FableAgent.Q4_K_M.gguf",
)

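response = llm.create_chat_completion(
	messages = [
		{
			"role": "user",
			"content": "What is the capital of France?"
		}
	]
)
# The reply follows the OpenAI chat-completion schema
print(response["choices"][0]["message"]["content"])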

Local Apps

llama.cpp

How to use EphAsad/Midas-FableAgent-8B with llama.cpp:

Install (macOS, Linux)

curl -LsSf https://llama.app/install.sh | sh
# Start a local OpenAI-compatible server with a web UI:
llama serve -hf EphAsad/Midas-FableAgent-8B:Q4_K_M
# Run inference directly in the terminal:
llama cli -hf EphAsad/Midas-FableAgent-8B:Q4_K_M

Install from WinGet (Windows)

winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama serve -hf EphAsad/Midas-FableAgent-8B:Q4_K_M
# Run inference directly in the terminal:
llama cli -hf EphAsad/Midas-FableAgent-8B:Q4_K_M

Use pre-built binary

# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf EphAsad/Midas-FableAgent-8B:Q4_K_M
# Run inference directly in the terminal:
./llama-cli -hf EphAsad/Midas-FableAgent-8B:Q4_K_M

Build from source code

git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf EphAsad/Midas-FableAgent-8B:Q4_K_M
# Run inference directly in the terminal:
./build/bin/llama-cli -hf EphAsad/Midas-FableAgent-8B:Q4_K_M

Use Docker

docker model run hf.co/EphAsad/Midas-FableAgent-8B:Q4_K_M


vLLM

How to use EphAsad/Midas-FableAgent-8B with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "EphAsad/Midas-FableAgent-8B"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "EphAsad/Midas-FableAgent-8B",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
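
The same request can be issued from Python. The sketch below uses only the standard library and assumes the server is reachable at the default address shown in the curl call above; `build_chat_request` and `chat` are hypothetical helper names, not part of vLLM.

```python
import json
import urllib.request


def build_chat_request(prompt, model="EphAsad/Midas-FableAgent-8B",
                       base_url="http://localhost:8000/v1"):
    """Build a POST request for the OpenAI-compatible chat endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(prompt, **kwargs):
    """Send the request and return the assistant's text (needs a running server)."""
    with urllib.request.urlopen(build_chat_request(prompt, **kwargs)) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```

With a server running, `chat("What is the capital of France?")` should return the reply text. For the llama.cpp server in the sections above, passing `base_url="http://localhost:8080/v1"` matches the port used in the Pi and Hermes configurations.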

Use Docker

docker model run hf.co/EphAsad/Midas-FableAgent-8B:Q4_K_M

Ollama
How to use EphAsad/Midas-FableAgent-8B with Ollama:
```
ollama run hf.co/EphAsad/Midas-FableAgent-8B:Q4_K_M
```

Unsloth Studio

How to use EphAsad/Midas-FableAgent-8B with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for EphAsad/Midas-FableAgent-8B to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for EphAsad/Midas-FableAgent-8B to start chatting

Using HuggingFace Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for EphAsad/Midas-FableAgent-8B to start chatting

Pi

How to use EphAsad/Midas-FableAgent-8B with Pi:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama serve -hf EphAsad/Midas-FableAgent-8B:Q4_K_M

Configure the model in Pi

# Install Pi:
npm install -g @mariozechner/pi-coding-agent
# Add to ~/.pi/agent/models.json:
{
  "providers": {
    "llama-cpp": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        {
          "id": "EphAsad/Midas-FableAgent-8B:Q4_K_M"
        }
      ]
    }
  }
}

Run Pi

# Start Pi in your project directory:
pi

Hermes Agent

How to use EphAsad/Midas-FableAgent-8B with Hermes Agent:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama serve -hf EphAsad/Midas-FableAgent-8B:Q4_K_M

Configure Hermes

# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup
# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default EphAsad/Midas-FableAgent-8B:Q4_K_M

Run Hermes

hermes

Docker Model Runner
How to use EphAsad/Midas-FableAgent-8B with Docker Model Runner:
```
docker model run hf.co/EphAsad/Midas-FableAgent-8B:Q4_K_M
```

Lemonade

How to use EphAsad/Midas-FableAgent-8B with Lemonade:

Pull the model

# Download Lemonade from https://lemonade-server.ai/
lemonade pull EphAsad/Midas-FableAgent-8B:Q4_K_M

Run and chat with the model

lemonade run user.Midas-FableAgent-8B-Q4_K_M

List all available models

lemonade list

Midas-FableAgent

Plan. Act. Observe. Complete.

An agentic specialisation of Atem-8B, sequentially fine-tuned for multi-step task execution, structured action emission, and observation-grounded iteration. Trained in part on Fable agent traces.

Overview

Midas-FableAgent is a sequential fine-tune of Atem-8B toward agentic task execution. Where Atem-8B is a general-purpose reasoning model, Midas-FableAgent is trained to operate in execution loops: receiving a task, reasoning about the current state, emitting structured actions, observing results, and iterating until completion.

Training used two complementary data streams:

Stream A — 10,000 multi-turn agentic execution trajectories from OpenThoughts-Agent-v1-SFT. Each trajectory is a full ReAct-style loop: task → JSON action → environment observation → JSON action → ... → task_complete: true. The model trains on every assistant turn in every trajectory, grounding its actions in real terminal output.
Stream B — 4,665 planning and CoT reasoning examples from kelexine/fable-5-sft-traces. Single or multi-turn examples with full <think> traces, covering high-level task decomposition before any execution loop begins.

Together these streams teach the model to plan before acting and execute through observation — the two capabilities that define reliable agentic behaviour.
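
The execution loop that Stream A trajectories follow (task, JSON action, observation, repeat until `task_complete` is true) can be sketched as a small driver. This is a minimal illustration, not the harness used to train or evaluate this model: `agent_loop`, `generate`, and `execute` are hypothetical names. `generate` stands in for any function that maps a message list to the model's reply, and `execute` for whatever runs keystrokes in your environment and returns the terminal output.

```python
import json


def agent_loop(task, generate, execute, max_turns=10):
    """Drive a task -> action -> observation loop until the model reports completion."""
    messages = [{"role": "user", "content": task}]
    for _ in range(max_turns):
        reply = generate(messages)
        messages.append({"role": "assistant", "content": reply})
        # Keep only the text after an optional </think> tag, then parse the JSON action.
        action = json.loads(reply.split("</think>")[-1])
        if action["task_complete"]:
            break
        # Run each command and feed the combined output back as the next user turn.
        observation = "\n".join(execute(cmd["keystrokes"]) for cmd in action["commands"])
        messages.append({"role": "user", "content": observation})
    return messages
```

The `max_turns` cap is a design choice of this sketch: it keeps a model that never sets `task_complete` from looping indefinitely.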

Design note: This is v2 of Midas-FableAgent. Its primary known limitation is that approximately 69% of training examples were removed after formatting: once a sequence was truncated to the context window, every response token had been cut off, leaving nothing unmasked to train on. The model was therefore trained on roughly 4,100 examples rather than the full 14,665. A planned v3 will use trajectory splitting (each assistant turn as an independent training example) and should substantially increase the effective training data. The current model demonstrates the correct agentic format and reasoning patterns, but it is undertrained relative to what the full dataset could support.
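
Trajectory splitting is described here only in one sentence, so the following is one plausible reading rather than the planned v3 implementation. It assumes messages are dicts with `role` and `content` keys; `split_trajectory` is a hypothetical name. Each assistant turn becomes its own example, with all earlier turns as context, so a long trajectory no longer depends on its late turns surviving truncation.

```python
def split_trajectory(messages):
    """Return one {"context", "target"} example per assistant turn."""
    examples = []
    for i, msg in enumerate(messages):
        if msg["role"] == "assistant":
            # Context is everything before this turn; only the target is trained on.
            examples.append({"context": messages[:i], "target": msg})
    return examples
```

A four-message trajectory (user, assistant, user, assistant) yields two examples under this scheme.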

Atem Ecosystem

Midas-FableAgent is a task-specialised derivative of the Atem series, not a numbered Atem release.

Model	Type	Capability
Atem-0.6B	Qwen3 SFT	Compact reasoning
Atem-1.7B	Qwen3 SFT	Efficient reasoning
Atem-4B	Qwen3 SFT	Balanced reasoning
Atem-8B	Qwen3 SFT	General-purpose reasoning
Atem-14B	Qwen3 SFT	High-capability reasoning
Midas-FableAgent	Atem-8B → Agentic SFT	Multi-step task execution

Model Details

Property	Value
Base model	EphAsad/Atem-8B
Training method	Sequential LoRA SFT — attention-only targets
LoRA config	r=32, alpha=64, dropout=0.05
Target modules	q_proj, k_proj, v_proj, o_proj (no MLP)
Parameters	~8.22B
Trainable parameters	30,670,848 (0.37%)
Effective training examples	~4,121 (post all-masked removal)
Training steps	130
Epochs	2
Final val loss	0.4525
Final train loss	0.8590
Learning rate	4e-5 (cosine schedule)
Effective batch size	64 (4 × 16 grad accum)
Hardware	NVIDIA A100-SXM4-80GB
Max sequence length	12,288 tokens
Precision	bfloat16
License	Apache 2.0

Why attention-only LoRA: Midas-FableAgent is sequentially trained on top of Atem-8B, not a raw base. Skipping MLP projections and using a lower rank (r=32 vs Atem-8B's training rank) and lower LR (4e-5 vs 1e-4) are deliberate forgetting-prevention measures. The goal is to shift the model's output distribution toward agentic formats without eroding the general reasoning capability established during Atem-8B's training.
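
The trainable parameter count in the table can be reproduced with a short calculation. The attention shapes below are assumptions about the Qwen3-8B architecture (hidden size 4096, 36 layers, 32 query heads, 8 key/value heads, head dimension 128); they are not stated in this card. Each LoRA adapter adds two matrices, of shape r × d_in and d_out × r.

```python
# Assumed Qwen3-8B attention shapes (not stated in this card).
HIDDEN = 4096
NUM_LAYERS = 36
Q_OUT = 32 * 128   # 32 query heads x head_dim 128
KV_OUT = 8 * 128   # 8 key/value heads x head_dim 128
RANK = 32          # r from the LoRA config above


def lora_params(d_in, d_out, r=RANK):
    """Parameters added by one LoRA adapter: A is (r x d_in), B is (d_out x r)."""
    return r * (d_in + d_out)


per_layer = (
    lora_params(HIDDEN, Q_OUT)     # q_proj
    + lora_params(HIDDEN, KV_OUT)  # k_proj
    + lora_params(HIDDEN, KV_OUT)  # v_proj
    + lora_params(Q_OUT, HIDDEN)   # o_proj
)
total = per_layer * NUM_LAYERS
print(total)                          # 30670848
print(f"{total / 8.22e9:.2%}")        # share of the ~8.22B total parameters
```

Under these assumptions the total matches the 30,670,848 reported in the table.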

Output Format

Midas-FableAgent produces two output formats depending on the task type.

Agentic execution (Stream A format)

When operating as an execution agent (given a task and environment state), the model responds with a JSON action block, optionally preceded by a <think> reasoning trace:

<think>
[Reasoning about current state, what commands are needed, potential failure modes]
</think>
{
  "analysis": "Current state assessment grounded in the provided terminal output.",
  "plan": "Concrete sequence of steps to advance toward task completion.",
  "commands": [
    {"keystrokes": "find . -type f -size +100M -exec du -h {} + | sort -rh\n", "duration": 0.5}
  ],
  "task_complete": false
}

On completion:

{
  "analysis": "Task verified complete. All required outputs confirmed.",
  "plan": "No further steps needed.",
  "commands": [],
  "task_complete": true
}
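
A consumer of this format needs to separate the optional reasoning trace from the JSON action and check that the four keys are present. The sketch below is one way to do that; `parse_agent_reply` is a hypothetical helper, and the validation rules are inferred from the examples above rather than from a published schema. It also accepts replies with no opening `<think>` tag, which can occur if the chat template has already emitted it in the prompt.

```python
import json

REQUIRED_KEYS = {"analysis", "plan", "commands", "task_complete"}


def parse_agent_reply(text):
    """Return (thinking, action); raise ValueError if the action is malformed."""
    # Everything before the last </think> is reasoning; the rest is the action.
    head, sep, tail = text.rpartition("</think>")
    thinking = head.replace("<think>", "").strip() if sep else ""
    action = json.loads(tail)  # json.JSONDecodeError is a ValueError subclass
    if not isinstance(action, dict):
        raise ValueError("action must be a JSON object")
    missing = REQUIRED_KEYS - action.keys()
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    for cmd in action["commands"]:
        if not isinstance(cmd.get("keystrokes"), str):
            raise ValueError("each command needs a 'keystrokes' string")
    return thinking, action
```

Raising on malformed output, rather than guessing at a repair, lets the caller decide whether to retry the generation or abort the task.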

Planning / CoT (Stream B format)

When reasoning through open-ended planning problems without an execution context, the model produces a <think> trace followed by structured prose:

<think>
[Full reasoning trace — constraint identification, option analysis, decision rationale]
</think>
[Structured, actionable plan or analysis]

Training Data

Dataset	Count	Format	Focus
open-thoughts/OpenThoughts-Agent-v1-SFT	10,000 (streamed)	Multi-turn trajectories	Agentic execution loops
kelexine/fable-5-sft-traces	4,665 (full)	Single/multi-turn CoT	Planning and reasoning

Stream A processing: Conversations loaded from the conversations column. Role names normalised (human → user, gpt → assistant). Structural validation: must have at least one user and one assistant turn, must start with a user turn and end with an assistant turn. 100% yield — the OpenThoughts-Agent format is structurally clean.
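
The Stream A steps can be sketched as two small functions. This is an illustration under assumptions, not the preprocessing script used for this model: it assumes each turn in the `conversations` column is a ShareGPT-style dict with `from` and `value` keys, and the function names are hypothetical.

```python
ROLE_MAP = {"human": "user", "gpt": "assistant"}


def normalise(conversation):
    """Map ShareGPT-style turns to {"role", "content"} messages."""
    return [
        {"role": ROLE_MAP.get(turn["from"], turn["from"]), "content": turn["value"]}
        for turn in conversation
    ]


def is_valid(messages):
    """Apply the structural checks: both roles present, user first, assistant last."""
    roles = [m["role"] for m in messages]
    return (
        "user" in roles
        and "assistant" in roles
        and roles[0] == "user"
        and roles[-1] == "assistant"
    )
```

Filtering a dataset then amounts to keeping every conversation for which `is_valid(normalise(conversation))` is true.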

Stream B processing: Loaded directly from parquet (the messages column serialises as a numpy array of per-turn JSON strings, bypassing schema parsing). Assistant response reconstructed from the context (user prompt), thinking (CoT trace → injected as <think>...</think>), and response (final answer) columns, rather than from the noisy messages column which contained /model slash-command noise and <local-command-stdout> artefacts. 100% yield after column-based reconstruction.
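
The column-based reconstruction for Stream B might look like the following. The column names (`context`, `thinking`, `response`) come from the description above; the exact whitespace around the `<think>` tags and the name `build_example` are assumptions made for this sketch.

```python
def build_example(row):
    """Rebuild a two-message example from the context, thinking, and response columns."""
    assistant = (
        f"<think>\n{row['thinking'].strip()}\n</think>\n"
        f"{row['response'].strip()}"
    )
    return [
        {"role": "user", "content": row["context"].strip()},
        {"role": "assistant", "content": assistant},
    ]
```

Because the noisy `messages` column is never read, its slash-command and stdout artefacts cannot leak into the training text.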

Loss curve (v2, MAX_SEQ_LENGTH=12288):

Step	Train Loss	Val Loss
50	0.8055	0.4942
100	0.7631	0.4558
130 (final)	0.8196	0.4525

Validation loss descends monotonically throughout the run. Early stopping did not trigger — the model had not plateaued at the 2-epoch ceiling.

Evaluation

No standard benchmark evaluation (ARC, GSM8K, HellaSwag) was run for this release. Midas-FableAgent's capability is agentic rather than multiple-choice or mathematical, and lm-evaluation-harness metrics are not the appropriate measure. A qualitative evaluation was conducted using six agentic execution prompts (terminal tasks) and five planning prompts.

Observed strengths:

Correctly produces the JSON action format (analysis / plan / commands / task_complete) on all execution prompts
analysis fields are grounded in the provided context rather than hallucinated
task_complete: false consistently set on first-step responses where the task is not yet done
Observation-grounded reasoning: on service health check tasks, correctly reasoned to wait for command output before deciding next action
Planning traces show genuine constraint identification: the database migration example correctly identified concurrent connection limits, DDL blocking risk, and transfer bandwidth as distinct constraints before structuring the plan
<think> tags present in all agentic outputs despite not being explicitly enforced in the training data

Known limitations:

Empty or very short think blocks on simpler queries (model short-circuits reasoning on straightforward tasks)

Usage

Inference note

Qwen3's apply_chat_template with add_generation_prompt=True appends a <think> special token to prime the thinking mode. When decoding, use skip_special_tokens=False to preserve the think tags in the output, then strip the EOS token manually:

# `generated` holds only the newly generated token IDs (prompt tokens sliced off)
raw = tokenizer.decode(generated, skip_special_tokens=False)
raw = raw.replace(tokenizer.eos_token, '').strip()

Transformers

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_name = "EphAsad/Midas-FableAgent-8B"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="auto"
)

# Agentic execution — use a task-specific system prompt
AGENT_SYSTEM = (
    "You are an AI assistant tasked with solving command-line tasks in a "
    "Linux environment. Format your response as JSON with the structure: "
    "{\"analysis\": \"...\", \"plan\": \"...\", \"commands\": [{\"keystrokes\": \"...\", "
    "\"duration\": 0.1}], \"task_complete\": false}"
)

messages = [
    {"role": "system", "content": AGENT_SYSTEM},
    {"role": "user", "content": "Find all files larger than 100MB under /home and list them sorted by size.\n\nCurrent terminal state:\nroot@host:/home#"},
]

# return_dict=True yields both input_ids and attention_mask
inputs = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_dict=True,
    return_tensors="pt"
).to(model.device)

with torch.no_grad():
    output = model.generate(
        **inputs,
        max_new_tokens=900,
        temperature=0.2,
        do_sample=True,
        repetition_penalty=1.1,
    )

# Decode only the newly generated tokens, keeping the <think> tags
prompt_len = inputs["input_ids"].shape[1]
response = tokenizer.decode(
    output[0][prompt_len:],
    skip_special_tokens=False
).replace(tokenizer.eos_token, '').strip()
print(response)

Unsloth (faster inference)

from unsloth import FastLanguageModel
import torch

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="EphAsad/Midas-FableAgent-8B",
    max_seq_length=12288,
    dtype=torch.bfloat16,
    load_in_4bit=False,
)
FastLanguageModel.for_inference(model)

# Planning / CoT mode — uses Midas-FableAgent default identity
messages = [
    {"role": "user", "content": "Plan a zero-downtime migration of a 200GB PostgreSQL database to AWS RDS."},
]

# return_dict=True yields both input_ids and attention_mask
inputs = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_dict=True,
    return_tensors="pt"
).to("cuda")

with torch.no_grad():
    output = model.generate(
        **inputs,
        max_new_tokens=1400,
        temperature=0.6,
        do_sample=True,
        repetition_penalty=1.1,
    )

# Decode only the newly generated tokens, keeping the <think> tags
prompt_len = inputs["input_ids"].shape[1]
response = tokenizer.decode(
    output[0][prompt_len:],
    skip_special_tokens=False
).replace(tokenizer.eos_token, '').strip()
print(response)

Ollama

# Recommended: best speed/quality balance
ollama run hf.co/EphAsad/Midas-FableAgent-8B:Q4_K_M

# Higher quality
ollama run hf.co/EphAsad/Midas-FableAgent-8B:Q5_K_M

# Near-lossless
ollama run hf.co/EphAsad/Midas-FableAgent-8B:Q8_0

llama.cpp

llama-server -hf EphAsad/Midas-FableAgent-8B:Q4_K_M

Available Files

File	Size	Description
`model-0000{1-4}-of-00004.safetensors`	~16.4 GB	Full bfloat16 weights (4 shards)
`Midas-FableAgent.Q4_K_M.gguf`	~5.0 GB	4-bit — recommended
`Midas-FableAgent.Q5_K_M.gguf`	~5.9 GB	5-bit
`Midas-FableAgent.Q8_0.gguf`	~8.7 GB	8-bit — near-lossless
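
As a rough sanity check on these sizes, the average number of bits stored per weight can be estimated from the file size and the ~8.22B parameter count in Model Details. The calculation assumes that 1 GB means 10^9 bytes and ignores file metadata, so the results are approximate.

```python
PARAMS = 8.22e9  # "~8.22B" from the Model Details table

# Approximate sizes in GB, copied from the table above.
FILES_GB = {
    "bf16 safetensors": 16.4,
    "Q4_K_M": 5.0,
    "Q5_K_M": 5.9,
    "Q8_0": 8.7,
}


def bits_per_weight(size_gb, params=PARAMS):
    """Average bits per parameter, assuming 1 GB = 1e9 bytes."""
    return size_gb * 1e9 * 8 / params


for name, size in FILES_GB.items():
    print(f"{name}: {bits_per_weight(size):.2f} bits/weight")
```

The bfloat16 shards come out close to 16 bits per weight, as expected. The quantised files land somewhat above their nominal bit widths, which is consistent with the scale data that GGUF quantisation formats store alongside the weights.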

System Prompt

Midas-FableAgent's identity is baked into the chat template and activates without an explicit system message. For agentic execution tasks, override the system prompt with a task-specific instruction that specifies the JSON output format (see usage examples above). To use the default identity directly:

You are Midas-FableAgent, an advanced agentic reasoning assistant built on
the Atem foundation. You excel at multi-step task execution — decomposing
complex goals into concrete actions, reasoning carefully about observations,
and iterating reliably toward task completion. You produce structured,
actionable outputs and maintain clear reasoning traces throughout execution.

Roadmap

Version	Status	Change
v1 (MAX_SEQ=8192)	✅ Released	Initial training run — 128 steps, ~4,084 effective examples
v2 (MAX_SEQ=12288)	✅ This model	Increased context — 130 steps, ~4,121 effective examples
v3 (trajectory splitting)	🔄 Planned	Each assistant turn as independent training example — eliminates all-masked removal, ~3× effective data
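
The step counts in the table follow from the effective example counts and the batch settings in Model Details. The sketch below assumes that v1 used the same batch size, gradient accumulation, and epoch count as v2 (the card states these only for v2), and that a partial final batch in each epoch still counts as a step.

```python
import math


def optimizer_steps(n_examples, micro_batch=4, grad_accum=16, epochs=2):
    """Optimizer steps for a run, rounding partial batches up."""
    micro_batches = math.ceil(n_examples / micro_batch)
    return math.ceil(micro_batches / grad_accum) * epochs


print(optimizer_steps(4084))  # v1: 128
print(optimizer_steps(4121))  # v2: 130
```

Both results agree with the table. The example counts are approximate in the card, so this is a consistency check, not a derivation.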

Citation

@misc{midas_fableagent_2026,
  author       = {Asad, Zain},
  title        = {Midas-FableAgent: Sequential Agentic SFT on Atem-8B},
  year         = {2026},
  publisher    = {HuggingFace},
  howpublished = {\url{https://huggingface.co/EphAsad/Midas-FableAgent}},
}

License

Released under the Apache 2.0 License, consistent with the base model chain (Midas-FableAgent → Atem-8B → Qwen3-8B).

Built independently by EphAsad

Model tree for EphAsad/Midas-FableAgent-8B

Qwen/Qwen3-8B-Base (base model) → Qwen/Qwen3-8B (finetuned) → EphAsad/Atem-8B (adapter) → EphAsad/Midas-FableAgent-8B (adapter, this model)