Upload README.md with huggingface_hub
README.md
ADDED
@@ -0,0 +1,53 @@
---
license: apache-2.0
language: [en]
library_name: safetensors
pipeline_tag: text-generation
tags: [hobbylm, mixture-of-experts, moe, sparse-moe]
---

# HobbyLM-Computer-Use (500M MoE, GUI agent / tool use)

A function-calling, accessibility-tree GUI-agent variant of HobbyLM for computer-use tasks.

Part of the **HobbyLM** family: a from-scratch 500M sparse-MoE model trained on consumer-scale budgets.

## Architecture

HobbyLM is a **sparse Mixture-of-Experts (MoE)** transformer (DeepSeek-V3 / Ling-style):

| Component | Value |
|---|---|
| Total parameters | ~500M (only a fraction active per token) |
| Hidden size / layers | 768 / 16 (1 dense FFN layer, 15 MoE) |
| Routed experts / active | 36 / top-6 (+ 1 always-on shared expert) |
| Attention | GQA, 12 query / 3 KV heads, head-dim 128, per-head QK-norm |
| Router | sigmoid gating, aux-loss-free balancing bias, no top-k renorm |
| Positional | RoPE |
| Tokenizer | GPT-2 byte-level BPE (50,304 vocab, sentinel-padded) |

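To make the Router row concrete, here is a minimal pure-Python sketch of routing one token. It is not taken from the HobbyLM code. It assumes, following the DeepSeek-V3 description, that the balancing bias affects only which experts are selected, while the gate weights are the raw sigmoid scores (no renormalization over the top-k). The function names, the bias update rule, and the `step` size are hypothetical.

```python
import math

NUM_EXPERTS = 36  # routed experts (from the table)
TOP_K = 6         # routed experts active per token (from the table)


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def route_token(logits, balance_bias, top_k=TOP_K):
    """Select top_k routed experts for one token.

    logits:       router logits, one per routed expert
    balance_bias: per-expert balancing bias (assumed to affect selection only)
    Returns a list of (expert_index, gate_weight) pairs.
    """
    scores = [sigmoid(z) for z in logits]
    # The bias shifts which experts win the top-k, not how much they contribute.
    biased = [s + b for s, b in zip(scores, balance_bias)]
    chosen = sorted(range(len(scores)), key=lambda i: biased[i], reverse=True)[:top_k]
    # No top-k renormalization: gate weights are the raw sigmoid scores.
    return [(i, scores[i]) for i in chosen]


def update_bias(balance_bias, tokens_per_expert, step=1e-3):
    """Nudge biases toward balanced load (hypothetical rule and step size)."""
    mean_load = sum(tokens_per_expert) / len(tokens_per_expert)
    return [
        b + step if load < mean_load else b - step if load > mean_load else b
        for b, load in zip(balance_bias, tokens_per_expert)
    ]
```

In a full layer, the output of the always-on shared expert would be added to the weighted sum of the six selected experts' outputs. That part is omitted here because the card does not describe it in detail.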
## Files

- `model.safetensors`: the model weights (fp32).
- `config.json`: architecture and hyperparameters.
- GGUF builds (arch `hobbylm`) live in [`rootxhacker/HobbyLM-gguf`](https://huggingface.co/rootxhacker/HobbyLM-gguf).

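As a rough guide to the size of the fp32 weights, the arithmetic below multiplies the approximate parameter count by 4 bytes per value. The exact count is not stated on this card, so `PARAMS` is an assumption based on the "~500M" figure and the result is only an estimate.

```python
PARAMS = 500_000_000  # "~500M" from the card; the exact count is not stated
BYTES_PER_PARAM = 4   # fp32

size_bytes = PARAMS * BYTES_PER_PARAM
size_gb = size_bytes / 1e9
size_gib = size_bytes / 2**30
print(f"~{size_gb:.1f} GB (~{size_gib:.2f} GiB)")  # ~2.0 GB (~1.86 GiB)
```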
## Loading (safetensors)

```python
import json
import torch
from safetensors.torch import load_file

sd = load_file("model.safetensors")  # {tensor name: torch.Tensor}, fp32
with open("config.json") as f:
    cfg = json.load(f)  # architecture / hyperparameters
# Rebuild the HobbyLM nn.Module from `cfg`, then call `load_state_dict(sd)` on it.
```

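Before rebuilding the module, it can help to check the tensor names and total parameter count without loading any weights. The sketch below uses only the standard library and relies on the documented safetensors layout: an 8-byte little-endian header length followed by a JSON header. The helper names `read_safetensors_header` and `count_parameters` are made up for this example.

```python
import json
import math
import struct


def read_safetensors_header(path):
    """Return the JSON header of a .safetensors file without reading tensor data."""
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))  # little-endian u64
        return json.loads(f.read(header_len))


def count_parameters(header):
    """Sum element counts over all tensors, skipping the optional metadata entry."""
    return sum(
        math.prod(info["shape"])
        for name, info in header.items()
        if name != "__metadata__"
    )
```

For example, `count_parameters(read_safetensors_header("model.safetensors"))` should come out near the ~500M stated above, and the header keys list the tensor names the rebuilt module's `state_dict` has to match.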
## Notes & limitations

- Research model at the ~500M scale: fluent but with the capability ceiling of a small model.
- The GGUF uses a custom `hobbylm` architecture (see the GGUF repo) and needs `moe-rs` or a patched llama.cpp.

## License

Apache-2.0.