Commit d4da3aa by rootxhacker · verified · 1 parent: ccdf026

Upload README.md with huggingface_hub

Files changed (1)
  1. README.md +53 -0
README.md ADDED
@@ -0,0 +1,53 @@
+ ---
+ license: apache-2.0
+ language: [en]
+ library_name: safetensors
+ pipeline_tag: text-generation
+ tags: [hobbylm, mixture-of-experts, moe, sparse-moe]
+ ---
+
+ # HobbyLM-Computer-Use (500M MoE, GUI agent / tool use)
+
+ Variant for computer-use tasks: function calling plus an accessibility-tree GUI agent.
+
+ Part of the **HobbyLM** family, a from-scratch 500M sparse-MoE model trained on consumer-scale budgets.
+
+ ## Architecture
+
+ HobbyLM is a **sparse Mixture-of-Experts (MoE)** transformer (DeepSeek-V3 / Ling-style):
+
+ | Component | Value |
+ |---|---|
+ | Total parameters | ~500M (only a fraction is active per token) |
+ | Hidden size / layers | 768 / 16 (1 dense FFN layer, 15 MoE) |
+ | Routed experts / active | 36 / top-6 (+ 1 always-on shared expert) |
+ | Attention | GQA, 12 query / 3 KV heads, head-dim 128, per-head QK-norm |
+ | Router | sigmoid gating, aux-loss-free balancing bias, no top-k renorm |
+ | Positional | RoPE |
+ | Tokenizer | GPT-2 byte-level BPE (50,304 vocab, sentinel-padded) |
+
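The attention row of the table implies some shape arithmetic that is easy to miss. The sketch below works it out from the table's numbers only; the contiguous grouping of query heads onto KV heads is an assumption (the usual GQA layout), not something the README confirms, and `kv_head_for` is a hypothetical helper name.

```python
# Values taken from the architecture table above.
HIDDEN_SIZE = 768
NUM_Q_HEADS = 12
NUM_KV_HEADS = 3
HEAD_DIM = 128

# Number of query heads that share one KV head.
GROUP_SIZE = NUM_Q_HEADS // NUM_KV_HEADS


def kv_head_for(q_head: int) -> int:
    """Map a query head to its KV head (assumed contiguous grouping)."""
    return q_head // GROUP_SIZE


q_width = NUM_Q_HEADS * HEAD_DIM    # output width of the query projection
kv_width = NUM_KV_HEADS * HEAD_DIM  # output width of each of the K and V projections
mapping = [kv_head_for(h) for h in range(NUM_Q_HEADS)]

print(q_width, kv_width, mapping)
# prints: 1536 384 [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2]
```

Two things follow from these numbers: the query projection (1536) is wider than the hidden size (768), because the head dimension is set independently of `HIDDEN_SIZE / NUM_Q_HEADS`, and the KV cache holds 3 heads per layer instead of 12, a 4x reduction relative to full multi-head attention.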
+
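The router row of the table can be read as the following per-token procedure. This is a minimal pure-Python sketch, not the model's actual code: it assumes the balancing bias influences only which experts are selected while the gate weights stay the unbiased sigmoid scores (the DeepSeek-V3 convention), which the README does not spell out. The names `route` and `balance_bias` are hypothetical.

```python
import math
import random

NUM_EXPERTS = 36  # routed experts (from the table)
TOP_K = 6         # routed experts activated per token (from the table)


def route(logits, balance_bias, top_k=TOP_K):
    """Select top_k experts for one token and return (index, gate weight) pairs.

    Assumption: the bias is added for selection only; the returned gate
    weights are the raw sigmoid scores, with no renormalisation over the
    selected experts ("no top-k renorm").
    """
    scores = [1.0 / (1.0 + math.exp(-x)) for x in logits]   # sigmoid gating
    biased = [s + b for s, b in zip(scores, balance_bias)]  # selection scores
    chosen = sorted(range(len(scores)), key=lambda i: biased[i], reverse=True)[:top_k]
    return [(i, scores[i]) for i in chosen]


rng = random.Random(0)
logits = [rng.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]
bias = [0.0] * NUM_EXPERTS  # a trained model would carry learned per-expert values
picked = route(logits, bias)
print(len(picked), "experts selected")
```

Because the weights are not renormalised, they do not sum to 1. The always-on shared expert from the table is not routed at all: its output would be added for every token alongside the six selected experts.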
+ ## Files
+
+ - `model.safetensors`: the model weights (fp32).
+ - `config.json`: architecture / hyperparameters.
+ - GGUF builds (arch `hobbylm`) live in [`rootxhacker/HobbyLM-gguf`](https://huggingface.co/rootxhacker/HobbyLM-gguf).
+
+ ## Loading (safetensors)
+
+ ```python
+ import json
+
+ from safetensors.torch import load_file  # requires PyTorch to be installed
+
+ sd = load_file("model.safetensors")  # dict: parameter name -> torch.Tensor
+ with open("config.json") as f:
+     cfg = json.load(f)  # architecture / hyperparameters
+ # Rebuild the HobbyLM nn.Module from `cfg`, then call `model.load_state_dict(sd)`.
+ ```
+
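Before rebuilding the module, it can help to sanity-check the checkpoint against the figures quoted in this README. The helper below is a sketch under assumptions: `summarize` is a hypothetical name, and the tensor names and shapes in `example_shapes` are illustrative only, since the real names come from the checkpoint itself.

```python
import math

BYTES_PER_FP32 = 4  # the Files section states the weights are stored as fp32


def summarize(shapes):
    """Return (total parameter count, approximate size in bytes) for fp32 weights.

    `shapes` maps a tensor name to its shape tuple.
    """
    total = sum(math.prod(shape) for shape in shapes.values())
    return total, total * BYTES_PER_FP32


# Hypothetical names; the shapes reuse the vocab and hidden size from the table.
example_shapes = {
    "embed.weight": (50304, 768),
    "final_norm.weight": (768,),
}
total, size_bytes = summarize(example_shapes)
print(total, size_bytes)
# prints: 38634240 154536960
```

With the real state dict, the shapes could be collected as `{name: tuple(t.shape) for name, t in sd.items()}`. If the ~500M figure holds, the total should land near 500 million parameters, or roughly 2 GB at 4 bytes each.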
+ ## Notes & limitations
+
+ - Research model at the ~500M scale: fluent but with the capability ceiling of a small model.
+ - The GGUF uses a custom `hobbylm` architecture (see the GGUF repo) and needs `moe-rs` or a patched llama.cpp.
+
+ ## License
+
+ Apache-2.0.