Angelino Santiago
Currently have a fully running 13B (GLM 4.7 Flash), which is very strong, and experimental 21Bs of Qwen 3.5.
These are trained and in testing; access is limited as of this writing.
As for MOEs:
This is a little more complicated, as scripting must be written for Mergekit to "MoE together" 0.8B, 2B, 4B, 9B models, etc.
A draft (by me) has been completed, but it is not tested/debugged yet. No timeline here; too many variables.
RE 35B MoEs: it is possible to address this in a different way, but I have not tried it yet.
This is a different approach than REAP.
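For context, Mergekit's MoE mode is driven by a YAML config that lists the expert models and the routing prompts used to initialize the gates. A minimal sketch is below; the model IDs and prompts are placeholders for illustration, not the actual setup described above:

```yaml
# Hypothetical mergekit-moe config: combine small Qwen-family dense
# checkpoints into one MoE. Model IDs and prompts are illustrative only.
base_model: Qwen/Qwen3-4B          # shared base for embeddings/attention
gate_mode: hidden                  # route by hidden-state similarity to prompts
dtype: bfloat16
experts:
  - source_model: Qwen/Qwen3-4B    # general-purpose expert (placeholder)
    positive_prompts:
      - "general conversation and reasoning"
  - source_model: Qwen/Qwen3-4B    # in practice, a differently fine-tuned checkpoint
    positive_prompts:
      - "write, debug, and explain code"
```

A file like this would then be passed to the `mergekit-moe` command, e.g. `mergekit-moe config.yaml ./output-model`.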
I believe I saw that 13B model repository earlier, but I cannot see it anymore. Was it an upscaled dense model of Qwen 3.5 9B with further training? That could be pretty interesting. Did you remove it or hide it? I was really looking forward to trying that model, or finetunes based on it. Hopefully, there is still a chance for it to reach the public. 🙏
Good luck with these projects! 👍
Crash with llama.cpp
Hello @DavidAU ,
can you try to do the MoE versions, please? We could use something between 9B and 27B.
I was surprised by the capability of the small 9B model. While it wasn't capable of creating something as advanced as bigger models, and the code it generated from scratch was usually broken, when I gave it its own broken code, it was smart enough to fix individual issues. I do believe that it would have been capable of improving the quality iteratively when prompted to do so in small single tasks, just like with fixing the issues. Unfortunately, creating something a little more ambitious from scratch with the 9B model was not possible in a single prompt.
On the other hand, the 27B model is already a bit too demanding for my hardware, and token generation speed is too slow for the model to be usable for me.
The smallest MoE is 35B, which is even bigger than the previous generation, but due to its Mixture of Experts architecture it is still a bit faster than the 27B model for me. There are also some smaller REAP versions, such as the 26B Creative and the 24B regular version, which are even faster.
I believe these MoE models would nicely fill the gap between the 9B and 27B dense variants in terms of quality, if you tuned them similarly with the Claude datasets.
All are benchmarked against the original model.
Many exceed all benchmarks of the original model.
Claude, GLM, Gemini and other distills.
Thinking AND dedicated Instruct versions.
Core goal: Increase benchmarks, and address long thinking blocks.
Highlights:
9B and 27B instruct "Claude" versions hit 624 and 675 on the "ARC-C" (hard challenge).
Thinking fine tunes exceed org model performance (in thinking mode).
In many cases there is a drastic reduction in thinking block size.
9B Claude Heretic Uncensored, GGUF:
- Neo, Code Imatrix (dual imatrix)
- Updated Jinja template
- Custom tensor enhancements.
DavidAU/Qwen3.5-9B-Claude-4.6-OS-Auto-Variable-HERETIC-UNCENSORED-THINKING-MAX-NEOCODE-Imatrix-GGUF
COLLECTION [21 models]:
https://huggingface.co/collections/DavidAU/qwen-35-08-2-4-9-27-35b-regular-uncensored
UPDATE:
Now 31 models, including experimental 21B and new 13B models.