Alex Hant
hardhant
AI & ML interests
None yet
Recent Activity
reacted to danielhanchen's post with 🔥 1 day ago
We made a guide on how to run open LLMs in Claude Code, Codex and OpenClaw.
Use Gemma 4 and Qwen3.6 GGUFs for local agentic coding on 24 GB RAM.
Run with self-healing tool calls, code execution, and web search via the Unsloth API endpoint and llama.cpp.
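As a rough illustration of the local setup the guide covers: llama.cpp's server exposes an OpenAI-compatible API, so any OpenAI client can talk to a locally served GGUF. This is a minimal sketch, not the guide's exact configuration; the port, model name, and server invocation in the comments are assumptions.

```python
# Minimal sketch: chat with a locally served GGUF through llama.cpp's
# OpenAI-compatible server. Assumes you have already started something like
# `llama-server -m model.gguf --port 8080`; the base_url and model name
# below are placeholders, not values from the guide.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8080/v1",  # local llama.cpp server, not api.openai.com
    api_key="sk-no-key-needed",           # llama.cpp does not check the key by default
)

response = client.chat.completions.create(
    model="local-gguf",  # placeholder; the server answers with whatever model it loaded
    messages=[{"role": "user", "content": "Write a shell one-liner to count TODOs."}],
)
print(response.choices[0].message.content)
```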
Guide: https://unsloth.ai/docs/basics/api

reacted to DedeProGames's post with 🔥 1 day ago
GRaPE 2 Pro is now available.
https://huggingface.co/SL-AI/GRaPE-2-Pro
This is the flagship model of the GRaPE 2 family and the largest model I have trained to date, at 27B parameters. It is built on Qwen3.5-27B and trained on a proprietary dataset, with roughly half of post-training focused on code and the rest split between STEAM subjects and structured logical reasoning. It punches seriously above its weight class.
GRaPE 2 Pro supports multimodal input (image + text) and features 6 thinking modes via the `<thinking_mode>` tag. This gives you real control over how hard the model thinks, from skipping the reasoning phase entirely with `minimal`, all the way up to `xtra-Hi` for deep, extended thought on hard problems. For most agentic use, `auto` or `low` is the move to keep things snappy.
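A hypothetical sketch of driving the thinking modes from code. The post only names the `<thinking_mode>` tag and some of its levels (`minimal`, `low`, `auto`, `xtra-Hi`); where the tag goes in the prompt, and the full set of six mode names, are my assumptions, not documented behavior.

```python
# Hypothetical sketch of selecting a GRaPE 2 Pro thinking mode. The tag
# placement (prepended to the user turn) and the full mode list are guesses;
# only minimal/low/auto/xtra-Hi are named in the announcement.
def with_thinking_mode(user_prompt: str, mode: str = "auto") -> str:
    """Prepend a thinking-mode tag to a user prompt."""
    allowed = {"minimal", "low", "auto", "medium", "high", "xtra-Hi"}  # partly guessed
    if mode not in allowed:
        raise ValueError(f"unknown thinking mode: {mode!r}")
    return f"<thinking_mode>{mode}</thinking_mode>\n{user_prompt}"

print(with_thinking_mode("Prove that 0.999... == 1.", mode="xtra-Hi"))
```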
It also runs on consumer hardware. You can get it going with as little as 12 GB of VRAM on a quantized build.
If you want to try it out and give feedback, that would be really appreciated. Email us at `contact@skinnertopia.com`

reacted to ManniX-ITA's post 5 days ago
Two releases this week pushing merge methodology forward.
▶ Qwen3.6-27B-Omnimerge-v4-MLP
https://huggingface.co/ManniX-ITA/Qwen3.6-27B-Omnimerge-v4
Same-base DARE-TIES merge of Qwen3.6-27B + 3 fine-tunes (rico03 Claude distill, Esper3.1, kai-os Opus reasoning anchor) via my Omnimerge_v2 method (OBIM-lite + DAREx-q + EMR election).
Hit a Qwen3.6-specific fragility: hyperparams that work flawlessly on 3.5 produced 80% unclosed-<think> on 3.6, collapsing pass@1 to ~20%. Per-tensor delta forensics localized the failure to mlp.{gate,up,down}_proj in layers 27–52. Fix: MLP-passthrough surgery: copy the MLPs verbatim from base, keep the merged attn + linear_attn. Leak → 0%.
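A sketch of what that MLP-passthrough surgery amounts to, under the assumption of single-file checkpoints with the usual HF Qwen tensor naming (real 27B checkpoints are sharded, and the file paths here are placeholders):

```python
# Sketch of the MLP-passthrough surgery described above: overwrite the merged
# checkpoint's MLP projections with the base model's weights for layers 27-52,
# leaving the merged attention (and linear_attn) untouched. Assumes single-file
# checkpoints and standard HF Qwen tensor names; paths are placeholders.
from safetensors.torch import load_file, save_file

base = load_file("qwen3.6-27b-base.safetensors")   # placeholder path
merged = load_file("omnimerge-v4.safetensors")     # placeholder path

MLP_PROJS = ("gate_proj", "up_proj", "down_proj")
for layer in range(27, 53):                        # layers 27..52 inclusive
    for proj in MLP_PROJS:
        key = f"model.layers.{layer}.mlp.{proj}.weight"
        merged[key] = base[key]                    # copy the MLP verbatim from base

save_file(merged, "omnimerge-v4-mlp.safetensors")
```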
Q6_K results (vs Qwen3.6 base / vs Omnimerge-v2 on Qwen3.5):
• HumanEval: 84.76% (= base, +5.49 pp vs v2)
• MBPP corrected: 73.40% (+15.80 pp vs base, ≈ v2)
• GPQA Diamond: ~84.75% partial 192/198 (+15.5 pp vs v2)
▶ Qwen3.5-4B Importance-Signal Study (M1..M5)
Controlled 5-way comparison: same Qwen3.5-4B base, same 2 fine-tunes (Jackrong Claude-4.5 distill + Crow Opus-4.6 distill), only the importance signal driving DARE-TIES sparsification varies.
Q6_K HE / MBPP pass@1:
• M1 Vanilla DARE-TIES → 51.22 / 47.00
• M2 OMv2 (no signal) → 52.44 / 49.40
• M3 OMv2 + Fisher → 57.93 🥇 / 48.80
• M4 mergekit ex-LRP (PR #682) → 51.22 / 49.40
• M5 OMv2 + LRP → 53.05 / 51.40 🥇
Findings: Fisher wins HE (+4.88 pp over vanilla), LRP wins MBPP (+2.60 pp). Both signals + the Omnimerge_v2 recipe beat vanilla. To make multimodal-LM ex-LRP work end-to-end against Qwen3_5ForConditionalGeneration, I filed 5 patches against arcee-ai/mergekit PR #682 + 1 against rachtibat/lxt.
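To make the comparison concrete, here is an illustrative sketch of importance-driven delta sparsification in the DARE-TIES style: rank delta entries by an importance map (Fisher- or LRP-derived) instead of raw magnitude, keep the top fraction, and rescale the survivors. This mirrors the idea being varied across M1–M5, not the exact Omnimerge_v2 implementation; all names below are mine.

```python
# Illustrative sketch: importance-driven sparsification of a task-vector delta.
# Instead of trimming by raw magnitude, keep the `density` fraction of entries
# with the highest importance score, then rescale, analogous to DARE's
# 1/(1 - drop_rate) rescaling. Not the Omnimerge_v2 code.
import torch

def sparsify_delta(delta: torch.Tensor,
                   importance: torch.Tensor,
                   density: float = 0.3) -> torch.Tensor:
    """Keep the top-`density` fraction of delta entries ranked by importance."""
    k = max(1, int(delta.numel() * density))
    # k-th largest importance value = threshold for keeping an entry
    threshold = importance.flatten().kthvalue(importance.numel() - k + 1).values
    mask = importance >= threshold
    return delta * mask / density  # rescale survivors to preserve expected scale

# Toy usage with a Fisher-style signal: squared gradients as a diagonal
# Fisher approximation (the actual signals in the study are precomputed maps).
delta = torch.randn(4096, 4096)
grads = torch.randn_like(delta)
fisher = grads.pow(2)
sparse = sparsify_delta(delta, fisher, density=0.3)
```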
All five Mx checkpoints + Fisher/LRP signal safetensors + reproducer scripts published.

Organizations
None yet