AI & ML interests

None defined yet.

Recent Activity

victor 
posted an update 4 days ago
view post
Post
250
Interesting article: use Claude Code to help open models write CUDA kernels (for eg) by turning CC traces into Skills. They made a library out of it 👀

https://huggingface.co/blog/upskill
IlyasMoutawwakil 
posted an update 6 days ago
view post
Post
2889
Transformers v5 just landed! 🚀
It significantly unifies and reduces modeling code across architectures, while opening the door to a whole new class of performance optimizations.

My favorite new feature? 🤔
The new dynamic weight loader + converter. Here’s why 👇

Over the last few months, the core Transformers maintainers built an incredibly fast weight loader, capable of converting tensors on the fly while loading them in parallel threads. This means we’re no longer constrained by how parameters are laid out inside the safetensors weight files.

In practice, this unlocks two big things:
- Much more modular modeling code. You can now clearly see how architectures build on top of each other (DeepSeek v2 → v3, Qwen v2 → v3 → MoE, etc.). This makes shared bottlenecks obvious and lets us optimize the right building blocks once, for all model families.
- Performance optimizations beyond what torch.compile can do alone. torch.compile operates on the computation graph, but it can’t change parameter layouts. With the new loader, we can restructure weights at load time: fusing MoE expert projections, merging attention QKV projections, and enabling more compute-dense kernels that simply weren’t possible before.

Personally, I'm honored to have contributed in this direction, including the work on optimizing MoE implementations and making modeling code more torch-exportable, so these optimizations can be ported cleanly across runtimes.

Overall, Transformers v5 is a strong signal of where the community and industry are converging: Modularity and Performance, without sacrificing Flexibility.

Transformers v5 makes its signature from_pretrained an entrypoint where you can mix and match:
- Parallelism
- Quantization
- Custom kernels
- Flash/Paged attention
- Continuous batching
- ...

Kudos to everyone involved! I highly recommend the:
Release notes: https://github.com/huggingface/transformers/releases/tag/v5.0.0
Blog post: https://huggingface.co/blog/transformers-v5
·
IlyasMoutawwakil 
posted an update 11 days ago
view post
Post
2300
After 2 months of refinement, I'm happy to announce that a lot of Transformers' modeling code is now significantly more torch-compile & export-friendly 🔥

Why it had to be done 👇
PyTorch's Dynamo compiler is increasingly becoming the default interoperability layer for ML systems. Anything that relies on torch.export or torch.compile, from model optimization to cross-framework integrations, benefits directly when models can be captured as a single dynamo-traced graph !

Transformers models are now easier to:
⚙️ Compile end-to-end with torch.compile backends
📦 Export reliably via torch.export and torch.onnx.export
🚀 Deploy to ONNX / ONNX Runtime, Intel Corporation's OpenVINO, NVIDIA AutoDeploy (TRT-LLM), AMD's Quark, Meta's Executorch and more hardware-specific runtimes.

This work aims at unblocking entire TorchDynamo-based toolchains that rely on exporting Transformers across runtimes and accelerators.

We are doubling down on Transformers commitment to be a first-class citizen of the PyTorch ecosystem, more exportable, more optimizable, and easier to deploy everywhere.

There are definitely some edge-cases that we still haven't addressed so don't hesitate to try compiling / exporting your favorite transformers and to open issues / PRs.

PR in the comments ! More updates coming coming soon !
  • 1 reply
·
Nymbo 
posted an update 25 days ago
view post
Post
1959
Genuine recommendation: You should really use this AutoHotKey macro. Save the file as macros.ahk and run it. Before sending a prompt to your coding agent, press Ctrl + Alt + 1 and paste your prompt to any regular chatbot. Then send the output to the agent. This is the actual, boring, real way to "10x your prompting". Use the other number keys to avoid repeating yourself over and over again. I use this macro prolly 100-200 times per day. AutoHotKey isn't as new or hype as a lot of other workflows, but there's a reason it's still widely used after 17 years. Don't overcomplicate it.

; Requires AutoHotkey v1.1+

; All macros are `Ctrl + Alt + <variable>`

^!1::
    Send, Please help me more clearly articulate what I mean with this message (write the message in a code block):
return

^!2::
    Send, Please make the following changes:
return

^!3::
    Send, It seems you got cut off by the maximum response limit. Please continue by picking up where you left off.
return


In my experience the past few months, Ctrl + Alt + 1 works best with Instruct models (non-thinking). Reasoning causes some models to ramble and miss the point. I've just been using GPT-5.x for this.
Nymbo 
posted an update about 2 months ago
view post
Post
2341
🚨 New tool for the Nymbo/Tools MCP server: The new Agent_Skills tool provides full support for Agent Skills (Claude Skills but open-source).

How it works: The tool exposes the standard discover/info/resources/validate actions. Skills live in /Skills under the same File_System root, and any bundled scripts run through Shell_Command, no new infrastructure required.

Agent_Skills(action="discover")  # List all available skills
Agent_Skills(action="info", skill_name="music-downloader")  # Full SKILL.md
Agent_Skills(action="resources", skill_name="music-downloader")  # Scripts, refs, assets


I've included a music-downloader skill as a working demo, it wraps yt-dlp for YouTube/SoundCloud audio extraction.

Caveat: On HF Spaces, Shell_Command works for most tasks, but some operations (like YouTube downloads) are restricted due to the container environment. For full functionality, run the server locally on your machine.

Try it out ~ https://www.nymbo.net/nymbot
victor 
posted an update about 2 months ago
KingNish 
posted an update about 2 months ago
view post
Post
2690
Muon vs MuonClip vs Muon+Adamw

Muon has gone from an experiment to a mainstream optimizer, but does it hold up for fine‑tuning? We ran head‑to‑head tests on Qwen3‑4B (10k+ high‑quality instruction rows) to find out.

Short story: Pure Muon converged fastest at the start, but its gradient‑norm spikes made training unstable. MuonClip (Kimi K2’s clipping) stabilizes long pretraining runs, yet in our small‑scale fine‑tune it underperformed, lower token accuracy and slower convergence. The winner was the hybrid: Muon for 2D layers + AdamW for 1D layers. It delivered the best balance of stability and final performance and even beat vanilla AdamW.

Takeaway: for small-scale fine-tuning, hybrid = practical and reliable.

Next Step: scale to larger models/datasets to see if Muon’s spikes become catastrophic or if clipping wins out.

Full Blog Link: https://huggingface.co/blog/KingNish/optimizer-part1
KingNish 
posted an update about 2 months ago
mrfakename 
posted an update about 2 months ago
view post
Post
12268
Excited to share that I've joined the Hugging Face Fellows program! 🤗

Looking forward to contributing to & working more closely with the open-source ecosystem - huge thanks to everyone who's supported me on this journey! 🚀
Nymbo 
posted an update 2 months ago
view post
Post
5239
🚀 I've just shipped a major update to the Nymbo/Tools MCP server: the Agent_Terminal, a single "master tool" that cuts token usage by over 90%!

Anthropic found 98.7% context savings using code execution with MCP, Cloudflare published similar findings. This is my open-source implementation of the same idea.

# The Problem

Traditional MCP exposes every tool definition directly to the model. With 12 tools, that's thousands of tokens consumed *before the conversation even starts*. Each tool call also passes intermediate results through the context window — a 10,000-row spreadsheet? That's all going into context just to sum a column.

# The Solution: One Tool to Rule Them All

Agent_Terminal wraps all 12 tools (Web_Search, Web_Fetch, File_System, Generate_Image, Generate_Speech, Generate_Video, Deep_Research, Memory_Manager, Obsidian_Vault, Shell_Command, Code_Interpreter) into a single Python code execution gateway.

Instead of the model making individual tool calls, it writes Python code that orchestrates the tools directly:

# Search for Bitcoin price
result = Web_Search("current price of bitcoin", max_results=3)
print(result)


Don't know what tools are available? The agent can discover them at runtime:

print(search_tools('image'))  # Find tools by keyword
print(usage('Generate_Image'))  # Get full docs for a specific tool


The individual direct tool calls are all still there, but they can be disabled if using the Agent_Terminal. Try it now - https://www.nymbo.net/nymbot
  • 1 reply
·
nroggendorff 
posted an update 2 months ago
view post
Post
2926
I am now being charged for paused and unstarted spaces out of the blue.
I think this is it, folks. o7


The unstarted spaces I can get behind. I would've appreciated a warning email first, but whatever. However, every time I restart the active usage goes up, despite all of my spaces being moved to CPU (free), and being paused.
·
nroggendorff 
posted an update 3 months ago
view post
Post
2382
Developing with ZeroGPU without a PRO account is painful. They give you so many requests at once, but then have like a 24 hour cooldown. I vote less requests in a batch, but then a shorter cooldown.


or just less of a cooldown, but i understand if that is not allowed
  • 3 replies
·
adamm-hf 
posted an update 3 months ago
adamm-hf 
posted an update 3 months ago