Awesome Multimodal Modeling
We introduce Awesome Multimodal Modeling, a curated repository tracing the architectural evolution of multimodal intelligence, from foundational fusion to native omni-models.
Taxonomy & Evolution:
Traditional Multimodal Learning – Foundational work on representation, fusion, and alignment.
Multimodal LLMs (MLLMs) – Architectures that connect vision encoders to LLMs for understanding (a minimal sketch follows this list).
Unified Multimodal Models (UMMs) – Models unifying Understanding + Generation via Diffusion, Autoregressive, or Hybrid paradigms.
Native Multimodal Models (NMMs) – Models trained from scratch on all modalities, contrasting early vs. late fusion under scaling laws.
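
To make the MLLM bullet concrete, here is a minimal, hypothetical sketch of the connector pattern it describes: a projector maps vision-encoder patch features into the LLM's token-embedding space so visual and text tokens share one input sequence. All module names and dimensions below are illustrative assumptions, not code from any model in the repository.

```python
# A toy connector in the MLLM style: vision patch features are linearly
# projected to the LLM's embedding width, then concatenated with text
# embeddings. Sizes (1024, 4096, 256 patches) are assumptions.
import torch
import torch.nn as nn

class VisionToLLMConnector(nn.Module):
    """Linear projector mapping vision features into the LLM embedding space."""
    def __init__(self, vision_dim: int = 1024, llm_dim: int = 4096):
        super().__init__()
        self.proj = nn.Linear(vision_dim, llm_dim)

    def forward(self, patch_feats: torch.Tensor) -> torch.Tensor:
        # patch_feats: (batch, num_patches, vision_dim)
        return self.proj(patch_feats)  # (batch, num_patches, llm_dim)

# Toy usage: fuse 256 image-patch tokens with 16 text-token embeddings.
connector = VisionToLLMConnector()
image_feats = torch.randn(1, 256, 1024)   # stand-in for a ViT's output
text_embeds = torch.randn(1, 16, 4096)    # stand-in for LLM token embeddings
visual_tokens = connector(image_feats)
llm_input = torch.cat([visual_tokens, text_embeds], dim=1)  # (1, 272, 4096)
print(llm_input.shape)
```

The late-fusion character of this pattern is exactly what the NMM line contrasts with: the LLM sees vision only through a projector bolted on after text-only pre-training.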
Key Distinction:
UMMs unify tasks via generation heads; NMMs enforce interleaving through joint pre-training.
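
A toy sketch of the interleaving side of that distinction, under the common assumption that a native model maps images to discrete codes in a vocabulary shared with text, so one transformer is pre-trained on a single mixed stream from the start. The vocabulary sizes and delimiter tokens below are hypothetical placeholders.

```python
# Hedged sketch of NMM-style interleaving: text tokens and discrete image
# tokens share one vocabulary and one sequence for joint next-token
# pre-training (early fusion). All ids and sizes here are assumptions.
import torch

TEXT_VOCAB = 32000
IMAGE_VOCAB = 8192          # e.g. codes from a VQ-style image tokenizer
BOI = TEXT_VOCAB + IMAGE_VOCAB       # hypothetical begin-of-image token
EOI = TEXT_VOCAB + IMAGE_VOCAB + 1   # hypothetical end-of-image token

def interleave(text_ids: list[int], image_codes: list[int]) -> torch.Tensor:
    """Build one training sequence: text, then a delimited image span."""
    image_ids = [c + TEXT_VOCAB for c in image_codes]  # shift into shared vocab
    return torch.tensor(text_ids + [BOI] + image_ids + [EOI])

# Toy usage: one interleaved example a native model would train on.
seq = interleave(text_ids=[17, 4051, 290], image_codes=[5, 731, 88, 2049])
print(seq)  # a single autoregressive stream mixing both modalities
```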
Explore & Contribute: https://github.com/OpenEnvision-Lab/Awesome-Multimodal-Modeling