FlameF0X (Daniel Fox)

reacted to FredyRivera-dev's post with 🚀 2 days ago

Post

6229

We wrote a full technical guide on how to train a bilingual (ES/EN) LLM from scratch: TinyQwen.

Covers:
- Hybrid architecture based on Qwen3.5
- Pre-training with 15B tokens
- Cost benchmark between H200 and B200
- Post-training with SFT + LoRA
- Full code and data, open source

With ~$11 of compute on an H200 we ran an initial training run, enough to validate the full architecture and pipeline.

Blog post: https://aquiles-ai.vercel.app/blog/tinyqwen-from-scratch

Technical feedback welcome, especially from anyone looking to replicate the pipeline with more compute.

3 replies

·

reacted to sergiopaniego's post with 🔥 6 days ago

Post

1480

you can train DiffusionGemma (a block-diffusion LLM) in TRL! and we're sharing an example for it

TRL trainers are made to be easily extended and adapted to different real-world use cases.

in this one, with a single method overridden in SFTTrainer (compute_loss), you can train this model

> example: https://github.com/huggingface/trl/blob/main/examples/scripts/sft_diffusion_gemma.py

replied to vineeth98's post 9 days ago

crazy how we are speedrunning LM fine-tuning before gta6 is even out. 🤯

reacted to danielhanchen's post with 🚀 11 days ago

Post

5726

Gemma 4 is now faster and much more accurate! 🚀

Google made huge improvements to tool-calling and chat accuracy, reliability + speed.
To get the fixes, re-download our updated GGUF, MLX, and NVFP4 quants!

Unsloth quants: https://huggingface.co/collections/unsloth/gemma-4
Gemma 4 Guide: https://unsloth.ai/docs/models/gemma-4

8 replies

·

reacted to cetusian's post with 👀 25 days ago

Post

127

most labs won't bother teaching a small model to write and think in real romanian. no invented words, no english leaking in mid-sentence, diacritics intact.

we did.

surogate 3.5 is out. 2B and 4B, apache 2.0. invented word-forms cut from 4 per 1k down to ~1.5, and it reasons in the language you prompt it in, every time.

intelligence should speak your language too.

the models:

surogate/Surogate-3.5-4B

surogate/Surogate-3.5-2B

posted an update 25 days ago

Post

229

Hello, people of Hugging Face!

I recently released FlameF0X/TinyMoE-100m-2x8-retrained, a small Mixture of Experts language model trained on the Smollm-Corpus. Built on top of the Mixtral architecture, it’s fully compatible with 🤗 Transformers right out of the box!

The model can produce somewhat coherent text on its own, and for some reason, it generates even more coherent responses when given a ChatLM template.

I’m excited to see what you all come up with, and feel free to fine-tune it if you’d like. In the meantime, I’ll be working on developing the chat-trained version.

Demo: FlameF0X/TinyMoE-Playground
Collection: https://huggingface.co/collections/FlameF0X/tinymoe

reacted to ginigen-ai's post with ❤️ about 1 month ago

Post

10477

🧠 Does your LLM know when it's about to be wrong?

Most leaderboards measure accuracy. We measure metacognition — whether a model catches its own errors. Benchmark + leaderboard + adapters, all open. 🎉

The surprise: even a K-AI #1 model (JGOS-31B-Citizen) is the strongest on multiple-choice traps (trap_rate 0.005 — ~2 misses in 400) yet blind to its own free-form mistakes (self-confidence AUROC = 0.5, pure random). A tiny base-frozen adapter recovers that signal.

Two independent axes (never compared across a row):
① trap_rate: does it fall for tempting trap options? (lower = stronger)
② adapter gain Δ: how much a lightweight adapter catches errors the model itself misses. (higher = more adapter value)

What's open:
📊 300+100 trap problems (each with a hidden trap + TICOS type)
🏆 24-model leaderboard
🧩 11 per-model adapters: adapters, NOT fine-tunes (base stays frozen; the adapter just reads the hidden state → P(wrong))

Submit any HF model → auto-scored daily at 09:00 KST and added to the board.

🏆 Leaderboard → ginigen-ai/Metacognition-Leaderboard-Space

📊 Benchmark → ginigen-ai/Metacognition-Bench

🧩 Adapters → https://huggingface.co/collections/FINAL-Bench/metacognition-adapters-6a42c032e6beb803dd032961

📊 Article → https://huggingface.co/blog/ginigen-ai/metacognition

Benchmark by ginigen-ai · Adapters by FINAL-Bench (Darwin/Chimera platform + AETHER metacognition tech).
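
The two numbers quoted in this post (a trap_rate of 0.005 and a self-confidence AUROC of 0.5) can be reproduced on toy data. The sketch below is not the benchmark's scoring code: the function names, the input layout, and the choice to score "predicted P(wrong)" against "answer was wrong" are all assumptions made for illustration. It only shows why a confidence signal that does not separate right answers from wrong ones lands at exactly 0.5.

```python
def trap_rate(fell_for_trap):
    """Fraction of multiple-choice items where the model picked the trap option."""
    return sum(fell_for_trap) / len(fell_for_trap)


def auroc(scores, labels):
    """Pairwise AUROC: chance that a random positive outscores a random negative.

    scores: predicted P(wrong) per answer (hypothetical layout).
    labels: 1 if the answer was actually wrong, else 0.
    Ties between a positive and a negative count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# 2 misses in 400 trap questions, as quoted in the post.
print(trap_rate([True] * 2 + [False] * 398))       # 0.005

# A model that is equally confident about everything carries no signal.
print(auroc([0.9, 0.9, 0.9, 0.9], [1, 0, 1, 0]))   # 0.5

# A signal that ranks every wrong answer above every right one.
print(auroc([0.9, 0.8, 0.2, 0.1], [1, 1, 0, 0]))   # 1.0
```

The pairwise form is quadratic in the number of answers, which is fine for a few hundred items; a rank-based implementation would be the usual choice for larger sets.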

11 replies

·

reacted to ginigen-ai's post with 🔥 about 1 month ago

Post

5207

🍳 The RoboCasa Kitchen Leaderboard
What does it take for a robot to handle kitchen chores the way a person does? It has to see (Vision), understand instructions (Language), and actually act (Action) — and VLA (Vision-Language-Action) models are emerging as the answer. They're the bridge between large multimodal models and real-world embodied control.

RoboCasa Kitchen is a leading robot-learning benchmark in which a single-arm robot (Franka Panda) performs 24 atomic manipulation tasks — picking up cups and bowls, opening drawers and doors, turning faucets, pressing buttons, and more — inside a photorealistic simulated kitchen. Because the layout and object placement are randomized every episode, it tests genuine generalization rather than memorized motions. The score (success rate, SR) is the average fraction of the 24 tasks completed as instructed, measured over multiple seeds so results aren't down to luck.
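
As a rough illustration of the score described above, the sketch below averages the fraction of completed tasks within each seed and then averages across seeds. The data layout, seed labels, and task names are hypothetical, and the real benchmark may aggregate episodes differently; this is only one plausible reading of "average fraction of tasks completed, measured over multiple seeds".

```python
from statistics import mean


def success_rate(outcomes_by_seed):
    """Mean over seeds of the fraction of tasks completed in that seed.

    outcomes_by_seed: {seed: {task_name: bool}} (hypothetical layout).
    """
    per_seed = []
    for tasks in outcomes_by_seed.values():
        completed = [1.0 if ok else 0.0 for ok in tasks.values()]
        per_seed.append(mean(completed))
    return mean(per_seed)


# Toy example with 4 tasks instead of 24, and 2 seeds.
example = {
    0: {"pick_cup": True, "open_drawer": True, "turn_faucet": True, "press_button": False},
    1: {"pick_cup": True, "open_drawer": False, "turn_faucet": True, "press_button": False},
}
print(success_rate(example))  # 0.625 (seed 0: 3/4, seed 1: 2/4)
```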

The catch: this benchmark has no official leaderboard, and protocols (number of demonstrations, evaluation setup) differ from paper to paper, leaving scores scattered. Lining the numbers up naively quickly turns into an apples-to-oranges comparison.

This leaderboard fixes that by collecting published scores with their sources and comparing only what is genuinely comparable. It's split into three tables:

🏆 Kitchen 24-task (matched) — head-to-head under identical conditions (per the RLDX-1 Technical Report). This is the core ranking you can actually trust.
➕ Other protocols — self-reported under different setups (e.g. fewer demos). Not directly comparable, so kept separate.
🤖 GR1-Tabletop — a different, humanoid-based variant suite, separated to avoid confusion.

Any researcher can submit their own model's score directly, and submissions are reviewed before they appear on the board. Every number links to its source paper, so you can verify it yourself.

👉 ginigen-ai/robocasa-kitchen-leaderboard

replied to AxionLab-official's post about 1 month ago

i might be biased or not, but i think that the people behind "SupraLarps" might be racist. idk
surely it's not the fact that they use the n word like 2 times

this is the profanity filter chrome extension btw.

reacted to AxionLab-official's post with ❤️ about 1 month ago

Post

5117

⚠️ Community Notice

We would like to clarify that SupraLabs has no affiliation, partnership, or connection whatsoever with "SupraLarps" or its members.

Please avoid interacting with their organization, repositories, or Spaces under the assumption that they are associated with us.

We are currently aware of the situation and have already contacted the appropriate channels to address it.

Thank you to everyone who continues to support SupraLabs. ❤️

12 replies

·

reacted to AxionLab-official's post with 🔥 about 1 month ago

Post

8568

Please, give a follow to SupraLabs!

We are researching as much as we can to make the best medium models FOR YOU!

SupraLabs/Supra-A2A-Nano-Exp

SupraLabs/Supra-1.5-50M-Instruct-exp

SupraLabs/Supra-50M-Reasoning

SupraLabs/supra-title-50M-pre-gguf

Check out more at the SupraLabs org!

SupraLabs

---
@LH-Tech-AI
@QyrouNnet-AI
@LyJonathan
@Mmorgan-ML
@User01110

2 replies

·

reacted to Hari5115's post with 🔥 about 1 month ago

Post

1567

Bit addictive. Fair warning!!!
Chain combos, fever mode, daily leaderboard. Free, runs in your browser.
Beat the score if you can 🫧

🎮 Hari5115/neon-pop

#SendHelp #JustOneMoreGame #NeonPop #NotAddicted

2 replies

·

replied to Hari5115's post about 1 month ago

New record 🔥🔥🔥

reacted to Reubencf's post with 🔥 about 1 month ago

Post

3766

Shadows of Tomorrow is finally live on Hugging Face Spaces with Gradio.

It’s a browser-playable RPG built with Godot, set in a post-nuclear future where players explore Magnus Province, collect medicinal plants, craft medicine, and help cure NPCs.

Play it here: Reubencf/Shadows_of_Tomorrow

11 replies

·

replied to their post about 2 months ago

Ah, MD doesn't work :(

posted an update about 2 months ago

Post

381

My models on the Intel Low-Bit LLM Leaderboard

Figured I'd share where my quantized models landed on Intel/low_bit_open_llm_leaderboard since I hadn't posted about it yet.

FlameF0X/Qwen3-4B-Distilled-Claude-4.6 (NVFP4 and MXFP4) sit at ranks 23 and 24 with 62.68% and 61.18% average, right below the base Qwen3-4B. Not bad considering they were distilled from Claude 4.6 rather than trained from scratch.

FlameF0X/LFM2.5-1.2B-Distilled-Claude-4.6 and FlameF0X/LFM2.5-1.2B-Thinking-CodeX land around rank 47-49, competitive with MiniCPM5-1B and the Qwen3 sub-1B models despite being a larger base architecture.

The funny ones are FlameF0X/Qwen2-0.2B-pt and FlameF0X/Qwen2-0.2B-it. They're not properly trained (genuinely undertrained, basically undefined) and they still beat openai/gpt-oss-20b at rank 66. The 20B model. Not sure what that says but it's something.

FlameF0X/LFM2-Research is at the bottom of my lineup but it's a research artifact, not meant to be competitive.

Chart below showing my models vs nearby competitors, with size vs performance on the left.

Chart made by Claude

1 reply

·

reacted to their post with 🔥 about 2 months ago

Post

7248

MiniMax-M3 coming soon.
https://github.com/MiniMax-AI/MiniMax-M3

posted an update about 2 months ago

Post

7248

MiniMax-M3 coming soon.
https://github.com/MiniMax-AI/MiniMax-M3

reacted to wenhuach's post with 🔥 2 months ago

Post

4589

🚀 We provide **free** hardware to quantize models at the [Intel Low Bit Open LLM Leaderboard](Intel/low_bit_open_llm_leaderboard), currently supporting Pure RTN mode powered by AutoRound.

⭐ If you find it useful, please consider starring the AutoRound project on [GitHub](https://github.com/intel/auto-round)!

9 replies

·

reacted to pankajpandey-dev's post with 🔥 2 months ago

Post

2705

🧬 Just uploaded K-quants of Carbon-3B for llama.cpp users!
@HuggingFaceBio released the original GGUF in bf16 only — so I added the full quant ladder for CPU/edge inference:
• Q2_K → 1.4 GB
• Q3_K_M → 1.8 GB
• Q4_K_M → 2.1 GB ⭐
• Q5_K_M → 2.4 GB
• Q6_K → 2.7 GB
• Q8_0 → 3.5 GB
🔗 pankajpandey-dev/Carbon-3B-GGUF
Now you can generate DNA sequences on your laptop. Needs a llama.cpp build with PR #23410 (HybridDNATokenizer support).
Huge thanks to the HuggingFaceBio team for the original model 🙏
#GGUF #llamacpp #genomics #DNA
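
For picking a rung on the quant ladder above, a small helper like the following may be handy. The file sizes are copied from the post; the function name and the idea of a single "budget" number are assumptions for illustration. File size is only a lower bound on what you need, since runtime memory also includes the context cache and other buffers, so leave headroom when choosing the budget.

```python
# File sizes in GB, copied from the list in the post above.
QUANT_SIZES_GB = {
    "Q2_K": 1.4,
    "Q3_K_M": 1.8,
    "Q4_K_M": 2.1,
    "Q5_K_M": 2.4,
    "Q6_K": 2.7,
    "Q8_0": 3.5,
}


def largest_quant_within(budget_gb, sizes=QUANT_SIZES_GB):
    """Return the largest listed quant whose file fits the budget, or None."""
    fitting = {name: gb for name, gb in sizes.items() if gb <= budget_gb}
    if not fitting:
        return None
    return max(fitting, key=fitting.get)


print(largest_quant_within(2.2))  # Q4_K_M
print(largest_quant_within(1.0))  # None
```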

Daniel Fox PRO

AI & ML interests

Recent Activity

Organizations

FlameF0X's activity