Ah, MD doesn't work :(
Daniel Fox PRO
AI & ML interests
Pre-training text generator.
(Brother, I'm 18)
Please don't try to contact me.
Recent Activity
updated a model about 2 hours ago: FlameF0X/TinyMoE-100m-A1K
liked a model about 10 hours ago: moonshotai/Kimi-K2.7-Code
published a model about 10 hours ago: FlameF0X/TinyMoE-100m-A1K
Organizations
replied to their post 3 days ago
posted an update 3 days ago
Post
122
My models on the Intel Low-Bit LLM Leaderboard
Figured I'd share where my quantized models landed on Intel/low_bit_open_llm_leaderboard since I hadn't posted about it yet.
FlameF0X/Qwen3-4B-Distilled-Claude-4.6 (NVFP4 and MXFP4) sit at ranks 23 and 24 with 62.68% and 61.18% average, right below the base Qwen3-4B. Not bad considering they were distilled from Claude 4.6 rather than trained from scratch.
FlameF0X/LFM2.5-1.2B-Distilled-Claude-4.6 and FlameF0X/LFM2.5-1.2B-Thinking-CodeX land around rank 47-49, competitive with MiniCPM5-1B and the Qwen3 sub-1B models despite being a larger base architecture.
The funny ones are FlameF0X/Qwen2-0.2B-pt and FlameF0X/Qwen2-0.2B-it. They're not properly trained (genuinely undertrained, basically undefined) and they still beat openai/gpt-oss-20b at rank 66. The 20B model. Not sure what that says but it's something.
FlameF0X/LFM2-Research is at the bottom of my lineup but it's a research artifact, not meant to be competitive.
Chart below showing my models vs nearby competitors, with size vs performance on the left.
Chart made by Claude
reacted to wenhuach's post with 🔥 16 days ago
Post
4536
🚀 We provide **free** hardware to quantize models at the [Intel Low Bit Open LLM Leaderboard](Intel/low_bit_open_llm_leaderboard), currently supporting pure RTN mode powered by AutoRound.
⭐ If you find it useful, please consider starring the AutoRound project on [GitHub](https://github.com/intel/auto-round)!
reacted to pankajpandey-dev's post with 🔥 18 days ago
Post
2694
🧬 Just uploaded K-quants of Carbon-3B for llama.cpp users!
@HuggingFaceBio released the original GGUF in bf16 only — so I added the full quant ladder for CPU/edge inference:
• Q2_K → 1.4 GB
• Q3_K_M → 1.8 GB
• Q4_K_M → 2.1 GB ⭐
• Q5_K_M → 2.4 GB
• Q6_K → 2.7 GB
• Q8_0 → 3.5 GB
🔗 pankajpandey-dev/Carbon-3B-GGUF
Now you can generate DNA sequences on your laptop. Needs a llama.cpp build with PR #23410 (HybridDNATokenizer support).
Huge thanks to the HuggingFaceBio team for the original model 🙏
#GGUF #llamacpp #genomics #DNA
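One rough way to read the quant ladder above is to compare each file against the Q8_0 file and pick the largest one that fits a disk or RAM budget. The sketch below uses only the sizes listed in the post; the helper names `relative_sizes` and `largest_that_fits` are made up for illustration, and file size is only a lower bound on runtime memory.

```python
# File sizes in GB, copied from the list in the post.
SIZES_GB = {
    "Q2_K": 1.4,
    "Q3_K_M": 1.8,
    "Q4_K_M": 2.1,
    "Q5_K_M": 2.4,
    "Q6_K": 2.7,
    "Q8_0": 3.5,
}


def relative_sizes(sizes, baseline="Q8_0"):
    """Return each quant's file size as a fraction of the baseline quant."""
    base = sizes[baseline]
    return {name: size / base for name, size in sizes.items()}


def largest_that_fits(sizes, budget_gb):
    """Return the name of the largest quant whose file fits in budget_gb, or None."""
    fitting = [(size, name) for name, size in sizes.items() if size <= budget_gb]
    return max(fitting)[1] if fitting else None


if __name__ == "__main__":
    for name, frac in relative_sizes(SIZES_GB).items():
        print(f"{name:7s} {SIZES_GB[name]:.1f} GB  {frac:.0%} of Q8_0")
    print("best fit under 2.5 GB:", largest_that_fits(SIZES_GB, 2.5))
```

The budget of 2.5 GB is an arbitrary example value; context length and KV cache add to whatever the weights take.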
reacted to Crownelius's post with 🔥🔥🔥 23 days ago
Post
4677
Howdy,
CompactAI-O is launching a tiny Model Golf, and the winner walks away with $50 in RunPod credits. Monthly. Every month. Show up, build, somebody wins.
What it is
Build the best language model you can under 100 million parameters, with at least a 1028-token context window. That's it. Any architecture, any tokenizer, any training scheme you can dream up at 3am. The only catch is it's gotta be open source (MIT, GPL, Apache, AGPL), take your pick.
It scratches the same itch as a Kaggle comp without the dataset/leaderboard nonsense. No fixed benchmark to game. No llama.cpp compatibility hoops. If you wanna train a 50M-param MoE with five experts and a tokenizer built on cookbooks, you can do that. Nothing stopping you.
The rules are listed in the discord and on the organization page if you're interested.
Why $50????
It's symbolic. It ain't gonna make anyone rich. But it's enough to cover a weekend of GPU time, enough to keep enthusiasts coming back, and not so much that it pulls in people who are just there for the money. Enthusiasts build interesting things. Interesting things move the field forward. A little incentive. I'd do it for $50 lol.
How to join
First round opens soon. Landing page is here:
→ https://huggingface.co/spaces/CompactAI-O/Tiny-model-golf
For questions or to swap ideas, the Discord's open:
→ https://discord.gg/y2jTct6Cxv
Excited to see what yall come up with. ♥
— Shane
reacted to alvarobartt's post with 🚀 25 days ago
Post
3310
Latest hf-mem release added a breakdown of Mixture-of-Experts (MoE) memory usage!
TL;DR: MoEs can be misleading to reason about from active parameters alone, since each token only activates a subset of experts, while the serving setup still needs to account for the full resident memory footprint.
🧠 hf-mem now splits MoE memory into base model weights, routed experts, and KV cache
🏗️ Dense models usually load and use most weights every forward pass, while MoEs load many experts but only route each token to a few of them
⚡ Active params isn't the same as memory footprint, especially for sparse architectures
📦 Runtime memory is about what is used per request/token, while loading memory also includes the expert weights that need to be resident
📚 KV cache can still dominate depending on context length, batch size, and concurrency
🔀 Expert Parallelism (EP) helps shard experts across accelerators when expert weights dominate
🚀 Data Parallelism (DP) + EP is often a good fit for throughput-oriented MoE serving
Check the repository at https://github.com/alvarobartt/hf-mem
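To make the active-vs-resident distinction concrete, here is a back-of-the-envelope sketch. It is not hf-mem's implementation or API; the `MoEConfig` fields and every number in `EXAMPLE` are hypothetical, and the KV cache formula is the usual 2 × layers × KV heads × head dim per token.

```python
from dataclasses import dataclass


@dataclass
class MoEConfig:
    # Hypothetical illustration fields, not taken from any real model card.
    shared_params: float      # attention, embeddings, router, etc.
    params_per_expert: float  # one expert, summed over all layers
    num_experts: int
    experts_per_token: int
    num_layers: int
    num_kv_heads: int
    head_dim: int


def weight_bytes(cfg, bytes_per_param=2):
    """Bytes that must stay resident: shared weights plus ALL experts."""
    total_params = cfg.shared_params + cfg.params_per_expert * cfg.num_experts
    return total_params * bytes_per_param


def active_params(cfg):
    """Parameters touched per token: shared weights plus only the routed experts."""
    return cfg.shared_params + cfg.params_per_expert * cfg.experts_per_token


def kv_cache_bytes(cfg, context_len, batch_size, bytes_per_elem=2):
    """KV cache size: K and V for every layer, KV head and position."""
    per_token = 2 * cfg.num_layers * cfg.num_kv_heads * cfg.head_dim * bytes_per_elem
    return per_token * context_len * batch_size


# Made-up config: 2B shared params, 16 experts of 0.5B each, 2 routed per token.
EXAMPLE = MoEConfig(2e9, 0.5e9, 16, 2, 32, 8, 128)

if __name__ == "__main__":
    print(f"active params:  {active_params(EXAMPLE) / 1e9:.1f}B")
    print(f"weights (bf16): {weight_bytes(EXAMPLE) / 1e9:.1f} GB")
    print(f"KV cache:       {kv_cache_bytes(EXAMPLE, 4096, 8) / 2**30:.1f} GiB")
```

With these made-up numbers the model is "3B active" but needs 20 GB for weights alone, plus 4 GiB of KV cache at 4096 tokens and batch size 8.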
posted an update 28 days ago
Post
284
I did some testing on the scalability of FWKV. It hits a speed bottleneck at 1B due to the T4’s bandwidth limitations. Theoretically, it should match RWKV’s inference speed if the GPU had more bandwidth. So the speed numbers at the 1B size are not representative.
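A rough way to see why bandwidth starts to bite at 1B: if decoding has to read every weight once per generated token, tokens per second cannot exceed memory bandwidth divided by the size of the weights. The sketch below assumes fp16 weights (the post does not state the precision) and the T4's roughly 320 GB/s peak bandwidth from NVIDIA's spec; it ignores activations, recurrent state and kernel overhead, so treat it as an upper bound only.

```python
T4_BANDWIDTH_GB_S = 320.0  # peak memory bandwidth from NVIDIA's T4 spec


def max_decode_tokens_per_s(num_params, bytes_per_param, bandwidth_gb_s):
    """Upper bound on tokens/s when every weight is read once per token."""
    weight_gb = num_params * bytes_per_param / 1e9
    return bandwidth_gb_s / weight_gb


if __name__ == "__main__":
    # fp16 (2 bytes per parameter) is an assumption, not something the post states.
    for label, params in [("29M", 29e6), ("1B", 1e9)]:
        bound = max_decode_tokens_per_s(params, 2, T4_BANDWIDTH_GB_S)
        print(f"{label}: at most {bound:,.0f} tokens/s")
```

At 29M parameters the bound is in the thousands of tokens per second, so other overheads dominate; at 1B it drops to about 160 tokens/s, which is where the memory bus becomes the limit regardless of architecture.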
replied to their post 29 days ago
Not yet. I'm still experimenting.
Once I get something that I'm pleased with I'm going to write a blog.
posted an update 29 days ago
Post
277
Greetings Hugging Face!
I started a new project called **FWKV** (Feed-forward Weighted Key Value, or Floored Weighted Key Value), a RWKV-style LM that uses FFNNs (Feed-Forward Neural Networks) instead of RNN and floor(W·K·V). I'm hoping to make it much more efficient and scalable than RWKV.
So far I have:
- FlameF0X/FWKV-29M: this one is undertrained and doesn't have a Space yet. In the attached image you can see its speed on a T4 compared to models with the same configuration.
The only model that's fully working right now is:
- FlameF0X/FWKV-TinyStories: trained on TinyStories for one epoch. The demo Space is FlameF0X/FWKV-demo.
reacted to ArtelTaleb's post with 🔥 about 1 month ago
Post
2555
✈️ World Flight Arcade - Can you land in 60 seconds?
I just dropped a new browser game built entirely with Three.js: World Flight Arcade
The concept is brutally simple:
- 🕐 60 seconds of flight above a neon wireframe city
- ✈️ One single attempt to land on the runway
- 💀 No second chances. No respawn. Just you, the controls, and the clock.
The camera system is fully dynamic - it stays locked behind the plane within a ±45° pitch/yaw envelope, giving you that cinematic flight feel while keeping full spatial awareness.
Can you nail the landing on your first try?
👉 Play here: ArtelTaleb/world-flight-arcade
Built by Artel3D - handcrafted in Three.js, zero dependencies, runs directly in your browser.
Drop your score in the comments 👇
#gamedev #threejs #browserGame #webgl #artel3d #indiegame
reacted to HannesVonEssen's post with ❤️ about 1 month ago
Post
241
📣 I made a visualizer for Hugging Face models: https://hfviewer.com
✨ Simply paste a Hugging Face URL to get an interactive visualization of the architecture!
🔗 The recent Qwen3.6-27B model as an example: https://hfviewer.com/Qwen/Qwen3.6-27B
Feel free to try it out and give me feedback on how it can be improved! ❤️
reacted to Crownelius's post with 🔥 about 1 month ago
Post
3832
[DAY ONE] PROJECT CROWFEATHER 4/30/2026
...The day I forgot to attach wandb.ai
Just dropped Crowfeather-50m, the first checkpoint in a series, and yeah, no graphs.
Crowfeather/Crowfeather-50m
54.5M params. Pretrain only. 17,500 steps banked on FineWeb-edu before Thunder credits ran dry. About 2.3B tokens, no SFT yet.
Architecture: Gemma-4 alternating sliding/global attention (1024 window, last layer always global) plus DeepSeek-V4 Muon optimizer plus WSD scheduler plus Gemma-2 logit soft-cap plus PaLM z-loss. Recipe in the model card.
What it can do: writes grammatical English. Knows that France has Rhine-adjacent monasteries (it picked Rouen instead of Paris but the vocabulary is in there). Tells stories about Mr. Fabien.
What it can't do yet: facts, code, math. Base LM, no SFT, no instruction tuning.
The series:
Every additional training run becomes another model card here
Every model card gets a matching post on this profile
Continuation goes to Colab next, picking up from step 17500 out of 100k
Limited to one post a day on Hugging Face, so updates will trickle out at that pace. Follow [@Crownelius](@Crownelius) and [@Crowfeather](Crowfeather) if you want to watch this thing learn in public. Next drop will either come with the finished pre-train or whatever step I land on before the bank takes my credit card away.
Graphs will be available on my NEXT model lol
-Shane
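Two of the ingredients named in the recipe, logit soft-capping and z-loss, are small enough to sketch in plain Python. This is not the Crowfeather training code; the cap of 30.0 and the z-loss coefficient of 1e-4 are the values I recall from the Gemma 2 and PaLM papers and are assumptions here, since the post does not give them.

```python
import math


def soft_cap(logits, cap=30.0):
    """Squash logits smoothly into [-cap, cap] via cap * tanh(x / cap)."""
    return [cap * math.tanh(x / cap) for x in logits]


def log_sum_exp(logits):
    """Numerically stable log of the softmax normalizer Z."""
    m = max(logits)
    return m + math.log(sum(math.exp(x - m) for x in logits))


def cross_entropy_with_z_loss(logits, target, z_coef=1e-4):
    """Cross-entropy plus z_coef * log(Z)^2, which nudges log(Z) toward 0."""
    log_z = log_sum_exp(logits)
    ce = log_z - logits[target]
    return ce + z_coef * log_z ** 2


if __name__ == "__main__":
    raw = [120.0, -3.0, 0.5, 45.0]  # made-up logits with an outlier
    capped = soft_cap(raw)
    print([round(x, 2) for x in capped])
    print(round(cross_entropy_with_z_loss(capped, target=0), 4))
```

Soft-capping bounds the logits before the loss is computed, while z-loss penalizes a normalizer that drifts far from 1; both are stability aids rather than changes to what the model learns.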
reacted to anakin87's post with ❤️ about 2 months ago
Post
3354
A small model that struggled against a random opponent now beats GPT-5-mini at tic-tac-toe
I took LiquidAI/LFM2-2.6B and trained it through play.
🧑‍🍳 Here's how:
1️⃣ Build a solid RL env with Verifiers (Prime Intellect)
2️⃣ Generate synthetic data: <200 games sampled from GPT-5-mini playing in the env
3️⃣ SFT warm-up to teach format
4️⃣ Group-based RL (CISPO) against opponents making 20-70% random moves
5️⃣ RL again with stronger opponents (0-25% random moves) + 1.25 temperature to push exploration and shake off suboptimal strategies
Done! Beats GPT-5-mini 🏆
---
🎮 Play against the model: anakin87/LFM2-2.6B-mr-tictactoe
🤗 Model: anakin87/LFM2-2.6B-mr-tictactoe
📚 Walkthrough/course: https://github.com/anakin87/llm-rl-environments-lil-course
🤗 Dataset and checkpoints: https://huggingface.co/collections/anakin87/lfm2-26b-mr-tic-tac-toe
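The "opponents making X% random moves" idea in steps 4 and 5 can be sketched as an opponent that plays a random legal move with some probability and a strong move otherwise. This is not the author's Verifiers environment: the minimax fallback, the 9-character board string and the name `opponent_move` are all assumptions made for illustration, since the post only gives the random-move percentages.

```python
import random
from functools import lru_cache

WIN_LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8), (0, 3, 6),
             (1, 4, 7), (2, 5, 8), (0, 4, 8), (2, 4, 6)]


def winner(board):
    """Return 'X' or 'O' if someone has three in a row, else None."""
    for a, b, c in WIN_LINES:
        if board[a] != " " and board[a] == board[b] == board[c]:
            return board[a]
    return None


@lru_cache(maxsize=None)
def minimax(board, player):
    """Return (score, move) for the player to move: +1 win, 0 draw, -1 loss."""
    won = winner(board)
    if won is not None:
        return (1 if won == player else -1), None
    moves = [i for i, cell in enumerate(board) if cell == " "]
    if not moves:
        return 0, None
    other = "O" if player == "X" else "X"
    best_score, best_move = -2, None
    for move in moves:
        child = board[:move] + player + board[move + 1:]
        score = -minimax(child, other)[0]  # opponent's gain is our loss
        if score > best_score:
            best_score, best_move = score, move
    return best_score, best_move


def opponent_move(board, player, random_prob, rng=random):
    """Play a random legal move with probability random_prob, else the minimax move."""
    moves = [i for i, cell in enumerate(board) if cell == " "]
    if rng.random() < random_prob:
        return rng.choice(moves)
    return minimax(board, player)[1]


if __name__ == "__main__":
    board = "XX OO    "  # X to move, cell 2 wins
    print(opponent_move(board, "X", random_prob=0.0))
    print(opponent_move(board, "X", random_prob=0.7, rng=random.Random(0)))
```

Under this reading, the first RL stage would draw `random_prob` from 0.2 to 0.7 and the second from 0.0 to 0.25; how the value is picked per game is not stated in the post.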
reacted to SeaWolf-AI's post with 🔥 3 months ago
Post
5087
ALL Bench — Global AI Model Unified Leaderboard
FINAL-Bench/all-bench-leaderboard
If you've ever tried to compare GPT-5.2 and Claude Opus 4.6 side by side, you've probably hit the same wall: the official Hugging Face leaderboard only tracks open-source models, so the most widely used AI systems simply aren't there. ALL Bench fixes that by bringing closed-source models, open-weight models, and — uniquely — all four teams under South Korea's national sovereign AI program into a single leaderboard. Thirty-one frontier models, one consistent scoring scale.
Scoring works differently here too. Most leaderboards skip benchmarks a model hasn't submitted, which lets models game their ranking by withholding results. ALL Bench treats every missing entry as zero and divides by ten, so there's no advantage in hiding your weak spots.
The ten core benchmarks span reasoning (GPQA Diamond, AIME 2025, HLE, ARC-AGI-2), coding (SWE-bench Verified, LiveCodeBench), and instruction-following (IFEval, BFCL). The standout is FINAL Bench — the world's only benchmark measuring whether a model can catch and correct its own mistakes. It reached rank five in global dataset popularity on Hugging Face in February 2026 and has been covered by Seoul Shinmun, Asia Economy, IT Chosun, and Behind.
Nine interactive charts let you explore everything from composite score rankings and a full heatmap to an open-vs-closed scatter plot. Operational metrics like context window, output speed, and pricing are included alongside benchmark scores.
All data is sourced from Artificial Analysis Intelligence Index v4.0, arXiv technical reports, Chatbot Arena ELO ratings, and the Korean Ministry of Science and ICT's official evaluation results. Updates monthly.
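The missing-as-zero scoring rule described above is easy to show with a toy example. This is a sketch of the rule as the post describes it, not the leaderboard's actual code; the function names and the two scores are hypothetical.

```python
NUM_BENCHMARKS = 10  # the post says the composite always divides by ten


def composite_score(scores, num_benchmarks=NUM_BENCHMARKS):
    """Missing benchmarks count as zero: always divide by the full count."""
    return sum(scores.values()) / num_benchmarks


def skip_missing_average(scores):
    """The approach the post criticizes: average only over reported results."""
    return sum(scores.values()) / len(scores) if scores else 0.0


if __name__ == "__main__":
    # Hypothetical model that only reports its two strongest benchmarks.
    reported = {"GPQA Diamond": 90.0, "IFEval": 80.0}
    print(skip_missing_average(reported))  # looks strong when gaps are skipped
    print(composite_score(reported))       # withheld results drag the score down
```

A model that withholds eight of ten results averages 85.0 under skip-missing but only 17.0 under the divide-by-ten rule, which is the incentive the post is describing.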
reacted to marksverdhei's post with 🤗 4 months ago
Post
2709
Dear Hugging Face team, can we please have a way to archive HF repositories / Spaces? I have a bunch of Spaces that used to work but don't anymore due to the HF Space implementations changing, and I think it would be good if I could archive those like in GitHub.
React to this post if you want to see this feature! 💡