wow, very cool!
Julien Chaumond PRO
AI & ML interests
<3 ML/AI for everyone, building products to propel communities fwd
Recent Activity
updated a dataset about 21 hours ago
julien-c/caliceo
reacted to Reubencf's post with 🔥 1 day ago
Post
3477
Shadows of Tomorrow is finally live on Hugging Face Spaces with Gradio.
It's a browser-playable RPG built with Godot, set in a post-nuclear future where players explore Magnus Province, collect medicinal plants, craft medicine, and help cure NPCs.
Play it here: Reubencf/Shadows_of_Tomorrow
reacted to danielhanchen's post with 🔥 17 days ago
Post
9109
Gemma 4 12B can now run locally on just 8GB RAM via Dynamic GGUFs.
Google's new model, Gemma 4 12B Unified, supports image, audio, and 256K context.
You can run and train the model via Unsloth Studio.
GGUF: unsloth/gemma-4-12b-it-GGUF
Guide: https://unsloth.ai/docs/models/gemma-4
reacted to abidlabs's post 3 months ago
Post
11467
Why I think local, open-source models will eventually win.
The most useful AI applications are moving toward multi-turn agentic behavior: systems that take hundreds or even thousands of iterative steps to complete a task, e.g. Claude Code, computer-control agents that click, type, and test repeatedly.
In these cases, the power of the model lies not in how smart it is per token, but in how quickly it can interact with its environment and tools across many steps. In that regime, model quality becomes secondary to latency.
An open-source model that can call tools quickly, check that the right thing was clicked, or verify that a code change actually passes tests can easily outperform a slightly "smarter" closed model that has to make remote API calls for every move.
Eventually, the balance tips: it becomes impractical for an agent to rely on remote inference for every micro-action. Just as no one would tolerate a keyboard that required a network request per keystroke, users won't accept agent workflows bottlenecked by latency. All devices will ship with local, open-source models that are "good enough" and the expectation will shift toward everything running locally. It'll happen sooner than most people think.
replied to nroggendorff's post 7 months ago
reacted to nroggendorff's post 7 months ago
reacted to erikkaum's post 11 months ago
Post
2199
We just released native support for @SGLang and @vllm-project in Inference Endpoints 🔥
Inference Endpoints is becoming the central place to deploy high-performance inference engines, and it provides the managed infra for them.
Instead of spending weeks configuring infrastructure, managing servers, and debugging deployment issues, you can focus on what matters most: your AI model and your users.
let's gooo!!
reacted to jsulz's post 12 months ago
Post
6872
It's been a bit since I took a step back and looked at xet-team progress to migrate Hugging Face from Git LFS to Xet, but every time I do it boggles the mind.
A month ago there were 5,500 users/orgs on Xet with 150K repos and 4PB. Today?
- 700,000 users/orgs
- 350,000 repos
- 15PB
Meanwhile, our migrations have pushed throughput to numbers that are bonkers. In June, we hit upload speeds of 577Gb/s (crossing 500Gb/s for the first time).
These are hard numbers to put into context, but let's try:
The latest run of the Common Crawl from commoncrawl was 471 TB.
We now have ~32 crawls stored in Xet. At peak upload speed we could move the latest crawl into Xet in about two hours.
We're moving to a new phase in the process, so stay tuned.
This shift in gears means it's also time to roll up our sleeves and look at all the bytes we have and the value we're adding to the community.
I already have some homework from @RichardErkhov to look at the dedupe across their uploads, and I'll be doing the same for other early adopters, big models/datasets, and frequent uploaders (looking at you @bartowski)
Let me know if there's anything you're interested in; happy to dig in!
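The "about two hours" and "~32 crawls" figures in the post above can be sanity-checked with a little arithmetic. This is a minimal sketch, not anything from the xet-team: it assumes decimal units (1 TB = 10^12 bytes, 1 PB = 1000 TB), reads "Gb/s" as gigabits per second, and uses a hypothetical helper name `transfer_hours`.

```python
# Figures quoted in the post.
CRAWL_SIZE_TB = 471       # latest Common Crawl run, in terabytes
PEAK_UPLOAD_GBPS = 577    # peak upload speed, in gigabits per second (assumed meaning of "Gb/s")
XET_TOTAL_PB = 15         # total data stored in Xet, in petabytes


def transfer_hours(size_tb: float, speed_gbps: float) -> float:
    """Hours needed to move `size_tb` terabytes at `speed_gbps` gigabits per second.

    Assumes decimal units: 1 TB = 10**12 bytes, 1 Gb = 10**9 bits.
    """
    size_bits = size_tb * 1e12 * 8            # bytes -> bits
    seconds = size_bits / (speed_gbps * 1e9)  # bits / (bits per second)
    return seconds / 3600


hours = transfer_hours(CRAWL_SIZE_TB, PEAK_UPLOAD_GBPS)
crawl_equivalents = XET_TOTAL_PB * 1000 / CRAWL_SIZE_TB  # PB -> TB, then divide

print(f"Time to upload one crawl at peak speed: {hours:.2f} hours")
print(f"Crawl-sized chunks in 15PB: {crawl_equivalents:.1f}")
```

Under these assumptions the transfer comes out a little under two hours and 15PB holds roughly 32 crawl-sized chunks, which lines up with the post's rounded numbers.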
reacted to Narsil's post with 🔥 about 1 year ago
Post
2523
Me: This function is too slow. Find a faster algorithm.
Cursor: Hold my beer.
Me: *Slacking off with colleagues*
Cursor: Ping.
Me: 🤯
reacted to jsulz's post with 🔥 about 1 year ago
Post
2701
Heyo @RichardErkhov the xet-team at Hugging Face was wondering if you wanted to join the fun and jump over to Xet storage.
We've been onboarding folks (https://huggingface.co/blog/xet-on-the-hub), we know the backend can scale (Llama 4 and Qwen 3 are on Xet) and is great for working with quants (see xet-team/quantization-dedup), and we're pushing on inviting impactful orgs and users on the Hub. You fit the bill.
We'd love to onboard you, get some feedback, and create some excitement.
The steps are pretty straightforward: join the waitlist at hf.co/join/xet and we'll take care of the rest.
The system is fully backward compatible, so you shouldn't notice a thing. BUT to get the best experience when uploading/downloading, make sure you have hf_xet installed alongside the latest huggingface_hub.
What do you think?
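One way to check the "hf_xet installed alongside the latest huggingface_hub" requirement from the post above is to ask Python which versions are present. This is a minimal sketch using only the standard library; the helper name `installed_version` is hypothetical, and it only reports what is installed without judging whether a version is recent enough.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# The two packages named in the post.
for name in ("huggingface_hub", "hf_xet"):
    found = installed_version(name)
    print(f"{name}: {found if found else 'not installed'}")
```

If either package shows up as missing or old, `pip install -U huggingface_hub hf_xet` is the usual way to bring both up to date.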
reacted to reach-vb's post about 1 year ago
Post
4809
hey hey @mradermacher - VB from Hugging Face here, we'd love to onboard you over to our optimised xet backend! 🔥
as you know we're in the process of upgrading our storage backend to xet (which helps us scale and offer blazingly fast upload/download speeds too): https://huggingface.co/blog/xet-on-the-hub. now that we are certain that the backend can scale with even big models like Llama 4/Qwen 3, we're moving to the next phase of inviting impactful orgs and users on the hub over. as you are a big part of the open source ML community, we would love to onboard you next and create some excitement about it in the community too!
in terms of actual steps - it should be as simple as one of the org admins joining at hf.co/join/xet - we'll take care of the rest.
p.s. you'd need the latest version of the huggingface_hub lib with hf_xet, but everything else should be the same: https://huggingface.co/docs/hub/storage-backends#using-xet-storage
p.p.s. this is fully backwards compatible so everything will work as it should!
WOOHOO!!
reacted to cbensimon's post with 🔥 about 1 year ago
Post
6195
ZeroGPU medium size is now available as a power-user feature
Nothing too fancy for now, as ZeroGPU Spaces still default to large (70GB VRAM), but this paves the way for:
- 💰 size-based quotas / pricing (medium will offer significantly more usage than large)
- 🦣 the upcoming xlarge size (141GB VRAM)
You can as of now control GPU size via a Space variable. Accepted values:
- auto (future default)
- medium
- large (current default)
The auto mode checks total CUDA tensor size during startup:
- More than 30GB → large
- Otherwise → medium
replied to their post about 1 year ago
did you get it to work since?
Post
4516
Important notice 🚨
For Inference Providers who have built support for our Billing API (currently: Fal, Novita, HF-Inference, with more coming soon), we've started enabling Pay as you go (=PAYG).
What this means is that you can use those Inference Providers beyond the free included credits, and they're charged to your HF account.
You can see it on this view: any provider that does not have a "Billing disabled" badge is PAYG-compatible.