Apply for community grant: Academic project (gpu)
Dear Hugging Face Team,
I am writing to apply for the GPU Community Grant for our academic project FloodDiffusion, a streaming motion generation demo for our CVPR 2026 paper.
Paper: FloodDiffusion: Tailored Diffusion Forcing for Streaming Motion Generation
Authors: Yiyi Cai, Yuhan Wu, Kunhang Li, You Zhou, Bo Zheng, Haiyang Liu
Affiliations: Shanda AI Research Tokyo, The University of Tokyo
Code: github.com/ShandaAI/FloodDiffusion
Model: ShandaAI/FloodDiffusionTiny
What this Space does
This is an interactive streaming demo that generates infinite-length 3D human motion in real time from text descriptions. Users can type a motion prompt (e.g., "walk forward", "dance"), watch the 3D skeleton animate in real time via Three.js, and change the text on the fly to seamlessly transition between motions. Multiple visitors can watch the same generation simultaneously (spectator mode).
To our knowledge, this is the first streaming motion generation demo on Hugging Face Spaces.
Why we need a persistent GPU (not ZeroGPU)
FloodDiffusion's core feature is streaming generation: the model maintains persistent internal state (diffusion forcing buffer, VAE cache, accumulated motion trajectory) across an unbounded number of generation steps. This requires a GPU that stays allocated continuously, which is incompatible with ZeroGPU's per-request allocation model:
- Stateful generation: the model accumulates state across every call; deallocating the GPU would lose all accumulated state
- No fixed endpoint: generation is infinite-length, so there is no natural point to release the GPU
- Real-time latency: cold-start would break the streaming experience
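To make the statefulness concrete, here is a minimal pure-Python sketch of the session pattern described above. The class and method names (`StreamingSession`, `stream_generate_step`) are illustrative stand-ins, not the actual FloodDiffusion API, and plain lists stand in for GPU tensors:

```python
# Hypothetical sketch: the session object accumulates state that must
# survive between generation steps for the stream to continue.

class StreamingSession:
    """Holds the per-stream state that would live on the GPU for the
    whole session (diffusion forcing buffer, accumulated trajectory)."""

    def __init__(self, prompt: str):
        self.prompt = prompt
        self.buffer = []        # stand-in for the diffusion forcing buffer
        self.trajectory = []    # stand-in for the accumulated motion trajectory

    def set_prompt(self, prompt: str):
        # Changing the text mid-stream keeps all accumulated state,
        # which is what enables seamless motion transitions.
        self.prompt = prompt

    def stream_generate_step(self):
        # Each step reads and extends the accumulated state; if the GPU
        # (and this object) were deallocated between steps, the stream
        # could not continue from where it left off.
        frame = f"{self.prompt}:{len(self.trajectory)}"
        self.buffer.append(frame)
        self.trajectory.append(frame)
        return frame

session = StreamingSession("walk forward")
first = session.stream_generate_step()   # "walk forward:0"
session.set_prompt("dance")              # on-the-fly prompt change
second = session.stream_generate_step()  # "dance:1"
```

Under ZeroGPU's per-request model, the equivalent of `session` would be destroyed after every `@spaces.GPU` call returns, which is the core incompatibility.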
We are currently running on t4-small and would greatly appreciate a community GPU grant to keep the demo accessible for the research community.
Hardware request
- GPU: t4-small (16GB VRAM, sufficient for FloodDiffusionTiny)
- Sleep: Enabled (auto-sleep after 30 min of inactivity to save resources)
Thank you for considering our application!
Best regards,
Haiyang Liu
Side question: Any ideas for ZeroGPU compatibility?
We'd also love to hear if the HF team has any ideas or plans that could make streaming/stateful inference work with ZeroGPU in the future.
Our main blocker is that streaming generation requires persistent GPU state across requests: the model's internal buffer, VAE cache, and diffusion forcing state must survive between consecutive stream_generate_step() calls. Currently, ZeroGPU deallocates the GPU after each @spaces.GPU function returns, which would destroy this state.
Some approaches we've considered but aren't sure about:
- Long-running @spaces.GPU session: is there a way to keep a ZeroGPU allocation alive for an extended interactive session (e.g., 2-5 minutes)?
- State checkpointing to CPU: move all GPU tensors to CPU between calls and restore them on the next call. Feasible, but adds significant latency per step.
- WebSocket + single long @spaces.GPU call: wrap the entire streaming session in one GPU call that communicates via WebSocket.
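For the second option, this is roughly the round-trip we have in mind. A pure-Python sketch with a `FakeTensor` stand-in, since the pattern is what matters; a real implementation would call `.cpu()` and `.to("cuda")` on torch tensors, and all names here are illustrative:

```python
# Sketch of "state checkpointing to CPU": persist session state across
# ZeroGPU deallocations by round-tripping it through host memory.

class FakeTensor:
    """Stand-in for a torch tensor; tracks which device it lives on."""
    def __init__(self, data, device="cuda"):
        self.data = data
        self.device = device

    def to(self, device):
        return FakeTensor(self.data, device)

def checkpoint_state(state: dict) -> dict:
    # Move every tensor to CPU before the @spaces.GPU call returns,
    # so the state survives the GPU being deallocated.
    return {k: v.to("cpu") for k, v in state.items()}

def restore_state(state: dict) -> dict:
    # Move the checkpointed state back to the GPU at the start of the
    # next call; this round-trip is the extra per-step latency.
    return {k: v.to("cuda") for k, v in state.items()}

state = {"buffer": FakeTensor([1, 2]), "vae_cache": FakeTensor([3])}
cpu_state = checkpoint_state(state)   # after the GPU call returns
gpu_state = restore_state(cpu_state)  # at the start of the next call
```

The per-step host/device transfer is what worries us latency-wise for a real-time stream.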
Would any of these work within ZeroGPU's current or planned architecture? Any other suggestions?
Thank you!
cc @hysts @merve @osanseviero: would appreciate your help with the GPU grant request above, and any thoughts on ZeroGPU compatibility for stateful streaming inference. Thank you!
(Sorry, correction: cc @hysts @merve @osanseviero for the GPU grant request and ZeroGPU question above. Thanks!)
Hi @H-Liu1997 , thanks for the detailed explanation. I've just assigned t4-small with a 30-minute sleep time.
Regarding the ZeroGPU compatibility questions, let me CC @cbensimon .
BTW, the next time you open a community grant request, please use the grant request flow described here instead of directly pinging people: https://huggingface.co/docs/hub/en/spaces-gpus#community-gpu-grants
That's the expected process. Also, Omar, whom you pinged, left HF years ago, so he isn't the right person to contact.