ZeroGPU Explorers

community

Activity Feed


Recent Activity

gsarti authored a paper about 17 hours ago

Beyond the Commitment Boundary: Probing Epiphenomenal Chain-of-Thought in Large Reasoning Models

gsarti authored a paper about 17 hours ago

Distilling Formal Logic into Neural Spaces: A Kernel Alignment Approach for Signal Temporal Logic

gsarti authored a paper about 17 hours ago

Bridging Logic and Learning: Decoding Temporal Logic Embeddings via Transformers


sergiopaniego

posted an update 3 days ago

Post

105

you can now train your own coding agents with trl + openenv, starting with opencode

we just added end-to-end support for training agent harnesses:

> TRL: a loop-owning training path (AsyncGRPOTrainer + HarnessRolloutWorker) that launches the agent in an OpenEnv session, reads back its trace, reconstructs the training samples, and trains with AsyncGRPO
> OpenEnv: the OpenCode harness environment plus a transparent proxy that forwards the agent's model calls and records each turn's token ids and logprobs

you train the actual opencode agent as is: it runs its own loop and tools, and the policy learns from the exact tokens it produced
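To make the record-then-reconstruct idea above concrete, here is a minimal sketch in plain Python. It does not use the TRL or OpenEnv APIs: `Turn`, `RecordingProxy`, `build_samples`, and `fake_model` are hypothetical names invented for illustration, and assigning one episode reward to every turn is an assumption, not necessarily how the real pipeline does it.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Turn:
    """One recorded model call: prompt ids in, sampled ids and logprobs out."""
    prompt_ids: list
    completion_ids: list
    logprobs: list


@dataclass
class RecordingProxy:
    """Hypothetical stand-in for a transparent proxy in front of the policy."""
    generate: Callable  # any function: prompt_ids -> (completion_ids, logprobs)
    trace: list = field(default_factory=list)

    def __call__(self, prompt_ids):
        completion_ids, logprobs = self.generate(prompt_ids)
        # Record exactly what was sampled, then pass the result through unchanged.
        self.trace.append(Turn(list(prompt_ids), list(completion_ids), list(logprobs)))
        return completion_ids


def build_samples(trace, reward):
    """Rebuild one training sample per recorded turn (assumed: shared reward)."""
    return [
        {
            "input_ids": t.prompt_ids + t.completion_ids,
            # Only completion tokens are trained on; prompt tokens are masked out.
            "completion_mask": [0] * len(t.prompt_ids) + [1] * len(t.completion_ids),
            "old_logprobs": t.logprobs,
            "reward": reward,
        }
        for t in trace
    ]


def fake_model(prompt_ids):
    # Toy policy: emit (last prompt token + 1) with a fixed logprob.
    return [prompt_ids[-1] + 1], [-0.5]


proxy = RecordingProxy(fake_model)
first = proxy([1, 2, 3])                 # the agent's first model call
second = proxy([1, 2, 3] + first + [9])  # the agent appends a tool result, calls again
samples = build_samples(proxy.trace, reward=1.0)
```

The agent only ever sees the proxy's return value, so it can run its own loop untouched while the trace keeps the exact token ids needed for training.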

we're shipping a self-contained example: local subprocess sandbox, DeepCoder problems, validated on Qwen3-8B.

> example: https://github.com/huggingface/trl/blob/main/examples/scripts/openenv/opencode.py
> docs: https://huggingface.co/docs/trl/main/openenv

and we're working actively on both sides so expect more 🤓

1 reply

sergiopaniego

posted an update 4 days ago

Post

1438

you can train DiffusionGemma (a block-diffusion LLM) in TRL! and we're sharing an example for it

TRL trainers are made to be easily extended and adapted to different real-world use cases.

in this one, with a single method overridden in SFTTrainer (compute_loss), you can train this model
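The override pattern described above can be sketched without any library. This is only an illustration of the design: `BaseTrainer` and `MaskedPositionsTrainer` are hypothetical stand-ins, the simplified `compute_loss` signature is not TRL's, and scoring only masked positions is an assumed toy objective, not the actual DiffusionGemma loss (see the linked example for that).

```python
import math


class BaseTrainer:
    """Stand-in for a trainer whose training step delegates to compute_loss."""

    def compute_loss(self, logprobs, labels, mask):
        # Default objective: mean negative log-likelihood over every position.
        nll = [-lp[y] for lp, y in zip(logprobs, labels)]
        return sum(nll) / len(nll)

    def training_step(self, batch):
        return self.compute_loss(batch["logprobs"], batch["labels"], batch["mask"])


class MaskedPositionsTrainer(BaseTrainer):
    """Overrides only compute_loss; the rest of the loop is inherited."""

    def compute_loss(self, logprobs, labels, mask):
        # Assumed toy objective: score only positions flagged in the mask.
        nll = [-lp[y] for lp, y, m in zip(logprobs, labels, mask) if m]
        return sum(nll) / max(len(nll), 1)


# Three positions, vocabulary of size two, per-position log-probabilities.
batch = {
    "logprobs": [
        [math.log(0.5), math.log(0.5)],
        [math.log(0.25), math.log(0.75)],
        [math.log(0.9), math.log(0.1)],
    ],
    "labels": [0, 1, 0],
    "mask": [0, 1, 1],
}
full_loss = BaseTrainer().training_step(batch)
masked_loss = MaskedPositionsTrainer().training_step(batch)
```

Because `training_step` is inherited, swapping the objective touches one method and nothing else in the loop.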

> example: https://github.com/huggingface/trl/blob/main/examples/scripts/sft_diffusion_gemma.py

sergiopaniego

posted an update 5 days ago

Post

163

join us next Tuesday, July 28, for Class 3 of the Training Agents live series!

we'll dive into reinforcement learning for agent training, covering the intuition behind GRPO, how it works, and how to implement it in TRL with practical, e2e examples

see you there 🤠

live: https://www.youtube.com/live/ztdTed5egrM

> in case you missed class 1: https://x.com/SergioPaniego/status/2069382207618379813
> and in case you missed class 2: https://x.com/SergioPaniego/status/2075180665184686187
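As a small taste of the GRPO intuition the class announcement mentions: each completion is scored relative to the other completions sampled for the same prompt, so no learned value model is needed. The sketch below is not TRL code; the function name, the use of the population standard deviation, and the `eps` value are assumptions made for illustration.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-4):
    """Normalize each reward against its own group (same prompt, many samples)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # assumption: population std; implementations may differ
    # eps keeps the division finite when every completion gets the same reward.
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four completions for one prompt, with binary pass/fail rewards.
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
```

Completions that beat the group average get a positive advantage and the rest get a negative one; a group where every completion scores the same yields all zeros and therefore no learning signal.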

LXT

authored a paper 7 days ago

UniVR: Thinking in Visual Space for Unified Visual Reasoning

Paper • 2607.12800 • Published 13 days ago • 32

CongWei1230

authored 2 papers 7 days ago

Function-Aware Fill-in-the-Middle as Mid-Training for Coding Agent Foundation Models

Paper • 2607.12463 • Published 13 days ago • 108

Search Beyond What Can Be Taught: Evolving the Knowledge Boundary in Agentic Visual Generation

Paper • 2607.05382 • Published 18 days ago • 87

LXT

authored a paper 7 days ago

SPIRAL: Self-Evolving Action-Conditioned Video Generation via Reflective Planning Agents

Paper • 2603.08403 • Published May 21

CongWei1230

submitted a paper to Daily Papers 12 days ago

Search Beyond What Can Be Taught: Evolving the Knowledge Boundary in Agentic Visual Generation

Paper • 2607.05382 • Published 18 days ago • 87

sergiopaniego

posted an update 19 days ago

Post

7700

Frontier models use distillation as a step of their post-training pipelines.

In 2026 it has three jobs: compress a big model into a small one, merge RL experts into a single model, and let a model teach itself.

I wrote up which frontier models use each one and how: https://huggingface.co/blog/sergiopaniego/distillation-2026

It pairs with Class 2 of the Training an Agent series Ben and I are doing, where we teach these techniques hands-on with TRL!
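For readers new to the first of those three jobs, compressing a big model into a small one, a common formulation trains the student to match the teacher's next-token distribution. The sketch below shows that idea as a forward KL divergence on a single position in plain Python; the function names and the temperature parameter are illustrative assumptions, and the blog post should be consulted for what specific frontier models actually do.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax with an optional temperature."""
    scaled = [x / temperature for x in logits]
    top = max(scaled)
    exps = [math.exp(x - top) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_kl(teacher_logits, student_logits, temperature=1.0):
    """KL(teacher || student) over one position's vocabulary distribution."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


matched = distillation_kl([2.0, 0.5, -1.0], [2.0, 0.5, -1.0])
mismatched = distillation_kl([2.0, 0.5, -1.0], [-1.0, 0.5, 2.0])
```

The loss is zero when the student reproduces the teacher's distribution and grows as the two diverge, which is what gives the student a dense per-token signal.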

3 replies

ShoufaChen

authored a paper 26 days ago

TUA-Bench: A Benchmark for General-Purpose Terminal-Use Agents

Paper • 2606.28480 • Published Jun 26 • 48

ymoslem

authored a paper 28 days ago

Cluster, Route, Escalate: Cascaded Framework for Cost-Aware LLM Serving

Paper • 2606.27457 • Published Jun 25 • 4

ymoslem

submitted a paper to Daily Papers 28 days ago

Cluster, Route, Escalate: Cascaded Framework for Cost-Aware LLM Serving

Paper • 2606.27457 • Published Jun 25 • 4

sergiopaniego

posted an update about 1 month ago

Post

358

TRL v1.7.0 is out‼️

+ continuous batching makes GRPO and RLOO 1.25x faster while using 16 GB less memory
+ proper MoE post-training across GRPO/RLOO/AsyncGRPO
+ new GMPO trainer
+ AsyncGRPO weight sync + padding-free
+ more

https://github.com/huggingface/trl/releases/tag/v1.7.0

wrote a short article about the continuous batching feature for GRPO

https://huggingface.co/blog/sergiopaniego/cb-trl-grpo

sergiopaniego

posted an update about 1 month ago

Post

348

Continuous batching just landed in TRL for GRPO!

At 64 generations it runs faster and uses less VRAM than plain generate, no vLLM needed

How it works and when to reach for it, below

https://huggingface.co/blog/sergiopaniego/cb-trl-grpo
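A toy simulation can show why continuous batching helps when completions have uneven lengths. It counts decode steps only, ignores prefill and memory, and is not TRL's implementation; both function names and the example lengths are assumptions for illustration.

```python
from collections import deque


def static_batching_steps(lengths, batch_size):
    """Each batch runs until its longest sequence finishes."""
    return sum(
        max(lengths[i:i + batch_size]) for i in range(0, len(lengths), batch_size)
    )


def continuous_batching_steps(lengths, batch_size):
    """A finished sequence frees its slot for the next queued request at once."""
    queue = deque(lengths)  # lengths are assumed to be positive integers
    slots = []              # remaining decode steps per active sequence
    steps = 0
    while queue or slots:
        # Refill any free slots before the next decode step.
        while queue and len(slots) < batch_size:
            slots.append(queue.popleft())
        steps += 1
        # Sequences with one step left finish now and leave the batch.
        slots = [s - 1 for s in slots if s > 1]
    return steps


lengths = [8, 2, 2, 2]  # one long completion and three short ones
static_steps = static_batching_steps(lengths, batch_size=2)
continuous_steps = continuous_batching_steps(lengths, batch_size=2)
```

With these lengths the static schedule leaves a slot idle while the long sequence finishes, whereas the continuous schedule fits the short sequences into that slot.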

sergiopaniego

posted an update about 1 month ago

Post

330

GLM-5.2 is open and comes with competitive performance against opus 4.8

day-0 in transformers + vllm + sglang, mit license 🤗

on the post-training side: critic-based ppo for variable-length agentic rollouts (ppo is back!) + an online anti-reward-hacking module that feeds the agent dummy info when it tries to cheat