Zhijiang

Zeee

https://cartus.github.io/

AI & ML interests

Large Language Models

Recent Activity

upvoted a paper about 2 hours ago

From Trainee to Trainer: LLM-Designed Training Environment for RL with Multi-Agent Reasoning

Paper • 2606.17682 • Published 3 days ago • 9

upvoted 2 papers 6 days ago

Data Journalist Agent: Transforming Data into Verifiable Multimodal Stories

Paper • 2606.11176 • Published 10 days ago • 113

Demystifying Hidden-State Recurrence: Switchable Latent Reasoning with On-Policy Reinforcement Learning

Paper • 2606.13106 • Published 8 days ago • 21

upvoted a paper 8 days ago

Attention Amnesia in Hybrid LLMs: When CoT Fine-Tuning Breaks Long-Range Recall, and How to Fix It

Paper • 2606.11052 • Published 10 days ago • 16

upvoted a paper 9 days ago

Reasoning Arena: Trace Tournaments When Verifiable Rewards Fall Short

Paper • 2606.09380 • Published 10 days ago • 8

upvoted a paper 13 days ago

Reinforcement Learning Elicits Contextual Learning of Unseen Language Translation

Paper • 2606.06428 • Published 15 days ago • 25

upvoted a paper 17 days ago

Linear Ensembles Wash Away Watermarks: On the Fragility of Distributional Perturbations in LLMs

Paper • 2605.30501 • Published 22 days ago • 29

upvoted 3 papers 28 days ago

Video2GUI: Synthesizing Large-Scale Interaction Trajectories for Generalized GUI Agent Pretraining

Paper • 2605.14747 • Published May 14 • 146

OScaR: The Occam's Razor for Extreme KV Cache Quantization in LLMs and Beyond

Paper • 2605.19660 • Published about 1 month ago • 40

ThoughtTrace: Understanding User Thoughts in Real-World LLM Interactions

Paper • 2605.20087 • Published about 1 month ago • 18

upvoted a collection 30 days ago

EnvFactory

Collection

Checkpoints and dataset for EnvFactory: Scaling Tool-Use Agents via Executable Environments Synthesis and Robust RL. • 7 items • Updated 29 days ago • 1

upvoted a paper 30 days ago

EnvFactory: Scaling Tool-Use Agents via Executable Environments Synthesis and Robust RL

Paper • 2605.18703 • Published May 18 • 50

upvoted 3 papers 4 months ago

upvoted a collection 4 months ago

CodeScaler

Collection

5 items • Updated Mar 2 • 6

upvoted 4 papers 4 months ago

CodeScaler: Scaling Code LLM Training and Test-Time Inference via Execution-Free Reward Models

Paper • 2602.17684 • Published Feb 4 • 22

Probability-Entropy Calibration: An Elastic Indicator for Adaptive Fine-tuning

Paper • 2602.01745 • Published Feb 2 • 7

Improving Data and Reward Design for Scientific Reasoning in Large Language Models

Paper • 2602.08321 • Published Feb 9 • 44

LOCA-bench: Benchmarking Language Agents Under Controllable and Extreme Context Growth

Paper • 2602.07962 • Published Feb 8 • 24
