MMFineReason Collection High-quality STEM reasoning dataset for Multimodal LLM post-training. • 8 items • Updated 28 days ago • 24
CE-GPPO: Controlling Entropy via Gradient-Preserving Clipping Policy Optimization in Reinforcement Learning Paper • 2509.20712 • Published Sep 25, 2025 • 20
deepseek-ai/DeepSeek-V3.1-Terminus Text Generation • 685B • Updated Sep 29, 2025 • 10.2k • 365
QwQ 32B Demo 🌖 Running • Agents • Featured • 560 • Chat with an AI assistant for planning and writing help
deepseek-ai/DeepSeek-V3-0324 Text Generation • 685B • Updated Mar 27, 2025 • 633k • 3.12k