BrokenMath: A Benchmark for Sycophancy in Theorem Proving with LLMs
Paper • 2510.04721 • Published
FormalMATH: Benchmarking Formal Mathematical Reasoning of Large Language Models
Paper • 2505.02735 • Published • 33
PolyMath: Evaluating Mathematical Reasoning in Multilingual Contexts
Paper • 2504.18428 • Published
MathConstruct: Challenging LLM Reasoning with Constructive Proofs
Paper • 2502.10197 • Published
Shuo Xing
shuoxing
AI & ML interests
MLLMs, LLMs
Recent Activity
updated a model about 2 hours ago: shuoxing/llama3-8b-full-pretrain-wash-c4-0-6m-bs4
published a model about 2 hours ago: shuoxing/llama3-8b-full-pretrain-wash-c4-0-6m-bs4
updated a model about 2 hours ago: shuoxing/llama3-8b-full-pretrain-wash-c4-0-3m-bs4