Natural Language Processing
On the Limits of LLM-as-Judge for Scientific Novelty Assessment
GRAIL: Gradient-Reweighted Advantages for Reinforcement Learning with Verifiable Rewards