RbtAct: Rebuttal as Supervision for Actionable Review Feedback Generation Paper • 2603.09723 • Published 6 days ago • 6
References Improve LLM Alignment in Non-Verifiable Domains Paper • 2602.16802 • Published 26 days ago • 2
SciLaD: A Large-Scale, Transparent, Reproducible Dataset for Natural Scientific Language Processing Paper • 2512.11192 • Published Dec 12, 2025
PingPong: A Natural Benchmark for Multi-Turn Code-Switching Dialogues Paper • 2601.17277 • Published Jan 24 • 6
Patient-Similarity Cohort Reasoning in Clinical Text-to-SQL Paper • 2601.09876 • Published Jan 14 • 7
BhashaKritika: Building Synthetic Pretraining Data at Scale for Indic Languages Paper • 2511.10338 • Published Nov 13, 2025
Post • PatchDNA, a DNA foundation model based on the tokenization strategy of Meta's BLT (Byte Latent Transformer): https://www.biorxiv.org/content/10.1101/2025.11.28.691095v1
Grounding Computer Use Agents on Human Demonstrations Paper • 2511.07332 • Published Nov 10, 2025 • 106
MixtureVitae: Open Web-Scale Pretraining Dataset With High Quality Instruction and Reasoning Data Built from Permissive-First Text Sources Paper • 2509.25531 • Published Sep 29, 2025 • 9
BigCodeArena: Unveiling More Reliable Human Preferences in Code Generation via Execution Paper • 2510.08697 • Published Oct 9, 2025 • 39
Post • Bio LLMs train on many genomes, but can we encode differences within a species? TomatoTomato adds pangenome tokens to represent a domestic tomato and a wild tomato in one sequence 🍅 🧬 Model: monsoon-nlp/tomatotomato-gLM2-150M-v0.1