Investigate LLM reasoning robustness through a controlled benchmark.
YMinglai/clean_finegrain_checkpoint-1950
Text Generation • 1B • Updated
YMinglai/clean_finegrain_checkpoint-5850
Text Generation • 1B • Updated
YMinglai/clean_finegrain_checkpoint-9750
Text Generation • 1B • Updated
YMinglai/clean_finegrain_checkpoint-11700
Text Generation • 1B • Updated