Sources | An 8B Model Beats a 235B One at Science Reasoning

Keywords (3): fine-tuning, GRPO, reasoning; Top-venue acceptance: ICML

Keywords (1): reasoning; Top-venue acceptance: ACL

Keywords (3): lightweight, distillation, real-time; Top-venue acceptance: ECCV

Provably Efficient Policy-Reward Co-Pretraining for Adversarial Imitation Learning score 4

Keywords (1): pretraining; Top-venue acceptance: ICML

Drowning in Routine: Signal Dilution in Multi-Turn Agent Training score 4

Affiliation: Mila; Keywords (2): scaling, GRPO

Multi4D: High-Fidelity Dynamic Gaussian Splatting via Multi-Level Competitive Allocation score 4

Keywords (1): real-time; Top-venue acceptance: ECCV

Top-venue acceptance: CVPR

Residue-Level Attributions in Protein Language Models Do Not Recover Allergen Epitopes score 3

Affiliation: ETH Zurich

Also Worth Noting