Paper source | 8B model overtakes 235B model on scientific reasoning

Keywords (3): fine-tuning, GRPO, reasoning; Accepted at top conference: ICML

Keywords (1): reasoning; Accepted at top conference: ACL

Keywords (3): lightweight, distillation, real-time; Accepted at top conference: ECCV

Provably Efficient Policy-Reward Co-Pretraining for Adversarial Imitation Learning score 4

Keywords (1): pretraining; Accepted at top conference: ICML

Drowning in Routine: Signal Dilution in Multi-Turn Agent Training score 4

Affiliation: Mila; Keywords (2): scaling, GRPO

Multi4D: High-Fidelity Dynamic Gaussian Splatting via Multi-Level Competitive Allocation score 4

Keywords (1): real-time; Accepted at top conference: ECCV

Accepted at top conference: CVPR

Residue-Level Attributions in Protein Language Models Do Not Recover Allergen Epitopes score 3

Affiliation: ETH Zurich

Also worth noting