Big Models Resist Rumors but Fall for Noise

10 selected from 64 papers

Featured

Exploration and Exploitation Errors Are Measurable for Language Model Agents score 10
Selected for HF Daily Papers; HF popularity: 22 upvotes (+4); code available; keywords (3): coding, reasoning, embodied

Also Worth Noting

InfiniteScienceGym: An Unbounded, Procedurally-Generated Benchmark for Scientific Analysis score 4
Selected for HF Daily Papers; keywords (1): reasoning
4th Workshop on Maritime Computer Vision (MaCVi): Challenge Overview score 4
Keywords (1): real-time; accepted at top venue: CVPR
Better and Worse with Scale: How Contextual Entrainment Diverges with Model Size score 4
Institution: Google; keywords (1): scaling
MOONSHOT: A Framework for Multi-Objective Pruning of Vision and Large Language Models score 4
Institution: Google; keywords (2): pruning, post-training
WebXSkill: Skill Learning for Autonomous Web Agents score 4
Institution: Microsoft Research; keywords (1): deployment
Text-Attributed Knowledge Graph Enrichment with Large Language Models for Medical Concept Representation score 4
Keywords (1): edge; accepted at top venue: ACL
Hessian-Enhanced Token Attribution (HETA): Interpreting Autoregressive LLMs score 3
Accepted at top venue: ICLR
Some Theoretical Limitations of t-SNE score 3
Institution: MIT
SSD-GS: Scattering and Shadow Decomposition for Relightable 3D Gaussian Splatting score 3
Accepted at top venue: ICLR