Paper Sources | Latent Reasoning Does Not Actually Rely on Reasoning

Top Picks

MobilityBench: A Benchmark for Evaluating Route-Planning Agents in Real-World Mobility Scenarios score 13
Institution: Baidu; featured in HF Daily Papers; HF popularity: 98 upvotes (+4); code available; keywords (1): tool use
Exploratory Memory-Augmented LLM Agent via Hybrid On- and Off-Policy Optimization score 11
Featured in HF Daily Papers; HF popularity: 30 upvotes (+4); keywords (1): GRPO; accepted at top venue: ICLR
EmbodMocap: In-the-Wild 4D Human-Scene Reconstruction for Embodied Agents score 11
Institution: Zhejiang University; featured in HF Daily Papers; HF popularity: 9 upvotes (+2); code available; keywords (2): fine-tune, embodied
The Trinity of Consistency as a Defining Principle for General World Models score 10
Featured in HF Daily Papers; HF popularity: 187 upvotes (+4); code available; keywords (2): scaling, reasoning
From Blind Spots to Gains: Diagnostic-Driven Iterative Training for Large Multimodal Models score 10
Featured in HF Daily Papers; HF popularity: 146 upvotes (+4); code available; keywords (1): reasoning
OmniGAIA: Towards Native Omni-Modal AI Agents score 10
Featured in HF Daily Papers; HF popularity: 49 upvotes (+4); code available; keywords (3): reasoning, vision-language, open-source
Imagination Helps Visual Reasoning, But Not Yet in Latent Space score 10
Featured in HF Daily Papers; HF popularity: 36 upvotes (+4); code available; keywords (1): reasoning
AgentDropoutV2: Optimizing Information Flow in Multi-Agent Systems via Test-Time Rectify-or-Reject Pruning score 10
Featured in HF Daily Papers; HF popularity: 24 upvotes (+4); code available; keywords (4): pruning, fine-tuning, retrieval-augmented, reasoning
Search More, Think Less: Rethinking Long-Horizon Agentic Search for Efficiency and Generalization score 9
Featured in HF Daily Papers; HF popularity: 18 upvotes (+3); code available; keywords (5): scaling, latency, fine-tuning, agentic, reasoning
MediX-R1: Open Ended Medical Reinforcement Learning score 9
Featured in HF Daily Papers; HF popularity: 16 upvotes (+3); code available; keywords (4): lightweight, reasoning, vision-language, open-source

Also Worth Noting

S2O: Early Stopping for Sparse Attention via Online Permutation score 4
Institution: Zhejiang University; keywords (2): lightweight, latency
Coded-E2LF: Coded Aperture Light Field Imaging from Events score 4
Keywords (1): coding; accepted at top venue: CVPR
Denoising as Path Planning: Training-Free Acceleration of Diffusion Models with DPCache score 4
Keywords (1): deployment; accepted at top venue: CVPR
No Caption, No Problem: Caption-Free Membership Inference via Model-Fitted Embeddings score 4
Keywords (2): vision-language, text-to-image; accepted at top venue: ICLR
Replacing Multi-Step Assembly of Data Preparation Pipelines with One-Step LLM Pipeline Generation for Table QA score 4
Institution: Cornell; keywords (2): lightweight, compression
HulluEdit: Single-Pass Evidence-Consistent Subspace Editing for Mitigating Hallucinations in Large Vision-Language Models score 4
Keywords (2): deployment, vision-language; accepted at top venue: CVPR
Towards Better RL Training Data Utilization via Second-Order Rollout score 4
Institution: Peking University; keywords (1): reasoning
Hierarchy-of-Groups Policy Optimization for Long-Horizon Agentic Tasks score 4
Keywords (2): GRPO, agentic; accepted at top venue: ICLR
Towards LLM-Empowered Knowledge Tracing via LLM-Student Hierarchical Behavior Alignment in Hyperbolic Space score 4
Keywords (1): synthetic data; accepted at top venue: AAAI
OpenFS: Multi-Hand-Capable Fingerspelling Recognition with Implicit Signing-Hand Detection and Frame-Wise Letter-Conditioned Synthesis score 4
Keywords (1): open-source; accepted at top venue: CVPR
SubspaceAD: Training-Free Few-Shot Anomaly Detection via Subspace Modeling score 4
Keywords (1): vision-language; accepted at top venue: CVPR
TriLite: Efficient Weakly Supervised Object Localization with Universal Visual Features and Tri-Region Disentanglement score 4
Keywords (2): fine-tuning, pre-training; accepted at top venue: CVPR
A Mixture-of-Experts Model for Multimodal Emotion Recognition in Conversations score 4
Institution: Google; keywords (1): MoE
Understanding Usage and Engagement in AI-Powered Scientific Research Tools: The Asta Interaction Dataset score 4
Institution: Allen Institute; keywords (1): retrieval-augmented
Scale Can't Overcome Pragmatics: The Impact of Reporting Bias on Vision-Language Reasoning score 4
Institution: AI2; keywords (4): scaling, reasoning, vision-language, data curation