Paper Sources | Agents Ignore the Answer Even When It Is Right in Front of Them

Top Picks

SkillFlow: Benchmarking Lifelong Skill Discovery and Evolution for Autonomous Agents score 9
Featured in HF Daily Papers; HF popularity: 15 upvotes (+3); code available; keywords (1): agentic
Agents Explore but Agents Ignore: LLMs Lack Environmental Curiosity score 9
Institution: Cohere; featured in HF Daily Papers; HF popularity: 5 upvotes (+2); keywords (1): reasoning
When Background Matters: Breaking Medical Vision Language Models by Transferable Attack score 6
Featured in HF Daily Papers; code available; keywords (3): fine-tuning, reasoning, vision-language
EvoMaster: A Foundational Evolving Agent Framework for Agentic Science at Scale score 6
Featured in HF Daily Papers; code available; keywords (1): agentic
The Continuity Layer: Why Intelligence Needs an Architecture for What It Carries Forward score 6
Featured in HF Daily Papers; code available; keywords (1): agentic
HSG: Hyperbolic Scene Graph score 6
Featured in HF Daily Papers; code available; keywords (1): reasoning
Back to Repair: A Minimal Denoising Network for Time Series Anomaly Detection score 5
Featured in HF Daily Papers; code available

Also Worth Noting

LookasideVLN: Direction-Aware Aerial Vision-and-Language Navigation score 4
Keywords (2): lightweight, reasoning; accepted at top venue: CVPR
Are Emotion and Rhetoric Neurons in LLM? Neuron Recognition and Adaptive Masking for Emotion-Rhetoric Prediction Steering score 4
Keywords (1): reasoning; accepted at top venue: ACL
Depth Adaptive Efficient Visual Autoregressive Modeling score 4
Keywords (1): pruning; accepted at top venue: CVPR
A Survey of Reinforcement Learning for Large Language Models under Data Scarcity: Challenges and Solutions score 7
Institution: Peking University; keywords (2): post-training, reasoning; accepted at top venue: ACL
Calibrated? Not for Everyone: How Sexual Orientation and Religious Markers Distort LLM Accuracy and Confidence in Medical QA score 4
Keywords (1): deployment; accepted at top venue: ACL
AnchorMem: Anchored Facts with Associative Contexts for Building Memory in Large Language Models score 4
Institution: Tsinghua; keywords (1): open-source
Speculative Decoding for Autoregressive Video Generation score 4
Institution: Tsinghua; keywords (2): distillation, serving
PBSBench: A Multi-Level Vision-Language Framework and Benchmark for Hematopathology Whole Slide Image Interpretation score 4
Keywords (3): instruction tuning, reasoning, vision-language; accepted at top venue: CVPR
ThreadSumm: Summarization of Nested Discourse Threads Using Tree of Thoughts score 4
Keywords (1): reasoning; accepted at top venue: ACL
Modeling Multi-Dimensional Cognitive States in Large Language Models under Cognitive Crowding score 3
Accepted at top venue: ACL
Cognitive Policy-Driven LLM for Diagnosis and Intervention of Cognitive Distortions in Emotional Support Conversation score 3
Accepted at top venue: ACL
Rethinking Meeting Effectiveness: A Benchmark and Framework for Temporal Fine-grained Automatic Meeting Effectiveness Evaluation score 3
Accepted at top venue: ACL
From Admission to Invariants: Measuring Deviation in Delegated Agent Systems score 3
Institution: FAIR
Contraction and Hourglass Persistence for Learning on Graphs, Simplices, and Cells score 3
Accepted at top venue: ICLR
MAPLE: A Meta-learning Framework for Cross-Prompt Essay Scoring score 3
Accepted at top venue: ACL