AI Research Brief
Agents Ignore Answers Placed in Plain Sight
22 selected from 178 papers
Featured
SkillFlow: Benchmarking Lifelong Skill Discovery and Evolution for Autonomous Agents
score 9
Selected for HF Daily Papers; HF popularity: 15 upvotes (+3); code available; keywords (1): agentic
Agents Explore but Agents Ignore: LLMs Lack Environmental Curiosity
score 9
Institution: Cohere; selected for HF Daily Papers; HF popularity: 5 upvotes (+2); keywords (1): reasoning
When Background Matters: Breaking Medical Vision Language Models by Transferable Attack
score 6
Selected for HF Daily Papers; code available; keywords (3): fine-tuning, reasoning, vision-language
EvoMaster: A Foundational Evolving Agent Framework for Agentic Science at Scale
score 6
Selected for HF Daily Papers; code available; keywords (1): agentic
The Continuity Layer: Why Intelligence Needs an Architecture for What It Carries Forward
score 6
Selected for HF Daily Papers; code available; keywords (1): agentic
HSG: Hyperbolic Scene Graph
score 6
Selected for HF Daily Papers; code available; keywords (1): reasoning
Back to Repair: A Minimal Denoising Network for Time Series Anomaly Detection
score 5
Selected for HF Daily Papers; code available
Also Worth Noting
LookasideVLN: Direction-Aware Aerial Vision-and-Language Navigation
score 4
Keywords (2): lightweight, reasoning; accepted at top venue: CVPR
Are Emotion and Rhetoric Neurons in LLM? Neuron Recognition and Adaptive Masking for Emotion-Rhetoric Prediction Steering
score 4
Keywords (1): reasoning; accepted at top venue: ACL
Depth Adaptive Efficient Visual Autoregressive Modeling
score 4
Keywords (1): pruning; accepted at top venue: CVPR
A Survey of Reinforcement Learning for Large Language Models under Data Scarcity: Challenges and Solutions
score 7
Institution: Peking University; keywords (2): post-training, reasoning; accepted at top venue: ACL
Calibrated? Not for Everyone: How Sexual Orientation and Religious Markers Distort LLM Accuracy and Confidence in Medical QA
score 4
Keywords (1): deployment; accepted at top venue: ACL
AnchorMem: Anchored Facts with Associative Contexts for Building Memory in Large Language Models
score 4
Institution: Tsinghua; keywords (1): open-source
Speculative Decoding for Autoregressive Video Generation
score 4
Institution: Tsinghua; keywords (2): distillation, serving
PBSBench: A Multi-Level Vision-Language Framework and Benchmark for Hematopathology Whole Slide Image Interpretation
score 4
Keywords (3): instruction tuning, reasoning, vision-language; accepted at top venue: CVPR
ThreadSumm: Summarization of Nested Discourse Threads Using Tree of Thoughts
score 4
Keywords (1): reasoning; accepted at top venue: ACL
Modeling Multi-Dimensional Cognitive States in Large Language Models under Cognitive Crowding
score 3
Accepted at top venue: ACL
Cognitive Policy-Driven LLM for Diagnosis and Intervention of Cognitive Distortions in Emotional Support Conversation
score 3
Accepted at top venue: ACL
Rethinking Meeting Effectiveness: A Benchmark and Framework for Temporal Fine-grained Automatic Meeting Effectiveness Evaluation
score 3
Accepted at top venue: ACL
From Admission to Invariants: Measuring Deviation in Delegated Agent Systems
score 3
Institution: FAIR
Contraction and Hourglass Persistence for Learning on Graphs, Simplices, and Cells
score 3
Accepted at top venue: ICLR
MAPLE: A Meta-learning Framework for Cross-Prompt Essay Scoring
score 3
Accepted at top venue: ACL