Paper sources | Agents went from a score of 80 to 90, but the failure modes stayed the same

Top picks

Learning Personalized Agents from Human Feedback score 10
Institution: Princeton; featured in HF Daily Papers; HF popularity: 6 upvotes (+2); keywords (4): preference, agent, agents, embodied
Learning Humanoid End-Effector Control for Open-Vocabulary Visual Loco-Manipulation score 8
Featured in HF Daily Papers; HF popularity: 26 upvotes (+4); keywords (1): robotics
Multi-agent cooperation through in-context co-player inference score 8
Featured in HF Daily Papers; HF popularity: 15 upvotes (+3); keywords (4): fast, inference, agent, agents
Towards a Science of AI Agent Reliability score 8
Featured in HF Daily Papers; HF popularity: 12 upvotes (+3); keywords (5): agent, agents, agentic, reasoning, safety
Reinforced Fast Weights with Next-Sequence Prediction score 8
Featured in HF Daily Papers; HF popularity: 12 upvotes (+3); keywords (5): fast, fine-tuning, GRPO, post-training, attention
MMA: Multimodal Memory Agent score 7
Featured in HF Daily Papers; HF popularity: 6 upvotes (+2); keywords (6): RAG, agent, agents, multimodal, benchmark
Learning Situated Awareness in the Real World score 7
Featured in HF Daily Papers; HF popularity: 5 upvotes (+2); keywords (5): agent, reasoning, multimodal, benchmark, evaluation
Discovering Multiagent Learning Algorithms with Large Language Models score 8
Featured in HF Daily Papers; HF popularity: 10 upvotes (+3); keywords (3): agent, coding, evaluation
On the Mechanism and Dynamics of Modular Addition: Fourier Features, Lottery Ticket, and Grokking score 6
Featured in HF Daily Papers; HF popularity: 6 upvotes (+2); keywords (1): alignment

Also worth noting

Collaborative Zone-Adaptive Zero-Day Intrusion Detection for IoBT score 2
Keywords (4): efficient, lightweight, deployment, latency
Axle Sensor Fusion for Online Continual Wheel Fault Detection in Wayside Railway Monitoring score 2
Keywords (4): efficient, lightweight, cost, safety
GPSBench: Do Large Language Models Understand GPS Coordinates? score 2
Keywords (4): finetuning, tool use, reasoning, robotics
Can Adversarial Code Comments Fool AI Security Reviewers -- Large-Scale Empirical Study of Comment-Based Attacks and Defenses Against LLM Code Analysis score 2
Keywords (4): production, code generation, benchmark, open-source
Federated Graph AGI for Cross-Border Insider Threat Intelligence in Government Financial Schemes score 2
Keywords (3): inference, MoE, reasoning
OmniCT: Towards a Unified Slice-Volume LVLM for Comprehensive CT Analysis score 2
Keywords (6): efficient, MoE, reasoning, vision-language, benchmark
Surrogate-Based Prevalence Measurement for Large-Scale A/B Testing score 2
Keywords (3): fast, latency, evaluation
Evolutionary Context Search for Automated Skill Acquisition score 2
Keywords (6): efficient, deployment, inference, fine-tuning, retrieval-augmented
Rethinking ANN-based Retrieval: Multifaceted Learnable Index for Large-scale Recommendation System score 2
Keywords (8): efficient, efficiency, quantization, serving, real-time
Empirical Cumulative Distribution Function Clustering for LLM-based Agent System Analysis score 2
Keywords (3): agent, agents, evaluation
DeepVision-103K: A Visually Diverse, Broad-Coverage, and Verifiable Mathematical Dataset for Multimodal Reasoning score 6
Featured in HF Daily Papers; HF popularity: 2 upvotes (+1); keywords (2): reasoning, multimodal
CHAI: CacHe Attention Inference for text2video score 2
Keywords (5): inference, latency, attention, diffusion, text-to-video
Retrieval Collapses When AI Pollutes the Web score 2
Keywords (3): retrieval-augmented, RAG, search