Paper Digest | Online RL Pushes Web Agents Past 70%, and Reward Models Just Need to Think in the Other Direction

Highlights

Embed-RL: Reinforcement Learning for Reasoning-Driven Multimodal Embeddings score 7
Featured in HF Daily Papers; HF popularity: 5 upvotes (+2); Keywords (4): efficient, alignment, reasoning, multimodal

Also Worth Noting

Differentiable Rule Induction from Raw Sequence Inputs score 3
Accepted at a top conference: ICLR
QuaRK: A Quantum Reservoir Kernel for Time Series Learning score 2
Keywords (3): efficient, fast, lightweight
Fast Swap-Based Element Selection for Multiplication-Free Dimension Reduction score 2
Keywords (5): efficient, fast, inference, search, cost
On Calibration of Large Language Models: From Response To Capability score 2
Keywords (2): inference, evaluation
LiveNewsBench: Evaluating LLM Web Search Capabilities with Freshly Curated News score 2
Keywords (8): real-time, agentic, reasoning, search, benchmark
AISA: Awakening Intrinsic Safety Awareness in Large Language Models against Jailbreak Attacks score 2
Keywords (9): lightweight, deployment, inference, latency, fine-tuning
Out-of-Support Generalisation via Weight Space Sequence Modelling score 2
Keywords (2): efficiency, safety
Small Reward Models via Backward Inference score 2
Keywords (4): scaling, inference, GRPO, reasoning
Discrete-Space Generative AI Pipeline for Semantic Transmission of Signals score 2
Keywords (3): efficiency, diffusion, audio
Scenario-Adaptive MU-MIMO OFDM Semantic Communication With Asymmetric Neural Network score 2
Keywords (6): lightweight, latency, edge, attention, coding
OpAgent: Operator Agent for Web Navigation score 2
Keywords (8): real-time, fine-tuning, fine-tune, agent, agents
Mitigating the Safety-utility Trade-off in LLM Alignment via Adaptive Safe Context Learning score 2
Keywords (5): distillation, alignment, preference, reasoning, safety
DistillLens: Symmetric Knowledge Distillation Through Logit Lens score 2
Keywords (2): distillation, alignment
Who Do LLMs Trust? Human Experts Matter More Than Other LLMs score 2
Keywords (2): agents, reasoning
LLM-Confidence Reranker: A Training-Free Approach for Enhancing Retrieval-Augmented Generation Systems score 2
Keywords (4): efficiency, transformer, retrieval-augmented, RAG