Paper Sources | The Stronger the SFT, the Weaker the RL?

Top Picks

Rethinking Selective Knowledge Distillation score 9
Featured in HF Daily Papers; HF popularity: 22 upvotes (+4); keywords (2): efficiency, distillation
Good SFT Optimizes for SFT, Better SFT Prepares for Reinforcement Learning score 9
Featured in HF Daily Papers; HF popularity: 39 upvotes (+4); keywords (3): post-training, reasoning, evaluation
Sparse Reward Subsystem in Large Language Models score 6
Featured in HF Daily Papers; HF popularity: 8 upvotes (+2); keywords (1): reasoning
Ebisu: Benchmarking Large Language Models in Japanese Finance score 8
Featured in HF Daily Papers; HF popularity: 17 upvotes (+3); keywords (3): benchmark, evaluation, open-source
Beyond Pixels: Visual Metaphor Transfer via Schema-Driven Agentic Reasoning score 8
Featured in HF Daily Papers; HF popularity: 15 upvotes (+3); keywords (5): alignment, agent, agents, agentic, reasoning
Balancing Understanding and Generation in Discrete Diffusion Models score 8
Featured in HF Daily Papers; HF popularity: 13 upvotes (+3); keywords (2): scaling, diffusion
LRAgent: Efficient KV Cache Sharing for Multi-LoRA LLM Agents score 7
Featured in HF Daily Papers; HF popularity: 8 upvotes (+2); keywords (8): efficient, lightweight, latency, throughput, attention
PromptRL: Prompt Matters in RL for Flow-Based Image Generation score 7
Featured in HF Daily Papers; HF popularity: 8 upvotes (+2); keywords (5): serving, alignment, post-training, agents, text-to-image
PolySAE: Modeling Feature Interactions in Sparse Autoencoders via Polynomial Decoding score 5
Featured in HF Daily Papers; HF popularity: 8 upvotes (+2)
SimpleGPT: Improving GPT via A Simple Normalization Strategy score 5
Featured in HF Daily Papers; HF popularity: 3 upvotes (+1); keywords (1): transformer

Also Worth Noting

PandaPose: 3D Human Pose Lifting from a Single Image via Propagating 2D Pose Prior to 3D Anchor Space score 3
Accepted at a top conference: NeurIPS
Dynamic Prior Thompson Sampling for Cold-Start Exploration in Recommender Systems score 2
Keywords (3): efficiency, serving, latency
ConsensusDrop: Fusing Visual and Cross-Modal Saliency for Efficient Vision Language Models score 2
Keywords (6): efficient, efficiency, pruning, attention, vision-language
FinEvo: From Isolated Backtests to Ecological Market Games for Multi-Agent Financial Strategy Evolution score 2
Keywords (3): agent, agents, evaluation
MindGuard: Guardrail Classifiers for Multi-Turn Mental Health Support score 2
Keywords (4): lightweight, agent, evaluation, safety
R-HTN: Rebellious Online HTN Planning for Safety and Game AI score 2
Keywords (5): agent, agents, planning, search, safety
Optimal Budgeted Adaptation of Large Language Models score 2
Keywords (3): efficiency, latency, fine-tuning
SAGE: Agentic Framework for Interpretable and Clinically Translatable Computational Pathology Biomarker Discovery score 2
Keywords (5): agents, agentic, reasoning, multimodal, evaluation
Small-Margin Preferences Still Matter - If You Train Them Right score 2
Keywords (4): fine-tuning, DPO, alignment, preference
From drift to adaptation to the failed ml model: Transfer Learning in Industrial MLOps score 2
Keywords (2): production, attention
Probing the Knowledge Boundary: An Interactive Agentic Framework for Deep Knowledge Extraction score 2
Keywords (2): scaling, agentic
Multimodal Scientific Learning Beyond Diffusions and Flows score 2
Keywords (3): efficiency, diffusion, multimodal
Symphony-Coord: Emergent Coordination in Decentralized Agent Systems score 2
Keywords (3): efficiency, lightweight, agent
Verification Required: The Impact of Information Credibility on AI Persuasion score 2
Keywords (6): distillation, deployment, inference, agent, agents