Sources | Lorem Ipsum Rescues GRPO's Wasted Hard Samples

Featured

Beyond Semantic Similarity: Rethinking Retrieval for Agentic Search via Direct Corpus Interaction score 10
Selected for HF Daily Papers; HF popularity: 74 upvotes (+4); code available; keywords (3): lightweight, agentic, reasoning
MARBLE: Multi-Aspect Reward Balance for Diffusion RL score 10
Selected for HF Daily Papers; HF popularity: 34 upvotes (+4); code available; keywords (1): fine-tuning
Continuous-Time Distribution Matching for Few-Step Diffusion Distillation score 10
Selected for HF Daily Papers; HF popularity: 24 upvotes (+4); code available; keywords (1): distillation
StraTA: Incentivizing Agentic Reinforcement Learning with Strategic Trajectory Abstraction score 9
Selected for HF Daily Papers; HF popularity: 17 upvotes (+3); code available; keywords (2): GRPO, agentic
Skill1: Unified Evolution of Skill-Augmented Agents via Reinforcement Learning score 8
Selected for HF Daily Papers; HF popularity: 63 upvotes (+4); keywords (1): distillation
Continuous Latent Diffusion Language Model score 8
Selected for HF Daily Papers; HF popularity: 59 upvotes (+4); keywords (2): scaling, compression
MiA-Signature: Approximating Global Activation for Long-Context Understanding score 8
Selected for HF Daily Papers; HF popularity: 49 upvotes (+4); keywords (3): lightweight, RAG, agentic
SkillOS: Learning Skill Curation for Self-Evolving Agents score 8
Selected for HF Daily Papers; HF popularity: 32 upvotes (+4); keywords (2): agentic, reasoning
Nonsense Helps: Prompt Space Perturbation Broadens Reasoning Exploration score 8
Selected for HF Daily Papers; HF popularity: 31 upvotes (+4); keywords (2): GRPO, reasoning
Auto Research with Specialist Agents Develops Effective and Non-Trivial Training Recipes score 8
Selected for HF Daily Papers; HF popularity: 12 upvotes (+3); code available

Also Worth Noting

Think, then Score: Decoupled Reasoning and Scoring for Video Reward Modeling score 5
Selected for HF Daily Papers; HF popularity: 2 upvotes (+1); keywords (3): scaling, post-training, reasoning
TIDE: Every Layer Knows the Token Beneath the Context score 4
Selected for HF Daily Papers; HF popularity: 4 upvotes (+1)