AI Research Brief
Lorem Ipsum Rescues GRPO's Wasted Hard Samples
12 selected from 578 papers
Featured
Beyond Semantic Similarity: Rethinking Retrieval for Agentic Search via Direct Corpus Interaction
score 10
Selected for HF Daily Papers; HF popularity: 74 upvotes (+4); code available; keywords (3): lightweight, agentic, reasoning
MARBLE: Multi-Aspect Reward Balance for Diffusion RL
score 10
Selected for HF Daily Papers; HF popularity: 34 upvotes (+4); code available; keywords (1): fine-tuning
Continuous-Time Distribution Matching for Few-Step Diffusion Distillation
score 10
Selected for HF Daily Papers; HF popularity: 24 upvotes (+4); code available; keywords (1): distillation
StraTA: Incentivizing Agentic Reinforcement Learning with Strategic Trajectory Abstraction
score 9
Selected for HF Daily Papers; HF popularity: 17 upvotes (+3); code available; keywords (2): GRPO, agentic
Skill1: Unified Evolution of Skill-Augmented Agents via Reinforcement Learning
score 8
Selected for HF Daily Papers; HF popularity: 63 upvotes (+4); keywords (1): distillation
Continuous Latent Diffusion Language Model
score 8
Selected for HF Daily Papers; HF popularity: 59 upvotes (+4); keywords (2): scaling, compression
MiA-Signature: Approximating Global Activation for Long-Context Understanding
score 8
Selected for HF Daily Papers; HF popularity: 49 upvotes (+4); keywords (3): lightweight, RAG, agentic
SkillOS: Learning Skill Curation for Self-Evolving Agents
score 8
Selected for HF Daily Papers; HF popularity: 32 upvotes (+4); keywords (2): agentic, reasoning
Nonsense Helps: Prompt Space Perturbation Broadens Reasoning Exploration
score 8
Selected for HF Daily Papers; HF popularity: 31 upvotes (+4); keywords (2): GRPO, reasoning
Auto Research with Specialist Agents Develops Effective and Non-Trivial Training Recipes
score 8
Selected for HF Daily Papers; HF popularity: 12 upvotes (+3); code available
Also Worth Noting
Think, then Score: Decoupled Reasoning and Scoring for Video Reward Modeling
score 5
Selected for HF Daily Papers; HF popularity: 2 upvotes (+1); keywords (3): scaling, post-training, reasoning
TIDE: Every Layer Knows the Token Beneath the Context
score 4
Selected for HF Daily Papers; HF popularity: 4 upvotes (+1)