-
RAD-2: Scaling Reinforcement Learning in a Generator-Discriminator Framework
score 10
Featured in HF Daily Papers; HF popularity: 25 upvotes (+4); code available; keywords (3): scaling, deployment, throughput
-
DR$^{3}$-Eval: Towards Realistic and Reproducible Deep Research Evaluation
score 9
Featured in HF Daily Papers; HF popularity: 25 upvotes (+4); code available
-
Switch-KD: Visual-Switch Knowledge Distillation for Vision-Language Models
score 8
Featured in HF Daily Papers; HF popularity: 8 upvotes (+2); code available; keywords (3): distillation, deployment, vision-language
-
UniDoc-RL: Coarse-to-Fine Visual RAG with Hierarchical Actions and Dense Rewards
score 8
Featured in HF Daily Papers; HF popularity: 8 upvotes (+2); code available; keywords (5): GRPO, retrieval-augmented, RAG, reasoning, vision-language
-
TRACER: Trace-Based Adaptive Cost-Efficient Routing for LLM Classification
score 8
Featured in HF Daily Papers; HF popularity: 6 upvotes (+2); code available; keywords (4): lightweight, deployment, production, open-source
-
LeapAlign: Post-Training Flow Matching Models at Any Generation Step by Building Two-Step Trajectories
score 9
Featured in HF Daily Papers; HF popularity: 6 upvotes (+2); keywords (3): fine-tuning, GRPO, post-training; accepted at top conference: CVPR
-
Don't Retrieve, Navigate: Distilling Enterprise Knowledge into Navigable Agent Skills for QA and RAG
score 7
Featured in HF Daily Papers; HF popularity: 4 upvotes (+1); code available; keywords (3): retrieval-augmented, RAG, agentic
-
MM-WebAgent: A Hierarchical Multimodal Web Agent for Webpage Generation
score 8
Featured in HF Daily Papers; HF popularity: 5 upvotes (+2); code available; keywords (1): agentic
-
GlobalSplat: Efficient Feed-Forward 3D Gaussian Splatting via Global Scene Tokens
score 6
Featured in HF Daily Papers; HF popularity: 17 upvotes (+3)
-
LongAct: Harnessing Intrinsic Activation Patterns for Long-Context Reinforcement Learning
score 6
Featured in HF Daily Papers; HF popularity: 5 upvotes (+2); keywords (3): quantization, GRPO, reasoning