-
AlphaGRPO: Unlocking Self-Reflective Multimodal Generation in UMMs via Decompositional Verifiable Reward
score 10
Featured in HF Daily Papers; HF popularity: 31 upvotes (+4); code available; keywords (3): GRPO, reasoning, text-to-image
-
ToolCUA: Towards Optimal GUI-Tool Path Orchestration for Computer Use Agents
score 10
Featured in HF Daily Papers; HF popularity: 24 upvotes (+4); code available; keywords (3): scaling, agentic, tool use
-
Edit-Compass & EditReward-Compass: A Unified Benchmark for Image Editing and Reward Modeling
score 10
Featured in HF Daily Papers; HF popularity: 30 upvotes (+4); code available; keywords (1): reasoning
-
L2P: Unlocking Latent Potential for Pixel Generation
score 9
Featured in HF Daily Papers; HF popularity: 26 upvotes (+4); code available
-
On-Policy Self-Evolution via Failure Trajectories for Agentic Safety Alignment
score 9
Featured in HF Daily Papers; HF popularity: 15 upvotes (+3); code available; keywords (1): agentic
-
Multi-Stream LLMs: Unblocking Language Models with Parallel Streams of Thoughts, Inputs and Outputs
score 9
Featured in HF Daily Papers; HF popularity: 16 upvotes (+3); code available; keywords (1): coding
-
Missing Old Logits in Asynchronous Agentic RL: Semantic Mismatch and Repair Methods for Off-Policy Correction
score 9
Featured in HF Daily Papers; HF popularity: 15 upvotes (+3); code available; keywords (3): throughput, PPO, agentic
-
Covering Human Action Space for Computer Use: Data Synthesis and Benchmark
score 9
Featured in HF Daily Papers; HF popularity: 13 upvotes (+3); code available; keywords (1): open-source
-
Do Enterprise Systems Need Learned World Models? The Importance of Context to Infer Dynamics
score 8
Featured in HF Daily Papers; HF popularity: 58 upvotes (+4); keywords (2): deployment, reasoning
-
LoopUS: Recasting Pretrained LLMs into Looped Latent Refinement Models
score 8
Featured in HF Daily Papers; HF popularity: 9 upvotes (+2); code available; keywords (3): scaling, post-training, reasoning