-
FeatureBench: Benchmarking Agentic Coding for Complex Feature Development
score 11
Featured in HF Daily Papers; HF popularity: 18 upvotes (+3); keywords (7): agent, agents, agentic, coding, benchmark; top-conference acceptance: ICLR
-
Voxtral Realtime
score 7
Featured in HF Daily Papers; HF popularity: 6 upvotes (+2); keywords (5): latency, alignment, pretraining, speech, audio
-
Step 3.5 Flash: Open Frontier-Level Intelligence with 11B Active Parameters
score 9
Featured in HF Daily Papers; HF popularity: 172 upvotes (+4); keywords (15): efficient, efficiency, fast, inference, latency
-
GENIUS: Generative Fluid Intelligence Evaluation Suite
score 9
Featured in HF Daily Papers; HF popularity: 52 upvotes (+4); keywords (4): attention, reasoning, multimodal, evaluation
-
PhyCritic: Multimodal Critic Models for Physical AI
score 9
Featured in HF Daily Papers; HF popularity: 49 upvotes (+4); keywords (8): finetuning, alignment, preference, reasoning, planning
-
When to Memorize and When to Stop: Gated Recurrent Memory for Long-Context Reasoning
score 9
Featured in HF Daily Papers; HF popularity: 27 upvotes (+4); keywords (4): efficient, efficiency, inference, reasoning
-
How Do Decoder-Only LLMs Perceive Users? Rethinking Attention Masking for User Representation Learning
score 9
Featured in HF Daily Papers; HF popularity: 26 upvotes (+4); keywords (3): preference, pretraining, attention
-
MolmoSpaces: A Large-Scale Open Ecosystem for Robot Navigation and Manipulation
score 6
Featured in HF Daily Papers; HF popularity: 4 upvotes (+1); keywords (5): planning, embodied, benchmark, evaluation, open-source
-
ROCKET: Rapid Optimization via Calibration-guided Knapsack Enhanced Truncation for Efficient Model Compression
score 8
Featured in HF Daily Papers; HF popularity: 15 upvotes (+3); keywords (4): efficient, compression, fine-tuning, coding
-
GameDevBench: Evaluating Agentic Capabilities Through Game Development
score 8
Featured in HF Daily Papers; HF popularity: 14 upvotes (+3); keywords (7): agent, agents, agentic, coding, multimodal