Sources | 11B Active Parameters Hit Frontier-Level Agent Intelligence

Featured

FeatureBench: Benchmarking Agentic Coding for Complex Feature Development score 11
Selected for HF Daily Papers; HF popularity: 18 upvotes (+3); Keywords (7): agent, agents, agentic, coding, benchmark; Accepted at top conference: ICLR
Voxtral Realtime score 7
Selected for HF Daily Papers; HF popularity: 6 upvotes (+2); Keywords (5): latency, alignment, pretraining, speech, audio
Step 3.5 Flash: Open Frontier-Level Intelligence with 11B Active Parameters score 9
Selected for HF Daily Papers; HF popularity: 172 upvotes (+4); Keywords (15): efficient, efficiency, fast, inference, latency
GENIUS: Generative Fluid Intelligence Evaluation Suite score 9
Selected for HF Daily Papers; HF popularity: 52 upvotes (+4); Keywords (4): attention, reasoning, multimodal, evaluation
PhyCritic: Multimodal Critic Models for Physical AI score 9
Selected for HF Daily Papers; HF popularity: 49 upvotes (+4); Keywords (8): finetuning, alignment, preference, reasoning, planning
When to Memorize and When to Stop: Gated Recurrent Memory for Long-Context Reasoning score 9
Selected for HF Daily Papers; HF popularity: 27 upvotes (+4); Keywords (4): efficient, efficiency, inference, reasoning
How Do Decoder-Only LLMs Perceive Users? Rethinking Attention Masking for User Representation Learning score 9
Selected for HF Daily Papers; HF popularity: 26 upvotes (+4); Keywords (3): preference, pretraining, attention
MolmoSpaces: A Large-Scale Open Ecosystem for Robot Navigation and Manipulation score 6
Selected for HF Daily Papers; HF popularity: 4 upvotes (+1); Keywords (5): planning, embodied, benchmark, evaluation, open-source
ROCKET: Rapid Optimization via Calibration-guided Knapsack Enhanced Truncation for Efficient Model Compression score 8
Selected for HF Daily Papers; HF popularity: 15 upvotes (+3); Keywords (4): efficient, compression, fine-tuning, coding
GameDevBench: Evaluating Agentic Capabilities Through Game Development score 8
Selected for HF Daily Papers; HF popularity: 14 upvotes (+3); Keywords (7): agent, agents, agentic, coding, multimodal

Also Worth Noting

C-MOP: Integrating Momentum and Boundary-Aware Clustering for Enhanced Prompt Evolution score 3
Institution: Huawei
Deep learning outperforms traditional machine learning methods in predicting childhood malnutrition: evidence from survey data score 2
Keywords (2): attention, cost
Triggers Hijack Language Circuits: A Mechanistic Analysis of Backdoor Behaviors in Large Language Models score 2
Keywords (2): pretraining, attention
When Tables Go Crazy: Evaluating Multimodal Models on French Financial Documents score 2
Keywords (4): reasoning, multimodal, vision-language, benchmark
Time-to-Event Transformer to Capture Timing Attention of Events in EHR Time Series score 2
Keywords (5): alignment, transformer, attention, reasoning, benchmark
Making Databases Faster with LLM Evolutionary Sampling score 2
Keywords (2): search, cost
Affordances Enable Partial World Modeling with LLMs score 2
Keywords (5): efficiency, pre-training, agents, robotics, search
Tensor Methods: A Unified and Interpretable Approach for Material Design score 2
Keywords (2): search, evaluation
Modular Multi-Task Learning for Chemical Reaction Prediction score 2
Keywords (4): efficient, deployment, fine-tuning, benchmark
Towards Affordable, Non-Invasive Real-Time Hypoglycemia Detection Using Wearable Sensor Signals score 2
Keywords (2): real-time, multimodal
Gated Removal of Normalization in Transformers Enables Stable Training and Efficient Inference score 2
Keywords (6): scaling, efficient, efficiency, inference, throughput
LUCID: Attention with Preconditioned Representations score 2
Keywords (2): transformer, attention