Paper Source | Not Enough RL Training Data? Just Compose Easy Problems into Harder Ones

Top Picks

MiniCPM-SALA: Hybridizing Sparse and Linear Attention for Efficient Long-Context Modeling score 7
Selected for HF Daily Papers; HF popularity: 5 upvotes (+2); Keywords (6): efficient, efficiency, inference, transformer, attention
DeepGen 1.0: A Lightweight Unified Multimodal Model for Advancing Image Generation and Editing score 9
Selected for HF Daily Papers; HF popularity: 72 upvotes (+4); Keywords (9): efficient, lightweight, deployment, fine-tuning, GRPO
Learning beyond Teacher: Generalized On-Policy Distillation with Reward Extrapolation score 9
Selected for HF Daily Papers; HF popularity: 55 upvotes (+4); Keywords (4): scaling, distillation, code generation, reasoning
GigaBrain-0.5M*: a VLA That Learns From World Model-Based Reinforcement Learning score 9
Selected for HF Daily Papers; HF popularity: 45 upvotes (+4); Keywords (4): deployment, reasoning, vision-language, benchmark
Thinking with Drafting: Optical Decompression via Logical Reconstruction score 9
Selected for HF Daily Papers; HF popularity: 31 upvotes (+4); Keywords (3): reasoning, multimodal, benchmark
LawThinker: A Deep Research Legal Agent in Dynamic Environments score 9
Selected for HF Daily Papers; HF popularity: 31 upvotes (+4); Keywords (3): agent, reasoning, benchmark
Think Longer to Explore Deeper: Learn to Explore In-Context via Length-Incentivized Reinforcement Learning score 9
Selected for HF Daily Papers; HF popularity: 26 upvotes (+4); Keywords (2): scaling, reasoning
Stroke of Surprise: Progressive Semantic Illusions in Vector Sketching score 9
Selected for HF Daily Papers; HF popularity: 26 upvotes (+4); Keywords (2): distillation, serving
Composition-RL: Compose Your Verifiable Prompts for Reinforcement Learning of Large Language Models score 8
Selected for HF Daily Papers; HF popularity: 86 upvotes (+4); Keywords (1): reasoning
dVoting: Fast Voting for dLLMs score 8
Selected for HF Daily Papers; HF popularity: 19 upvotes (+3); Keywords (4): scaling, fast, diffusion, reasoning

Also Worth Noting

scPilot: Large Language Model Reasoning Toward Automated Single-Cell Analysis and Discovery score 4
Keywords (1): reasoning; Accepted at top-tier venue: NeurIPS
PLESS: Pseudo-Label Enhancement with Spreading Scribbles for Weakly Supervised Segmentation score 4
Institution: Oxford; Keywords (1): cost
AssetFormer: Modular 3D Assets Generation with Autoregressive Transformer score 4
Keywords (1): transformer; Accepted at top-tier venue: ICLR
Credit Where It is Due: Cross-Modality Connectivity Drives Precise Reinforcement Learning for MLLM Reasoning score 2
Keywords (4): lightweight, attention, reasoning, multimodal
ADRD-Bench: A Preliminary LLM Benchmark for Alzheimer's Disease and Related Dementias score 2
Keywords (3): reasoning, benchmark, evaluation
EM-Aware Physical Synthesis: Neural Inductor Modeling and Intelligent Placement & Routing for RF Circuits score 2
Keywords (3): fast, inference, real-time
Assessing Low Back Movement with Motion Tape Sensor Data Through Deep Learning score 2
Keywords (3): inference, cost, synthetic data
A Dual-Branch Framework for Semantic Change Detection with Boundary and Temporal Awareness score 2
Keywords (2): edge, reasoning
When Audio-LLMs Don't Listen: A Cross-Linguistic Study of Modality Arbitration score 2
Keywords (5): fine-tuning, reasoning, speech, audio, benchmark
Arbitrary Ratio Feature Compression via Next Token Prediction score 2
Keywords (3): efficiency, compression, inference
Jailbreaking Leaves a Trace: Understanding and Detecting Jailbreak Attacks from Internal Representations of Large Language Models score 2
Keywords (7): lightweight, deployment, inference, fine-tuning, mamba