Paper Sources | Medical LLMs shouldn't just answer questions; they should proactively take a patient history like a doctor

Top Picks

Baichuan-M3: Modeling Clinical Inquiry for Reliable Medical Decision-Making score 8
Featured in HF Daily Papers; HF popularity: 17 upvotes (+3); keywords (2): reasoning, safety
Judging What We Cannot Solve: A Consequence-Based Approach for Oracle-Free Evaluation of Research-Level Math score 8
Featured in HF Daily Papers; HF popularity: 13 upvotes (+3); keywords (2): reasoning, evaluation
POINTS-GUI-G: GUI-Grounding Journey score 8
Featured in HF Daily Papers; HF popularity: 10 upvotes (+3); keywords (7): inference, fine-tuning, fine-tune, agents, reasoning
DreamDojo: A Generalist Robot World Model from Large-Scale Human Videos score 7
Featured in HF Daily Papers; HF popularity: 6 upvotes (+2); keywords (8): distillation, real-time, post-training, pretraining, agents
PlanViz: Evaluating Planning-Oriented Image Generation and Editing for Computer-Use Tasks score 7
Featured in HF Daily Papers; HF popularity: 5 upvotes (+2); keywords (6): efficiency, reasoning, planning, multimodal, benchmark
InftyThink+: Effective and Efficient Infinite-Horizon Reasoning via Reinforcement Learning score 7
Featured in HF Daily Papers; HF popularity: 5 upvotes (+2); keywords (7): scaling, efficient, efficiency, inference, latency
F-GRPO: Don't Let Your Policy Learn the Obvious and Forget the Rare score 6
Featured in HF Daily Papers; HF popularity: 4 upvotes (+1); keywords (4): scaling, lightweight, GRPO, cost
SEMA: Simple yet Effective Learning for Multi-Turn Jailbreak Attacks score 6
Featured in HF Daily Papers; HF popularity: 3 upvotes (+1); keywords (7): fine-tuning, DPO, alignment, preference, open-source
Revisiting the Shape Convention of Transformer Language Models score 5
Featured in HF Daily Papers; keywords (3): efficient, transformer, attention
VowelPrompt: Hearing Speech Emotions from Text via Vowel-level Prosodic Augmentation score 5
Keywords (6): fine-tuning, GRPO, reasoning, multimodal, speech; accepted at top venue: ICLR

Also Worth Noting

MPIB: A Benchmark for Medical Prompt Injection Attacks and Clinical Safety in LLMs score 2
Keywords (5): retrieval-augmented, RAG, benchmark, evaluation, safety
RoPE-LIME: RoPE-Space Locality + Sparse-K Sampling for Efficient LLM Attribution score 2
Keywords (3): efficient, reasoning, open-source
An Interpretable Vision Transformer as a Fingerprint-Based Diagnostic Aid for Kabuki and Wiedemann-Steiner Syndromes score 2
Keywords (2): transformer, attention
SOCKET: SOft Collision Kernel EsTimator for Sparse Attention score 2
Keywords (6): scaling, efficient, inference, throughput, attention
MMEarth-Bench: Global Model Adaptation via Multimodal Test-Time Training score 2
Keywords (3): pretraining, multimodal, benchmark
Do LLMs Act Like Rational Agents? Measuring Belief Coherence in Probabilistic Decision Making score 2
Keywords (2): agent, agents
Zero-shot Multi-Contrast Brain MRI Registration by Intensity Randomizing T1-weighted MRI (LUMIR25) score 2
Keywords (3): lightweight, inference, multimodal
Accelerating Vision Transformers on Brain Processing Unit score 2
Keywords (5): efficient, deployment, inference, fine-tuning, transformer
Lost in Speech: Benchmarking, Evaluation, and Parsing of Spoken Code-Switching Beyond Standard UD Assumptions score 2
Keywords (4): agentic, speech, benchmark, evaluation
The Condensate Theorem: Transformers are $O(n)$, Not $O(n^2)$ score 2
Keywords (2): inference, attention
Exposing Weaknesses of Large Reasoning Models through Graph Algorithm Problems score 2
Keywords (3): reasoning, benchmark, evaluation
Online Adaptive Reinforcement Learning with Echo State Networks for Non-Stationary Dynamics score 2
Keywords (7): efficiency, lightweight, deployment, real-time, edge
Halt the Hallucination: Decoupling Signal and Semantic OOD Detection Based on Cascaded Early Rejection score 2
Keywords (4): efficient, inference, benchmark, safety
Don't Break the Boundary: Continual Unlearning for OOD Detection Based on Free Energy Repulsion score 2
Keywords (4): efficient, fine-tuning, cost, safety
Taming SAM3 in the Wild: A Concept Bank for Open-Vocabulary Segmentation score 2
Keywords (2): efficiency, alignment