Sources | Late Layers Quietly Rewrite Correct Answers for Alignment

Featured

DataClaw0: Agentic Tailoring Multimodal Data from Raw Streams score 10
Selected for HF Daily Papers; HF popularity: 70 upvotes (+4); code available; keywords (4): fine-tuning, GRPO, post-training, agentic
EnterpriseClawBench: Benchmarking Agents from Real Workplace Sessions score 9
Selected for HF Daily Papers; HF popularity: 76 upvotes (+4); code available
Deeper is Not Always Better: Mitigating the Alignment Tax via Confident Layer Decoding score 10
Selected for HF Daily Papers; HF popularity: 20 upvotes (+4); code available; keywords (2): latency, reasoning
KaLM-Reranker-V1: Fast but Not Late Interaction for Compressed Document Reranking score 8
Selected for HF Daily Papers; HF popularity: 44 upvotes (+4); keywords (1): deployment
CLI-Universe: Towards Verifiable Task Synthesis Engine for Terminal Agents score 8
Selected for HF Daily Papers; HF popularity: 31 upvotes (+4); keywords (2): fine-tuning, open-source
BioMatrix: Towards a Comprehensive Biological Foundation Model Spanning the Modality Matrix of Sequences, Structures, and Language score 9
Selected for HF Daily Papers; HF popularity: 21 upvotes (+4); code available
SkillHarness: Harnessing Safe Skills for Computer-Use Agents score 8
Selected for HF Daily Papers; HF popularity: 19 upvotes (+3); code available
Training Open Models for Agentic Phone Use score 9
Selected for HF Daily Papers; HF popularity: 14 upvotes (+3); code available; keywords (3): deployment, fine-tuning, agentic
Tmax: A simple recipe for terminal agents score 9
Selected for HF Daily Papers; HF popularity: 11 upvotes (+3); code available; keywords (1): open-source
Grouped Query Experts: Mixture-of-Experts on GQA Self-Attention score 7
Selected for HF Daily Papers; HF popularity: 63 upvotes (+4)

Also Worth Noting

Robusto-2: Benchmarking Humans & VLMs for Autonomous Driving in Lima & New York City score 5
Selected for HF Daily Papers; HF popularity: 3 upvotes (+1); keywords (2): edge, reasoning
When Agents Commit Too Soon: Diagnosing Premature Commitment in LLM Agents score 4
Selected for HF Daily Papers; keywords (1): reasoning
Have You Ever Seen Them? Entity-level Membership Inference through Interrogating Large Language Models score 3
Institution: Zhejiang University