Sources | The Rulers We Use to Measure What Models Really Think Are Broken

Featured

Faithfulness Metrics Don't Measure Faithfulness: A Meta-Evaluation with Ground Truth score 11
Affiliation: Google; selected for HF Daily Papers; HF popularity: 12 upvotes (+3); code available
Your Embedding Model is SMARTer Than You Think score 10
Selected for HF Daily Papers; HF popularity: 23 upvotes (+4); code available; Keywords (4): lightweight, finetuning, post-training, open source
DarkForest: Less Talk, Higher Accuracy for Multi-Agent LLMs score 8
Selected for HF Daily Papers; HF popularity: 7 upvotes (+2); code available; Keywords (2): latency, reasoning
Injecting Image Guidance into Text-Conditioned Diffusion Models at Inference score 7
Selected for HF Daily Papers; HF popularity: 4 upvotes (+1); code available; Keywords (3): lightweight, fine-tuning, text-to-image
STREAM: A Data-Centric Framework for Mining High-Value Task-Oriented Dialogues from Streaming Media score 6
Selected for HF Daily Papers; code available; Keywords (2): retrieval-augmented, RAG
Geometry-Aware Image Flow Matching score 5
Selected for HF Daily Papers; HF popularity: 9 upvotes (+2)
SimuWoB: Simulating Real-World Mobile Apps for Fast and Faithful GUI Agent Benchmarking score 5
Selected for HF Daily Papers; HF popularity: 2 upvotes (+1); Keywords (1): open-source
Directional Alignment Mitigates Reward Hacking in Reinforcement Learning for Language Models score 5
Selected for HF Daily Papers; HF popularity: 2 upvotes (+1); Keywords (1): reasoning

Also Worth Noting

AOEPT: Breaking the Implicit Modality-Reduction Bottleneck in Modality-Missing Prompt Tuning score 4
Keywords (3): lightweight, serving, reasoning; accepted at ICML
Geo-Expert: Towards Expert-Level Geological Reasoning via Parameter-Efficient Fine-Tuning score 4
Keywords (4): scaling, deployment, fine-tuning, reasoning; accepted at ICML
When Does Adaptive Guidance Help? Belief-Aware Privileged Distillation for Autonomous Driving Under Partial Observability score 4
Keywords (1): distillation; accepted at CVPR
Clustering as Reasoning: A $k$-Means Interpretation of Chain-of-Thought Graph Learning score 4
Keywords (1): reasoning; accepted at ICML
Efficient DP-SGD for LLMs with Randomized Clipping score 4
Keywords (1): fine-tuning; accepted at ICML
ProActor: Timing-Aware Reinforcement Learning for Proactive Task Scheduling Agents score 4
Keywords (1): GRPO; accepted at ACL
NITP: Next Implicit Token Prediction for LLM Pre-training score 4
Keywords (2): pre-training, MoE; accepted at ICML
Three-Step Conditional Diffusion 3D Reconstruction for Light-Field Microscopy score 4
Keywords (2): lightweight, real-time; accepted at CVPR
Furina: Fragmented Uncertainty-Driven Refusal Instability Attack score 4
Affiliation: Princeton; Keywords (1): jailbreak
Language Bias in LVLMs: From In-Depth Analysis to Simple and Effective Mitigation score 4
Keywords (3): DPO, instruction tuning, vision-language; accepted at ICML
Scale When Needed: Adaptive Neuron-level Mixed Precision Quantization Aware Training score 4
Keywords (3): compression, quantization, edge; accepted at ICML
Blocked Gibbs meets Diffusion Transformers: Unsupervised Learning for Constraint Optimization score 4
Affiliation: University of Toronto; Keywords (1): reasoning
HCL-FF: Hierarchical and Contrastive Learning for Forward-Forward Algorithm score 3
Accepted at CVPR
Unifying Value Alignment and Assignment in Cross-Domain Offline Reinforcement Learning with Heterogeneous Datasets score 3
Accepted at ICML
Large Language Model Selection with Limited Annotations score 3
Affiliation: Oxford