Sources | Drop CLIP, Gain Performance: VLMs Work Better Without It

Featured

Penguin-VL: Exploring the Efficiency Limits of VLM with LLM-based Vision Encoders score 13
Institution: Tencent; Selected for HF Daily Papers; HF popularity: 64 upvotes (+4); Code available; Keywords (6): scaling, lightweight, deployment, edge, pretraining
PixARMesh: Autoregressive Mesh-Native Single-View Scene Reconstruction score 6
Selected for HF Daily Papers; Code available; Keywords (2): lightweight, reasoning
FlashPrefill: Instantaneous Pattern Discovery and Thresholding for Ultra-Fast Long-Context Prefilling score 8
Selected for HF Daily Papers; HF popularity: 6 upvotes (+2); Code available; Keywords (1): latency
Physical Simulator In-the-Loop Video Generation score 6
Selected for HF Daily Papers; Accepted at top conference: CVPR

Also Worth Noting

TumorChain: Interleaved Multimodal Chain-of-Thought Reasoning for Traceable Clinical Tumor Analysis score 4
Keywords (2): reasoning, vision-language; Accepted at top conference: ICLR
BlackMirror: Black-Box Backdoor Detection for Text-to-Image Models via Instruction-Response Deviation score 4
Keywords (1): text-to-image; Accepted at top conference: CVPR
Learning to Generate via Understanding: Understanding-Driven Intrinsic Rewarding for Unified Multimodal Models score 4
Keywords (1): text-to-image; Accepted at top conference: CVPR
Making Training-Free Diffusion Segmentors Scale with the Generative Power score 4
Keywords (1): text-to-image; Accepted at top conference: CVPR
Cut to the Chase: Training-free Multimodal Summarization via Chain-of-Events score 4
Keywords (2): lightweight, reasoning; Accepted at top conference: CVPR
DC-Merge: Improving Model Merging with Directional Consistency score 4
Keywords (2): fine-tuning, vision-language; Accepted at top conference: CVPR
Dynamic Chunking Diffusion Transformer score 5
Selected for HF Daily Papers; HF popularity: 3 upvotes (+1); Keywords (2): compression, post-training
Pano3DComposer: Feed-Forward Compositional 3D Scene Generation from Single Panoramic Image score 3
Accepted at top conference: CVPR
Unify the Views: View-Consistent Prototype Learning for Few-Shot Segmentation score 3
Accepted at top conference: CVPR
Imagine How To Change: Explicit Procedure Modeling for Change Captioning score 3
Accepted at top conference: ICLR
Dynamic Momentum Recalibration in Online Gradient Learning score 3
Accepted at top conference: CVPR
Learning to Solve Orienteering Problem with Time Windows and Variable Profits score 3
Accepted at top conference: ICLR
SCOPE: Scene-Contextualized Incremental Few-Shot 3D Segmentation score 3
Accepted at top conference: CVPR