Sources | 4M Game Frames Train Rendering, Internalized Skills Beat Retrieval

Featured

The Latent Space: Foundation, Evolution, Mechanism, Ability, and Outlook score 10
Selected for HF Daily Papers; HF popularity: 110 upvotes (+4); code available; keywords (1): reasoning
Generative World Renderer score 10
Selected for HF Daily Papers; HF popularity: 83 upvotes (+4); code available; keywords (1): scaling
SKILL0: In-Context Agentic Reinforcement Learning for Skill Internalization score 10
Selected for HF Daily Papers; HF popularity: 77 upvotes (+4); code available; keywords (1): agentic
Steerable Visual Representations score 10
Selected for HF Daily Papers; HF popularity: 38 upvotes (+4); code available; keywords (2): lightweight, vision-language
LatentUM: Unleashing the Potential of Interleaved Cross-Modal Reasoning via a Latent-Space Unified Model score 10
Selected for HF Daily Papers; HF popularity: 25 upvotes (+4); code available; keywords (1): reasoning
CORAL: Towards Autonomous Multi-Agent Evolution for Open-Ended Discovery score 9
Selected for HF Daily Papers; HF popularity: 33 upvotes (+4); code available
NearID: Identity Representation Learning via Near-identity Distractors score 9
Selected for HF Daily Papers; HF popularity: 24 upvotes (+4); code available
VOID: Video Object and Interaction Deletion score 9
Selected for HF Daily Papers; HF popularity: 19 upvotes (+3); code available; keywords (2): reasoning, vision-language
UniDriveVLA: Unifying Understanding, Perception, and Action Planning for Autonomous Driving score 9
Selected for HF Daily Papers; HF popularity: 14 upvotes (+3); code available; keywords (2): reasoning, vision-language
Tex3D: Objects as Attack Surfaces via Adversarial 3D Textures for Vision-Language-Action Models score 8
Selected for HF Daily Papers; HF popularity: 7 upvotes (+2); code available; keywords (2): deployment, vision-language

Also Worth Noting

Omni123: Exploring 3D Native Foundation Models with Limited 3D Data by Unifying Text to 2D and 3D Generation score 5
Selected for HF Daily Papers; HF popularity: 5 upvotes (+2)
CRIT: Graph-Based Automatic Data Synthesis to Enhance Cross-Modal Multi-Hop Reasoning score 4
Keywords (2): reasoning, vision-language; accepted at top venue: CVPR
Ultrasound-CLIP: Semantic-Aware Contrastive Pre-training for Ultrasound Image-Text Understanding score 4
Keywords (5): real-time, fine-tuning, pre-training, reasoning, vision-language; accepted at top venue: CVPR
Hidden Meanings in Plain Sight: RebusBench for Evaluating Cognitive Visual Reasoning score 4
Keywords (3): scaling, reasoning, vision-language; accepted at top venue: ICLR
How and why does deep ensemble coupled with transfer learning increase performance in bipolar disorder and schizophrenia classification? score 4
Institution: EPFL; keywords (1): pre-training
Mining Instance-Centric Vision-Language Contexts for Human-Object Interaction Detection score 4
Keywords (2): reasoning, vision-language; accepted at top venue: CVPR
SPAR: Single-Pass Any-Resolution ViT for Open-vocabulary Segmentation score 4
Keywords (3): pre-training, reasoning, vision-language; accepted at top venue: CVPR
MonoSAOD: Monocular 3D Object Detection with Sparsely Annotated Label score 3
Accepted at top venue: CVPR
Bias mitigation in graph diffusion models score 3
Accepted at top venue: ICLR
PTC-Depth: Pose-Refined Monocular Depth Estimation with Temporal Consistency score 3
Accepted at top venue: CVPR