AI Research Brief
4M Game Frames Train Rendering, Internalized Skills Beat Retrieval
20 selected from 292 papers
Featured
The Latent Space: Foundation, Evolution, Mechanism, Ability, and Outlook
score 10
Featured in HF Daily Papers; HF popularity: 110 upvotes (+4); code available; keywords (1): reasoning
Generative World Renderer
score 10
Featured in HF Daily Papers; HF popularity: 83 upvotes (+4); code available; keywords (1): scaling
SKILL0: In-Context Agentic Reinforcement Learning for Skill Internalization
score 10
Featured in HF Daily Papers; HF popularity: 77 upvotes (+4); code available; keywords (1): agentic
Steerable Visual Representations
score 10
Featured in HF Daily Papers; HF popularity: 38 upvotes (+4); code available; keywords (2): lightweight, vision-language
LatentUM: Unleashing the Potential of Interleaved Cross-Modal Reasoning via a Latent-Space Unified Model
score 10
Featured in HF Daily Papers; HF popularity: 25 upvotes (+4); code available; keywords (1): reasoning
CORAL: Towards Autonomous Multi-Agent Evolution for Open-Ended Discovery
score 9
Featured in HF Daily Papers; HF popularity: 33 upvotes (+4); code available
NearID: Identity Representation Learning via Near-identity Distractors
score 9
Featured in HF Daily Papers; HF popularity: 24 upvotes (+4); code available
VOID: Video Object and Interaction Deletion
score 9
Featured in HF Daily Papers; HF popularity: 19 upvotes (+3); code available; keywords (2): reasoning, vision-language
UniDriveVLA: Unifying Understanding, Perception, and Action Planning for Autonomous Driving
score 9
Featured in HF Daily Papers; HF popularity: 14 upvotes (+3); code available; keywords (2): reasoning, vision-language
Tex3D: Objects as Attack Surfaces via Adversarial 3D Textures for Vision-Language-Action Models
score 8
Featured in HF Daily Papers; HF popularity: 7 upvotes (+2); code available; keywords (2): deployment, vision-language
Also Worth Noting
Omni123: Exploring 3D Native Foundation Models with Limited 3D Data by Unifying Text to 2D and 3D Generation
score 5
Featured in HF Daily Papers; HF popularity: 5 upvotes (+2)
CRIT: Graph-Based Automatic Data Synthesis to Enhance Cross-Modal Multi-Hop Reasoning
score 4
Keywords (2): reasoning, vision-language; accepted at top venue: CVPR
Ultrasound-CLIP: Semantic-Aware Contrastive Pre-training for Ultrasound Image-Text Understanding
score 4
Keywords (5): real-time, fine-tuning, pre-training, reasoning, vision-language; accepted at top venue: CVPR
Hidden Meanings in Plain Sight: RebusBench for Evaluating Cognitive Visual Reasoning
score 4
Keywords (3): scaling, reasoning, vision-language; accepted at top venue: ICLR
How and why does deep ensemble coupled with transfer learning increase performance in bipolar disorder and schizophrenia classification?
score 4
Institution: EPFL; keywords (1): pre-training
Mining Instance-Centric Vision-Language Contexts for Human-Object Interaction Detection
score 4
Keywords (2): reasoning, vision-language; accepted at top venue: CVPR
SPAR: Single-Pass Any-Resolution ViT for Open-vocabulary Segmentation
score 4
Keywords (3): pre-training, reasoning, vision-language; accepted at top venue: CVPR
MonoSAOD: Monocular 3D Object Detection with Sparsely Annotated Label
score 3
Accepted at top venue: CVPR
Bias mitigation in graph diffusion models
score 3
Accepted at top venue: ICLR
PTC-Depth: Pose-Refined Monocular Depth Estimation with Temporal Consistency
score 3
Accepted at top venue: CVPR