Sources | Vision Models Start Redesigning How They Output

Featured

SpatialBench: Is Your Spatial Foundation Model an All-Round Player? score 10
Selected for HF Daily Papers; HF popularity: 63 upvotes (+4); code available; keywords (2): scaling, embodied
Gemini Embedding 2: A Native Multimodal Embedding Model from Gemini score 7
Selected for HF Daily Papers; HF popularity: 15 upvotes (+3); keywords (1): RAG
LocateAnything: Fast and High-Quality Vision-Language Grounding with Parallel Box Decoding score 8
Selected for HF Daily Papers; HF popularity: 114 upvotes (+4); keywords (2): throughput, vision-language
JLT: Clean-Latent Prediction in Latent Diffusion Transformers score 10
Selected for HF Daily Papers; HF popularity: 26 upvotes (+4); code available; keywords (1): compression
MRT: Masked Region Transformer for Layered Image Generation and Editing at Scale score 6
Selected for HF Daily Papers; HF popularity: 5 upvotes (+2); keywords (2): distillation, real-time

Verus-SpecGym: An Agentic Environment for Evaluating Specification Autoformalization score 7
Selected for HF Daily Papers; HF popularity: 3 upvotes (+1); code available; keywords (3): edge, agentic, coding
PlayClass: Automated Play Behaviour Classification in Poultry score 3
Institution: Imperial College