Paper Sources | Vision Models Are Starting to Redesign How They Produce Output

Highlights

SpatialBench: Is Your Spatial Foundation Model an All-Round Player? score 10
Featured in HF Daily Papers; HF popularity: 63 upvotes (+4); code available; keywords (2): scaling, embodied
Gemini Embedding 2: A Native Multimodal Embedding Model from Gemini score 7
Featured in HF Daily Papers; HF popularity: 15 upvotes (+3); keywords (1): RAG
LocateAnything: Fast and High-Quality Vision-Language Grounding with Parallel Box Decoding score 8
Featured in HF Daily Papers; HF popularity: 114 upvotes (+4); keywords (2): throughput, vision-language
JLT: Clean-Latent Prediction in Latent Diffusion Transformers score 10
Featured in HF Daily Papers; HF popularity: 26 upvotes (+4); code available; keywords (1): compression
MRT: Masked Region Transformer for Layered Image Generation and Editing at Scale score 6
Featured in HF Daily Papers; HF popularity: 5 upvotes (+2); keywords (2): distillation, real-time

Verus-SpecGym: An Agentic Environment for Evaluating Specification Autoformalization score 7
Featured in HF Daily Papers; HF popularity: 3 upvotes (+1); code available; keywords (3): edge, agentic, coding
PlayClass: Automated Play Behaviour Classification in Poultry score 3
Institution: Imperial College