Archive | AI Research Brief

2026-07

Jul 14, 2026 Multi-Turn Pressure Pushes CLI Agents to 100% Compliance Daily
Jul 13, 2026 The Best Model Gets Idea Lineage Right Only 27% of the Time Daily
Jul 12, 2026 Structural Reasoning Takes 67 SOTAs, Async RL Ships in GLM-5.2 Daily
Jul 11, 2026 VLA Memory Goes Native, a World Model Runs 720p at 60fps Daily
Jul 10, 2026 RL Reaches Image Generation, Gemma 4 Goes Open Daily
Jul 9, 2026 Verification as the Fourth Scaling Axis Daily
Jul 8, 2026 Avatar Resolution Doubles, Latency Holds at 200ms Daily
Jul 7, 2026 Latent-Space Scoring Cuts Video Generation to 1-4 Steps Daily
Jul 6, 2026 0.6B Matches 32B, 50x Less VRAM Daily
Jul 5, 2026 Memory Makes Agents Sycophantic; Visual Reasoning Hits 93.2% Daily
Jul 4, 2026 Adaptive Decoding Hits 4.2x, Joint Training 10x Faster Daily
Jul 3, 2026 A 35B Agent Reaches for Trillion-Scale, and Async Lag Is Overrated Daily
Jul 2, 2026 A 135M Pixel Model Beats Billion-Parameter Baselines Daily
Jul 1, 2026 Frontier Agents Finish One Task in Five at 1.6-Hour Length Daily

2026-06

Jun 30, 2026 Knowing When to Stop Doubles an Agent's Recall Daily
Jun 29, 2026 Video World Models Stall at 24%, Jailbreaks Only Mute a Few Heads Daily
Jun 28, 2026 ViQ Speeds Up Multimodal Training 20-70% Daily
Jun 27, 2026 ImageNet-FID Negatively Correlates With Text-to-Image Daily
Jun 26, 2026 Late Layers Quietly Rewrite Correct Answers for Alignment Daily
Jun 25, 2026 Pose New Objects From a Few Reference Photos Daily
Jun 23, 2026 An 8B Model Beats a 235B One at Science Reasoning Daily
Jun 22, 2026 A Stateful 260M Embedding Beats 8B Specialists Daily
Jun 21, 2026 Leaderboards Don't Predict Deployment; Robot Arm Self-Trains to 99% Daily
Jun 20, 2026 A 7B Video Agent Beats a 72B Model by Looking Less Daily
Jun 19, 2026 Two Loops Take SWE-bench From 43 to 64 Daily
Jun 18, 2026 Pruning Context Can Cost More Than It Saves Daily
Jun 17, 2026 A 1.5B Model Beats Sonnet 3.5 at Event Forecasting Daily
Jun 16, 2026 Best Code Agent Hits 61.1%, VLA Reasoning Runs 6x Faster Daily
Jun 15, 2026 Pruning a Small Model Is a Shortcut Only on a Tight Budget Daily
Jun 14, 2026 Swap the Action Interface, Gain 11 Points on Spatial Reasoning Daily
Jun 13, 2026 Arbor Triples Research Gains; Environments Become the New Scaling Axis Daily
Jun 12, 2026 One Token per Evidence Cuts Generation Cost 3-10x Daily
Jun 11, 2026 DeepSeek V4 Cuts KV to 13.5%, Video Memory Runs 10x Faster Daily
Jun 10, 2026 Video Models Stumble on Composite Edits, MoE Fails at the Router Daily
Jun 9, 2026 Swap the Arm Without Retraining; VLMs See Both the Duck and the Rabbit Daily
Jun 8, 2026 dots.tts Hits 54ms First Packet, SWE Agent Self-Evolves Past 50% Daily
Jun 7, 2026 Streaming Hand-offs Make Multi-Agent Sharper, ZipSplat Splats With 1/6 the Gaussians Daily
Jun 6, 2026 NVIDIA Packs Five Modalities Into One Set of Weights Daily
Jun 5, 2026 A 20B Search Agent Ties the Frontier by Offloading Its Bookkeeping Daily
Jun 4, 2026 A 4B Web Agent Catches Up to Closed CUAs on a Few Thousand Trajectories Daily
Jun 3, 2026 Move to See: Top Model Reaches the Target Just 12% of the Time Daily
Jun 2, 2026 MoE Safety Lives in a Few Experts, Exclusive Batching Adds 42% Daily
Jun 1, 2026 LoRA as a Ruler for Memory, Reversed Video as a Free Counterfactual Daily

2026-05

May 31, 2026 World Models Go Multiplayer, Real-Time at 24FPS Daily
May 30, 2026 Agents Start Improving Themselves, and Reaching for Fewer Tools Daily
May 29, 2026 Vision Models Start Redesigning How They Output Daily
May 28, 2026 Diffusion Swallows the Decoder Too Daily
May 27, 2026 The Rulers We Use to Measure What Models Really Think Are Broken Daily
May 25, 2026 Agent Trajectories Let a 30B Match a 235B Daily
May 24, 2026 Gated DeltaNet-2 Splits the Gate, Maestro Outscores GPT-5 Daily
May 23, 2026 Optimizer Choice Stretches Capacity Scaling 2.3x Daily
May 22, 2026 $15 Per Paper, Healthcare Agents Cap at 28% Daily
May 21, 2026 Dual-Stream MoE Unifies Multimodal, Garment Video 30x Faster Daily
May 20, 2026 Stop When Reasoning Converges, Save 26% of Tokens Daily
May 19, 2026 8% of Tokens Decide the Reasoning Gap Daily
May 18, 2026 Real-Time Video's Bottleneck Moved Past Step Count Daily
May 17, 2026 Olympiad Gold Becomes a Two-Step Recipe Daily
May 16, 2026 Readable Rules Don't Belong in LLM Weights Daily
May 15, 2026 δ-mem Trades Long Context for an 8×8 State Matrix Daily
May 14, 2026 Flow-OPD Lifts GenEval From 63 to 92 Daily
May 13, 2026 Geometry Conflict Predicts Continual Fine-Tuning Forgetting Daily
May 12, 2026 Soohak Caps Top Models at 30% Daily
May 11, 2026 Lorem Ipsum Rescues GRPO's Wasted Hard Samples Daily
May 9, 2026 10.6k SFT Trajectories Match Full RL Pipeline; Mamba Beats LZMA Daily
May 8, 2026 T²PO Stabilizes Multi-Turn RL; MotionCache Cuts Video Steps 6x Daily
May 7, 2026 Gradient Boosting Turns Out to Be Diffusion's Asymptotic Optimum Daily
May 4, 2026 ViT Pre-Trains Like an LLM, Skips the CLIP Stage Daily
May 3, 2026 FD as Loss: One-Step Generation Hits 0.72 FID Daily
May 2, 2026 Cross-Architecture Distillation Shrinks dLLMs to 0.6B Daily
May 1, 2026 Recursive MAS Cuts Tokens 35%, T2I Repaints Instead of Editing Daily

2026-04

Apr 30, 2026 RL Patches 3D Consistency Into Video Models Without Touching Architecture Daily
Apr 29, 2026 Emotion Probes Crash From 82% to 5% Without Keywords Daily
Apr 28, 2026 ProEval Cuts Benchmark Eval Samples 8-65x Daily
Apr 27, 2026 Full Traces Lift Multi-Agent Attribution Accuracy 76% Daily
Apr 26, 2026 4B Agent on 10K Data, MoE Upcycling Saves 32% Compute Daily
Apr 25, 2026 Coding Agents Start Cheating by Round 4 Under Score Pressure Daily
Apr 24, 2026 Recalibrating the Critic Lifts Reasoning Models 18 Points Daily
Apr 23, 2026 A 305M Retriever Gains 45% on Instruction Following Daily
Apr 22, 2026 Agents Ignore Answers Placed in Plain Sight Daily
Apr 21, 2026 3B Matches R1 on Refusal; B Matrix Is LoRA's Bottleneck Daily
Apr 20, 2026 Open Omni Hits Flagship Scale, Self-Judge Breaks, Reasoning Leaks Forgotten Facts Daily
Apr 19, 2026 Compile the Corpus Into a Skill Tree, Train Surrogates on Logs Daily
Apr 18, 2026 Tencent Open-Sources 3D World Generation, VLM Modal Bias Probe Daily
Apr 17, 2026 Big Models Resist Rumors but Fall for Noise Daily
Apr 16, 2026 VLMs Break When You Change the Rules Daily
Apr 15, 2026 dLLMs Hallucinate Differently, PRM Labeling Cost Drops 100x Daily
Apr 14, 2026 SFT Convergence Hides Failures, Attention Hijacking Hits 94% Daily
Apr 13, 2026 DMax Triples Parallel Decoding Efficiency for Diffusion LMs Daily
Apr 12, 2026 Scrambled Media Boosts Reasoning; 6B Model Tops GPT-4o Daily
Apr 11, 2026 1.7x Faster From Fine-Tuning Alone, Token Collapse Misdiagnosed Daily
Apr 10, 2026 Entropy Is Lying to You, Implicit Reasoning Tops Out at 7 Steps Daily
Apr 9, 2026 120B on One GPU, and 40% of Video Benchmarks Are Guessable Daily
Apr 8, 2026 Streaming Video QA Hits 2 FPS, RLVR Shrugs Off Noisy Labels Daily
Apr 7, 2026 Learned Sparsity Cuts Diffusion Inference Compute by 54% Daily
Apr 6, 2026 Open-Source 32B Cracks Hardware Code, Agents Score Just 23% Daily
Apr 5, 2026 4M Game Frames Train Rendering, Internalized Skills Beat Retrieval Daily
Apr 4, 2026 Single Neurons Remember Entities, Reusable Routines Boost 19% Daily
Apr 3, 2026 Minimalist Agents Match MCP, Code Models Think Mid-Stream Daily
Apr 2, 2026 Data Mixing Becomes Post-Training, Surface Cues Hijack Reasoning 38x Daily

2026-03

Mar 30, 2026 Watermarks Enable Bit-Level Tracing, Diffusion VLMs Ground GUI Daily
Mar 29, 2026 Mistral Ships TTS, Diffusion LLMs Get 4.7x Faster Daily
Mar 28, 2026 Self-Distillation Strips Out Hesitation, OOD Drops 40% Daily
Mar 27, 2026 Speculative Execution Hits Agent Loops, 3x Faster Daily
Mar 26, 2026 Diffusion OCR Decodes 3.2x Faster, Single-Stream AV in 2 Seconds Daily
Mar 25, 2026 PDEs Beat Attention 2x, Local RL Saves 3/4 Compute Daily
Mar 24, 2026 Seed1.8 Goes Agent-Native, Language Training Erodes Vision Daily
Mar 23, 2026 12B Beats GPT-4, Distilled Students Surpass Teachers Daily
Mar 22, 2026 3B Params Win Three Olympiad Golds, 768-D Discrete Tokens Work Daily
Mar 21, 2026 3D at 0.1% Tokens, Video Fine-Tuning's Hidden Spatial Cost Daily
Mar 20, 2026 First 32B Industrial Code Model, War-Tested Reasoning Eval Daily
Mar 19, 2026 Open-Source Search Agent Wins With 12K Samples, Agent Skills Mostly Fail Daily
Mar 18, 2026 700K Paper Pairs Distill Taste, Null Spaces Expose Blind Spots Daily
Mar 17, 2026 Expert Reasoning Structure for CoT, +13% on Novel Class Discovery Daily
Mar 16, 2026 Budget-Aware Agents Beat 4x Brute-Force Sampling Daily
Mar 15, 2026 Document Agents Navigate by Luck, Prefill Speeds Up 1.82x Daily
Mar 14, 2026 Encode the Answer, Not the Question — Embeddings Gain 9% Daily
Mar 13, 2026 "Think It Over" Can Unlock a Model's Memory Bank Daily
Mar 12, 2026 Write Code Before You Draw, Layouts Improve 68% Daily
Mar 11, 2026 4-Step Diffusion Beats 100-Step Baselines, Layer Skipping Saves 18% Daily
Mar 10, 2026 12k Samples Beat Finance SOTA, CUDA Optimization 35% Faster Daily
Mar 9, 2026 Drop CLIP, Gain Performance: VLMs Work Better Without It Daily
Mar 8, 2026 "Be Concise" Halves Tokens, Lifts Accuracy by 16 Points Daily
Mar 7, 2026 14B Video Model Runs Real-Time on a Single GPU Daily
Mar 6, 2026 Code Agents Can't Cross Repo Boundaries, Under 45% Success Daily
Mar 5, 2026 Direct Lottie Generation, DPO's Built-In Forgetting Defense Daily
Mar 4, 2026 9K Samples Rival R1, Most RL Gains Trace Back to SFT Daily
Mar 3, 2026 Spectral Conditions Unify μP Scaling, Data Curation Leaks Privacy Daily
Mar 2, 2026 Drop 90% of Vision Tokens, Keep the Performance Daily
Mar 1, 2026 Latent Reasoning's Gains Aren't From Reasoning Daily

2026-02

Feb 28, 2026 Tri-Modal Training From Scratch, Agentic RL Gets a Stability Fix Daily
Feb 27, 2026 TTT Is Linear Attention, Terminal Agent Data Recipe Goes Open Daily
Feb 26, 2026 11 Agent Failure Modes From Red-Teaming, Step-Level Routing Cuts Cost 700x Daily
Feb 25, 2026 Token Probabilities as Zero-Shot Rewards Hit 0.95 Correlation Daily
Feb 24, 2026 74% of Agent Coordination May Be Wasted Effort Daily
Feb 23, 2026 Model Folding Beats Pruning, XR Gets Hand-Level Control Daily
Feb 22, 2026 Adaptive DiT Patches Hit 3x Speedup, Mamba Improves by Subtraction Daily
Feb 21, 2026 Agents Score Higher but Fail the Same Way Daily
Feb 20, 2026 Example Pairs Replace Prompts, Agents Play Favorites Daily
Feb 19, 2026 Spectral Decay Recovers 7% Accuracy in W4A4 Quantization Daily
Feb 18, 2026 Binary Tokens Make Image Gen 30x Faster, RL Training Learns to Reflect Daily
Feb 17, 2026 Online RL Cracks Web Agents, Reward Models Learn to Look Backward Daily
Feb 16, 2026 Vertical AI Is Winning: Medical, Robotics, and Science Agents Daily
Feb 15, 2026 Running Out of RL Training Data? Just Combine the Easy Problems Daily
Feb 14, 2026 11B Active Parameters Hit Frontier-Level Agent Intelligence Daily
Feb 13, 2026 AI Solves Real Open Math Problems, World Models Everywhere Daily
Feb 12, 2026 Text Diffusion Hits Practical Speed, RL Spreads Everywhere Daily
Feb 11, 2026 Agent Bottlenecks Are Shifting From Models to Systems Daily
Feb 10, 2026 LinkedIn Ships LLM-Powered Search Ranking at Scale Daily
Feb 9, 2026 Medical LLMs Should Ask Questions, Not Just Answer Them Daily
Feb 8, 2026 Diffusion Drafting Hits 6x Speedup, 14B Beats Claude at Kernels Daily
Feb 7, 2026 Trillion-Parameter Multimodal, 4B Agents Match 671B, PPO Exposed Daily
Feb 6, 2026 256 Tokens Match Full Attention, Agents That Build Agents Daily
Feb 5, 2026 Kimi K2.5 Open-Sources Agent Swarm, CoT Plans Only 2-3 Steps Ahead Daily
Feb 4, 2026 Better SFT Makes Worse RL, Distillation Waste, Reward Circuits Daily
Feb 3, 2026 Zero-Cost Data Mix Search, Guided RLVR, Selective SFT Daily
Feb 2, 2026 Unlimited RLVR Data From Web Text, FP4 Pretraining Matches BF16 Daily
Feb 1, 2026 Open-Source Deep Research Beats GPT-5, Embedding Scaling Outshines Experts Daily