Open Omni Hits Flagship Scale, Self-Judge Breaks, Reasoning Leaks Forgotten Facts

15 selected from 200 papers

Also Worth Noting