Optimizer Choice Stretches Capacity Scaling 2.3x

Today's Overview

Three Classes of Physical 3D Assets Merge Into One Pipeline. PhysX-Omni puts rigid, deformable, and articulated objects into a single framework. Output assets carry physics properties and drop straight into simulators. Multi-pipeline maintenance cost for sim-to-real teams should fall.
Image Generation Is Moving From a Model Problem to an Agent Problem. GenEvolve models each generation as a trajectory and distills visual experience across tasks. It sidesteps the retune-everything-per-request pattern.
Optimizer Is the Ignored Scaling Axis. Same FFN width increment, swap the optimizer, and the effective-capacity scaling exponent jumps from 0.44 to 1.02. Sweep optimizer as a variable before estimating scaling laws.

Featured

01 Three Asset Types, One Generation Pipeline

Robots learning to grasp, push doors, or squeeze sponges in simulation need more than 3D models. The asset has to drop into a physics engine and run dynamics. The trouble is that each asset class had its own generation pipeline. Sim-to-real teams ran several in parallel, with data, models, and evals all kept apart.

PhysX-Omni puts rigid, deformable, and articulated bodies into a single framework. Generated assets carry physics properties out of the box: materials, absolute scale, articulated structure, affordance. The release also ships PhysXVerse (the dataset) and PhysX-Bench (six-axis evaluation).

One caveat. Unified frameworks rarely beat per-category SOTA. The more useful question for practitioners: does this really collapse three pipelines into one? And does "simulation-ready" mean "plug into a physics engine," or just "parameter labels filled in"? Usually there's an engineering gap in between. Worth tracking if your team is building embodied AI assets. Check PhysXVerse quality before any production decision.

Key takeaways:
- Three asset types in one generation framework is a relief signal for embodied teams maintaining multiple pipelines.
- Don't expect single-category SOTA. The value is pipeline consolidation and asset-scale operations.
- "Simulation-ready" is fuzzy. Parameter completeness isn't the same as plug-and-play. Verify against the code and dataset.

Source: PhysX-Omni: Unified Simulation-Ready Physical 3D Generation for Rigid, Deformable, and Articulated Objects

02 Complex Image Demand Is Becoming an Agent Problem

The demand curve for image generation is moving from prompt-to-image toward prompt-to-workflow. For complex composition, specific styles, and multi-step edits, the model alone no longer carries the load. Agents orchestrating multiple tools and models do.

GenEvolve models each generation as a trajectory: gather evidence, pick reference images, call generation skills, compose them into a final prompt-reference program. The same request runs across many trajectories. Differences between the best and worst get abstracted into structured "visual experience," then distilled back into the student model as token-level supervision.
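The best-versus-worst comparison can be sketched roughly as follows. This is an illustrative toy, not GenEvolve's implementation: the `Trajectory` type, the step names, and the scalar `score` are all hypothetical, and a plain step difference stands in for whatever the paper uses to abstract "visual experience."

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Trajectory:
    steps: tuple[str, ...]  # ordered tool/skill calls (hypothetical names)
    score: float            # hypothetical quality score of the final image


def contrast_best_worst(trajectories):
    """Pick the best and worst run for one request and record how they differ."""
    best = max(trajectories, key=lambda t: t.score)
    worst = min(trajectories, key=lambda t: t.score)
    return {
        "only_in_best": [s for s in best.steps if s not in worst.steps],
        "only_in_worst": [s for s in worst.steps if s not in best.steps],
        "score_gap": best.score - worst.score,
    }


# Three made-up trajectories for the same request.
runs = [
    Trajectory(("search_evidence", "pick_reference", "generate"), 0.9),
    Trajectory(("generate",), 0.4),
    Trajectory(("pick_reference", "generate"), 0.7),
]
experience = contrast_best_worst(runs)
```

In this toy, the "experience" is just the list of steps the best run took that the worst run skipped. The paper describes a richer structured abstraction that is then distilled into the student as token-level supervision.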

Compared with agent methods that use only image-level scalar rewards, trajectory-level differential distillation gives a much finer signal. That makes it the more plausible source of the reported SOTA numbers. The "self-evolving" framing is hard to falsify from the abstract alone. The underlying idea is worth keeping in mind: accumulate trajectory experience across tasks instead of retuning the base model per request.

Key takeaways:
- The next differentiation layer in image generation may be agent orchestration and tool composition, not the base model.
- Trajectory-level differential distillation gives a finer signal than image-level reward. That's the right place to probe for where the SOTA numbers come from.
- Teams building complex image products should audit which capabilities pay off better via agent orchestration than via more model training.

Source: GenEvolve: Self-Evolving Image Generation Agents via Tool-Orchestrated Visual Experience Distillation

03 Same Loss, Different Capacity

Take the same Transformer and the same FFN width increment, swap the optimizer, and the effective-capacity scaling exponent jumps from 0.44 to 1.02. That's roughly 2.3x (1.02 / 0.44 ≈ 2.32). The measurement is hard-rank on rare-token representations, the part of the distribution that's hardest to learn.

Stranger still: AdamW under long training catches up to Muon on perplexity, but the spectral structure is different. Equal loss is not equal representation capacity. The authors put this optimizer effect next to architectural interventions like attention-rank changes and positional encoding. Optimizer choice usually matters more.
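To make the "equal loss, different spectrum" point concrete, here is a minimal sketch of a hard-rank style check. The paper's exact definition is not given in this digest, so the sketch assumes hard rank means the number of singular values above a relative threshold. The `rel_tol` value and the random matrices standing in for rare-token representations are illustrative assumptions.

```python
import numpy as np


def hard_rank(matrix: np.ndarray, rel_tol: float = 1e-2) -> int:
    """Count singular values larger than rel_tol times the largest one.

    Assumed definition for illustration; the paper may threshold differently.
    """
    singular_values = np.linalg.svd(matrix, compute_uv=False)
    if singular_values.size == 0 or singular_values[0] == 0:
        return 0
    return int(np.sum(singular_values > rel_tol * singular_values[0]))


rng = np.random.default_rng(0)

# Synthetic stand-ins for representation matrices (tokens x hidden dim).
full_spectrum = rng.standard_normal((200, 64))
collapsed = rng.standard_normal((200, 8)) @ rng.standard_normal((8, 64))

rank_full = hard_rank(full_spectrum)
rank_collapsed = hard_rank(collapsed)
```

Two checkpoints could in principle sit at the same perplexity while a check like this reports very different ranks, which is the gap a loss-only comparison would miss.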

If you're estimating your own scaling laws, sweep the optimizer as a variable. Otherwise the curve you fit may not transfer to production.
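A sweep like this might look as follows. This is a sketch under stated assumptions: it fits a power law `capacity ~ width ** alpha` by least squares in log-log space, the optimizer labels are placeholders, and the data is synthetic, generated from the two exponents quoted above rather than taken from real measurements.

```python
import numpy as np


def fit_exponent(widths, capacities) -> float:
    """Slope of log(capacity) against log(width), i.e. alpha in capacity ~ width**alpha."""
    slope, _intercept = np.polyfit(np.log(widths), np.log(capacities), deg=1)
    return float(slope)


widths = np.array([512, 1024, 2048, 4096], dtype=float)

# Synthetic capacity curves built from the reported exponents (not real data).
measurements = {
    "optimizer_a": 3.0 * widths ** 0.44,
    "optimizer_b": 0.1 * widths ** 1.02,
}

# One fitted exponent per optimizer, so the optimizer is an explicit sweep axis.
exponents = {name: fit_exponent(widths, cap) for name, cap in measurements.items()}
ratio = exponents["optimizer_b"] / exponents["optimizer_a"]
```

Keeping one fitted exponent per optimizer makes it visible when a curve estimated under one optimizer would not hold under another.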

Key takeaways:
- Sweep optimizer as a variable when estimating scaling laws. Don't default to AdamW. The curve may not transfer.
- Equal perplexity is not equal representation capacity. Scaling experiments should also check spectral structure.
- Optimizer impact on effective capacity can exceed attention-rank or positional-encoding tweaks.

Source: Same Architecture, Different Capacity: Optimizer-Induced Spectral Scaling Laws

Also Worth Noting

Microsoft's Lens Matches 6B+ Models on 19.3% of the Training Compute. Image Gen: 3.8B T2I model. Recipe is distillation plus a redesigned pretraining flow. A training recipe small and mid-budget teams should copy.

Mining Implicit Safety Signal From Ordinary Crowd Preference Data. Safety: ICML paper. Skips dedicated safety annotation. Uses existing preference datasets as a source of safety-related implicit objectives.

DualOptim+ Adds a Dual-Optimizer Structure for LLM Unlearning. Training: Base state plus delta state keeps forget and retain optimization states separate. Eases retained-capability damage from unlearning.

Expectation Consistency Loss Brings Calibration Into Covariate Shift. Interpretability: Out-of-distribution deployment gets more reliable confidence. Meaningful for safety-sensitive launches.

Today's Observation

Today's two HF Daily papers, PhysX-Omni and GenEvolve, look unrelated. One does 3D assets, the other image agents. Step back, and both teams are pushing the value boundary of "generation systems" outward. PhysX-Omni binds simulation-ready physics parameters to the 3D output, so the result drops straight into a physics engine downstream. GenEvolve wraps a tool-orchestration agent around the image generator, so generation stops being one model call.

Core generation quality is hitting diminishing returns. Two independent teams are moving differentiation from "how good does this look" to "how does the output plug into downstream workflow."

This doesn't say the core generator stops mattering. Below a quality threshold, neither downstream binding nor upstream agents save the product. Quality stays a necessary precondition. The signal for teams building generative products: if your base model quality is good enough, the engineering budget worth spending may not be on more generation quality. It may be at the two ends. Downstream: does output drop into real user workflows like physics engines, design tools, or video pipelines? Upstream: can an agent wrap multi-step generation needs into one end-to-end experience?

At the next roadmap review, put "plug into downstream workflow" and "build upstream agent orchestration" on the same table as "train a better base model." Compare investment versus payoff. Especially if you can feel base-quality improvements producing less user-visible lift.