Diffusion Transformers
Cross-source consensus on Diffusion Transformers from 1 source and 6 claims.
Highlighted claims
- The primary diffusion model used EDM continuous-time Gaussian diffusion with a Diffusion Transformer. — The two clocks and the innovation window: When and how generative models learn rules
- DiT-mini had about 16.5 million parameters. — The two clocks and the innovation window: When and how generative models learn rules
- DiT sampling used a deterministic second-order Heun sampler with 35 steps. — The two clocks and the innovation window: When and how generative models learn rules
- Per-sample diffusion transitions from rule-violating to rule-valid states were sharp and synchronized across seeds. — The two clocks and the innovation window: When and how generative models learn rules
- DiT G=6 rule learning was sensitive to learning rate. — The two clocks and the innovation window: When and how generative models learn rules
- Diffusion noise is interpreted as data augmentation that slows both learning and memorization compared with GPT. — The two clocks and the innovation window: When and how generative models learn rules
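The claims above describe EDM continuous-time Gaussian diffusion sampled with a deterministic second-order Heun sampler over 35 steps. A minimal sketch of that sampler is below, assuming the standard EDM noise schedule (`sigma_min`, `sigma_max`, and `rho` values are assumptions, not taken from the source); `toy_denoiser` is a hypothetical stand-in for the trained DiT denoiser, which the source does not specify.

```python
import numpy as np

def edm_sigmas(n=35, sigma_min=0.002, sigma_max=80.0, rho=7.0):
    """EDM-style noise schedule, decreasing from sigma_max to sigma_min, then 0."""
    ramp = np.linspace(0.0, 1.0, n)
    sig = (sigma_max**(1/rho) + ramp * (sigma_min**(1/rho) - sigma_max**(1/rho)))**rho
    return np.append(sig, 0.0)  # final step lands exactly at sigma = 0

def heun_sample(denoise, x, n_steps=35):
    """Deterministic second-order Heun integration of the probability-flow ODE."""
    sigmas = edm_sigmas(n_steps)
    for i in range(n_steps):
        s, s_next = sigmas[i], sigmas[i + 1]
        d = (x - denoise(x, s)) / s          # ODE slope at sigma_i
        x_euler = x + (s_next - s) * d       # Euler predictor step
        if s_next > 0:                       # Heun corrector; skipped at sigma = 0
            d2 = (x_euler - denoise(x_euler, s_next)) / s_next
            x = x + (s_next - s) * 0.5 * (d + d2)
        else:
            x = x_euler
    return x

# Hypothetical denoiser: optimal for data ~ N(0, 1). A trained DiT would replace this.
def toy_denoiser(x, sigma):
    return x / (1.0 + sigma**2)

rng = np.random.default_rng(0)
x0 = rng.normal(size=(4, 8)) * 80.0  # draw from the prior at sigma_max
out = heun_sample(toy_denoiser, x0, n_steps=35)
```

With this linear toy denoiser the ODE simply rescales the prior sample back to unit scale; with a real DiT denoiser, the same 35-step loop produces model samples.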