Diffusion Transformers
Cross-source consensus on Diffusion Transformers from 1 source and 6 claims.
Highlighted claims
- The primary diffusion model used EDM continuous-time Gaussian diffusion with a Diffusion Transformer. — The two clocks and the innovation window: When and how generative models learn rules
- DiT-mini had about 16.5 million parameters. — The two clocks and the innovation window: When and how generative models learn rules
- DiT sampling used a deterministic second-order Heun sampler with 35 steps. — The two clocks and the innovation window: When and how generative models learn rules
- Per-sample diffusion transitions from rule-violating to rule-valid states were sharp and synchronized across seeds. — The two clocks and the innovation window: When and how generative models learn rules
- DiT G=6 rule learning was sensitive to learning rate. — The two clocks and the innovation window: When and how generative models learn rules
- Diffusion noise is interpreted as data augmentation that slows both learning and memorization compared with GPT. — The two clocks and the innovation window: When and how generative models learn rules
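The claims above describe EDM continuous-time Gaussian diffusion sampled with a deterministic second-order Heun sampler over 35 steps. A minimal sketch of that sampler is below, assuming the standard EDM noise schedule (`sigma_min`, `sigma_max`, and `rho` values are assumptions, not taken from the source); `toy_denoiser` is a hypothetical stand-in for the trained DiT denoiser, which the source does not specify.

```python
import numpy as np

def edm_sigmas(n=35, sigma_min=0.002, sigma_max=80.0, rho=7.0):
    """EDM-style noise schedule, decreasing from sigma_max to sigma_min, then 0."""
    ramp = np.linspace(0.0, 1.0, n)
    sig = (sigma_max**(1/rho) + ramp * (sigma_min**(1/rho) - sigma_max**(1/rho)))**rho
    return np.append(sig, 0.0)  # final step lands exactly at sigma = 0

def heun_sample(denoise, x, n_steps=35):
    """Deterministic second-order Heun integration of the probability-flow ODE."""
    sigmas = edm_sigmas(n_steps)
    for i in range(n_steps):
        s, s_next = sigmas[i], sigmas[i + 1]
        d = (x - denoise(x, s)) / s          # ODE slope at sigma_i
        x_euler = x + (s_next - s) * d       # Euler predictor step
        if s_next > 0:                       # Heun corrector; skipped at sigma = 0
            d2 = (x_euler - denoise(x_euler, s_next)) / s_next
            x = x + (s_next - s) * 0.5 * (d + d2)
        else:
            x = x_euler
    return x

# Hypothetical denoiser: optimal for data ~ N(0, 1). A trained DiT would replace this.
def toy_denoiser(x, sigma):
    return x / (1.0 + sigma**2)

rng = np.random.default_rng(0)
x0 = rng.normal(size=(4, 8)) * 80.0  # draw from the prior at sigma_max
out = heun_sample(toy_denoiser, x0, n_steps=35)
```

With this linear toy denoiser the ODE simply rescales the prior sample back to unit scale; with a real DiT denoiser, the same 35-step loop produces model samples.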