Memorization
Cross-source consensus on Memorization, drawn from 1 source and 6 claims.
Highlighted claims
- τmem is defined as the first step where sample memorization reliably exceeds 0.1. — The two clocks and the innovation window: When and how generative models learn rules
- Memorization time is approximately invariant to parity group size and scales nearly linearly with dataset size. — The two clocks and the innovation window: When and how generative models learn rules
- Memorization corresponds to expansion of basins around actual training samples into nearby held-out valid regions. — The two clocks and the innovation window: When and how generative models learn rules
- Increasing model capacity accelerates memorization. — The two clocks and the innovation window: When and how generative models learn rules
- Around memorization onset, training loss separates from held-out valid loss. — The two clocks and the innovation window: When and how generative models learn rules
- Memorization onset depends on gradient updates per example in the reported setup. — The two clocks and the innovation window: When and how generative models learn rules
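The definition of τmem above (the first step where sample memorization reliably exceeds 0.1) can be sketched as a simple threshold-crossing detector. This is an illustrative interpretation, not the paper's code: the 0.1 threshold comes from the source, while reading "reliably" as "stays above the threshold for a run of consecutive steps" (the `persist` parameter) is an assumption.

```python
def tau_mem(mem_frac, threshold=0.1, persist=5):
    """First step index t where the memorization fraction exceeds
    `threshold` and stays above it for `persist` consecutive steps
    (a hypothetical reading of "reliably"). Returns None if the
    series never crosses in a sustained way."""
    for t in range(len(mem_frac) - persist + 1):
        if all(v > threshold for v in mem_frac[t:t + persist]):
            return t
    return None

# Toy series: memorization fraction measured once per training step.
series = [0.0, 0.02, 0.05, 0.12, 0.08, 0.15, 0.2, 0.3, 0.35, 0.4]
print(tau_mem(series))  # → 5: the brief spike at step 3 dips back below 0.1
```

The same pattern would apply to detecting the onset where training loss separates from held-out valid loss: replace the memorization series with the per-step gap between the two losses and pick an appropriate threshold.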