Memorization
Cross-source consensus on Memorization, drawn from 1 source and 6 claims.
Highlighted claims
- τmem is defined as the first step where sample memorization reliably exceeds 0.1. — The two clocks and the innovation window: When and how generative models learn rules
- Memorization time is approximately invariant to parity group size and scales nearly linearly with dataset size. — The two clocks and the innovation window: When and how generative models learn rules
- Memorization corresponds to expansion of basins around actual training samples into nearby held-out valid regions. — The two clocks and the innovation window: When and how generative models learn rules
- Increasing model capacity accelerates memorization. — The two clocks and the innovation window: When and how generative models learn rules
- Around memorization onset, training loss separates from held-out valid loss. — The two clocks and the innovation window: When and how generative models learn rules
- Memorization onset depends on gradient updates per example in the reported setup. — The two clocks and the innovation window: When and how generative models learn rules
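The definition of τmem above (the first step where sample memorization reliably exceeds 0.1) can be sketched as a simple threshold-crossing detector. This is an illustrative interpretation, not the paper's code: the 0.1 threshold comes from the source, while reading "reliably" as "stays above the threshold for a run of consecutive steps" (the `persist` parameter) is an assumption.

```python
def tau_mem(mem_frac, threshold=0.1, persist=5):
    """First step index t where the memorization fraction exceeds
    `threshold` and stays above it for `persist` consecutive steps
    (a hypothetical reading of "reliably"). Returns None if the
    series never crosses in a sustained way."""
    for t in range(len(mem_frac) - persist + 1):
        if all(v > threshold for v in mem_frac[t:t + persist]):
            return t
    return None

# Toy series: memorization fraction measured once per training step.
series = [0.0, 0.02, 0.05, 0.12, 0.08, 0.15, 0.2, 0.3, 0.35, 0.4]
print(tau_mem(series))  # → 5: the brief spike at step 3 dips back below 0.1
```

The same pattern would apply to detecting the onset where training loss separates from held-out valid loss: replace the memorization series with the per-step gap between the two losses and pick an appropriate threshold.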