Batch Size
Cross-source consensus on Batch Size, drawn from 1 source and 4 claims.
Highlighted claims
- In the paper's experiments, batch size had a larger effect on energy per token than either DVFS settings or attention-architecture choice. — The Illusion of Power Capping in LLM Decode: A Phase-Aware Energy Characterisation Across Attention Architectures
- Increasing batch size from 1 to 32 reduced energy per token by more than 20x, because the energy cost of streaming model weights is amortized across the batch (see the sketch after this list). — The Illusion of Power Capping in LLM Decode: A Phase-Aware Energy Characterisation Across Attention Architectures
- At batch size 32 and sequence length 4096, optimal clocks and energy savings varied materially by architecture. — The Illusion of Power Capping in LLM Decode: A Phase-Aware Energy Characterisation Across Attention Architectures
- The batch-size sweep supports the paper's argument that even high request concurrency does not make power caps effective for the decode phase on the tested GPU. — The Illusion of Power Capping in LLM Decode: A Phase-Aware Energy Characterisation Across Attention Architectures
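To make the amortization claim concrete, here is a minimal sketch of a toy cost model, not taken from the paper: each decode step pays a roughly batch-independent energy cost to stream model weights from HBM, plus a small per-token compute/KV-cache cost, so energy per token falls roughly as the weight-traffic cost is divided across the batch. The constants `E_WEIGHTS_J` and `E_COMPUTE_J` are hypothetical values chosen only to reproduce the shape of the reported >20x effect.

```python
# Toy cost model for weight-loading amortization during LLM decode.
# The constants are illustrative assumptions, not measured values from the paper.

E_WEIGHTS_J = 0.500   # hypothetical: energy per decode step to stream weights (batch-independent)
E_COMPUTE_J = 0.005   # hypothetical: per-token compute/KV-cache energy per step

def energy_per_token(batch_size: int) -> float:
    """Energy per generated token: weight traffic is shared across the whole batch."""
    step_energy = E_WEIGHTS_J + E_COMPUTE_J * batch_size
    return step_energy / batch_size

for b in (1, 2, 4, 8, 16, 32):
    print(f"batch={b:2d}  J/token={energy_per_token(b):.4f}")
```

With these assumed constants, batch 1 costs 0.505 J/token while batch 32 costs about 0.0206 J/token, a roughly 24x reduction, matching in shape (though not in measured values) the paper's observation that batching from 1 to 32 cut energy per token by more than 20x.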