Batch Size
Cross-source consensus on Batch Size, drawn from 1 source and 4 claims.
Highlighted claims
- In the paper's experiments, batch size had a larger effect on energy per token than either DVFS settings or attention-architecture choice. — The Illusion of Power Capping in LLM Decode: A Phase-Aware Energy Characterisation Across Attention Architectures
- Increasing batch size from 1 to 32 reduced energy per token by more than 20x, because the energy cost of streaming model weights is amortized across the batch (see the sketch after this list). — The Illusion of Power Capping in LLM Decode: A Phase-Aware Energy Characterisation Across Attention Architectures
- At batch size 32 and sequence length 4096, optimal clocks and energy savings varied materially by architecture. — The Illusion of Power Capping in LLM Decode: A Phase-Aware Energy Characterisation Across Attention Architectures
- The batch-size sweep supports the paper's argument that even high request concurrency does not make power caps effective for the decode phase on the tested GPU. — The Illusion of Power Capping in LLM Decode: A Phase-Aware Energy Characterisation Across Attention Architectures
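To make the amortization claim concrete, here is a minimal sketch of a toy cost model, not taken from the paper: each decode step pays a roughly batch-independent energy cost to stream model weights from HBM, plus a small per-token compute/KV-cache cost, so energy per token falls roughly as the weight-traffic cost is divided across the batch. The constants `E_WEIGHTS_J` and `E_COMPUTE_J` are hypothetical values chosen only to reproduce the shape of the reported >20x effect.

```python
# Toy cost model for weight-loading amortization during LLM decode.
# The constants are illustrative assumptions, not measured values from the paper.

E_WEIGHTS_J = 0.500   # hypothetical: energy per decode step to stream weights (batch-independent)
E_COMPUTE_J = 0.005   # hypothetical: per-token compute/KV-cache energy per step

def energy_per_token(batch_size: int) -> float:
    """Energy per generated token: weight traffic is shared across the whole batch."""
    step_energy = E_WEIGHTS_J + E_COMPUTE_J * batch_size
    return step_energy / batch_size

for b in (1, 2, 4, 8, 16, 32):
    print(f"batch={b:2d}  J/token={energy_per_token(b):.4f}")
```

With these assumed constants, batch 1 costs 0.505 J/token while batch 32 costs about 0.0206 J/token, a roughly 24x reduction, matching in shape (though not in measured values) the paper's observation that batching from 1 to 32 cut energy per token by more than 20x.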