Tile Quantization
Cross-source consensus on Tile Quantization from 1 source and 4 claims.
Highlighted claims
- Tile-quantization correction requires an NCU profiling pass, which adds overhead and cannot be used continuously. — Instant GPU Efficiency Visibility at Fleet Scale
- For well-aligned matrices of size 4,096 or larger, observed tile-quantization overhead peaked at about 9%, with mean overhead of 2–3%. — Instant GPU Efficiency Visibility at Fleet Scale
- Small matrix sizes can incur tile-quantization overhead above 50% because of severe tile padding. — Instant GPU Efficiency Visibility at Fleet Scale
- TF32 is an outlier because cuBLAS selects XMMA and CUTLASS kernels that produce higher overhead at small sizes and converge more slowly toward low overhead as size grows. — Instant GPU Efficiency Visibility at Fleet Scale
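The padding mechanism behind the small-size claims can be illustrated with a minimal geometric sketch. It assumes a hypothetical fixed 128 × 128 output tile and defines overhead as padded work divided by useful work, minus one. Real cuBLAS, XMMA, and CUTLASS kernels pick their own tile shapes, and the measured figures above come from profiling real kernels, so this model will not reproduce them; for instance, it reports zero overhead for any size that is a multiple of 128.

```python
import math


def tile_quantization_overhead(m: int, n: int, tile_m: int = 128, tile_n: int = 128) -> float:
    """Extra work from padding an M x N output up to whole tiles.

    Idealized model (assumption): every launched tile computes a full
    tile_m x tile_n block, so work scales with the padded output area.
    The 128 x 128 default tile is hypothetical, not a cuBLAS setting.
    Returns padded_area / useful_area - 1 (0.0 means no padding waste).
    """
    tiles_m = math.ceil(m / tile_m)  # tiles needed along M, rounded up
    tiles_n = math.ceil(n / tile_n)  # tiles needed along N, rounded up
    padded_area = tiles_m * tile_m * tiles_n * tile_n
    useful_area = m * n
    return padded_area / useful_area - 1.0


if __name__ == "__main__":
    # Square matrices: aligned, just past a boundary, and small sizes.
    for size in (4096, 4097, 200, 100, 65):
        print(f"{size:>5}: {tile_quantization_overhead(size, size):.1%}")
```

Sizes just past a tile boundary (4,097 here) pay for a nearly empty extra row and column of tiles, while a tiny matrix that fits inside a single tile leaves most of that tile unused, which is why small sizes can exceed 50% overhead under this model.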