Benchmarking Methodology
Cross-source consensus on Benchmarking Methodology from 1 source and 5 claims.
Highlighted claims
- The evaluation combines isolated attention-kernel benchmarks with complete pruned-ViT inference pipelines. — Dispatch-Aware Ragged Attention for Pruned Vision Transformers
- Timings are measured on an NVIDIA A100-SXM4-40GB using PyTorch 2.8, Triton 3.4, and flash-attn v2.7; see the timing sketch after this list. — Dispatch-Aware Ragged Attention for Pruned Vision Transformers
- The primary isolated-kernel setting uses DeiT-Base with 12 attention heads and head dimension 64. — Dispatch-Aware Ragged Attention for Pruned Vision Transformers
- Input tokens are real ImageNet features extracted through the first four DeiT layers before Threshold-L2 pruning is applied; see the pruning sketch after this list. — Dispatch-Aware Ragged Attention for Pruned Vision Transformers
- The study recommends benchmarking token pruning against a ragged variable-length execution baseline and reporting attention and MLP latency separately; see the variable-length sketch after this list. — Dispatch-Aware Ragged Attention for Pruned Vision Transformers
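A minimal sketch of the CUDA-event timing loop such isolated-kernel measurements typically use, at the stated DeiT-Base attention shape (12 heads, head dimension 64). The batch size, token count, and warmup/iteration counts below are illustrative assumptions, not values from the paper:

```python
import torch
import torch.nn.functional as F

# Isolated attention-kernel timing sketch at the stated DeiT-Base shape
# (12 heads, head dim 64). Batch size and iteration counts are assumptions.
B, H, N, D = 64, 12, 197, 64  # 197 = 196 patch tokens + [CLS] for DeiT-Base
q = torch.randn(B, H, N, D, device="cuda", dtype=torch.float16)
k = torch.randn_like(q)
v = torch.randn_like(q)

def bench(fn, warmup=25, iters=100):
    for _ in range(warmup):                  # absorb compile/caching effects
        fn()
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    start.record()
    for _ in range(iters):
        fn()
    end.record()
    torch.cuda.synchronize()                 # wait for all queued kernels
    return start.elapsed_time(end) / iters   # mean latency in milliseconds

ms = bench(lambda: F.scaled_dot_product_attention(q, k, v))
print(f"attention: {ms:.3f} ms/iter")
```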
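The claims name Threshold-L2 pruning without defining it; a common reading, assumed in this sketch, keeps tokens whose feature L2 norm exceeds a threshold, which yields a ragged per-image token count. The threshold calibration and [CLS]-token handling here are assumptions, not details from the paper:

```python
import torch

def threshold_l2_prune(x: torch.Tensor, tau: float):
    """Keep tokens whose feature L2 norm exceeds tau.

    One plausible reading of Threshold-L2 pruning; the paper's exact
    criterion may differ. x: (B, N, C) token features. Returns a list of
    per-image tensors, since each image keeps a different token count.
    """
    norms = x.norm(p=2, dim=-1)        # (B, N) per-token L2 norms
    keep = norms > tau                 # ragged boolean mask
    keep[:, 0] = True                  # always keep [CLS] (assumption)
    return [xi[mi] for xi, mi in zip(x, keep)]

# Features after the first four DeiT layers would be fed in here.
x = torch.randn(8, 197, 768)           # DeiT-Base embedding dim is 768
pruned = threshold_l2_prune(x, tau=x.norm(dim=-1).median().item())
print([t.shape[0] for t in pruned])    # ragged per-image token counts
```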
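For the recommended ragged variable-length baseline, one plausible setup packs unequal per-image sequences into a single flat tensor plus cumulative sequence lengths, as flash-attn v2's public flash_attn_varlen_func expects; the sequence lengths below are illustrative, not from the paper:

```python
import torch
from flash_attn import flash_attn_varlen_func  # flash-attn v2 public API

# Ragged variable-length attention sketch: unequal per-image token counts
# are packed into one (total_tokens, heads, head_dim) tensor, delimited by
# cumulative sequence lengths. Lengths here are illustrative.
H, D = 12, 64
seqlens = [120, 87, 150, 197]
cu = torch.cumsum(torch.tensor([0] + seqlens), 0).to("cuda", torch.int32)
total, max_len = sum(seqlens), max(seqlens)
q = torch.randn(total, H, D, device="cuda", dtype=torch.float16)
k, v = torch.randn_like(q), torch.randn_like(q)

out = flash_attn_varlen_func(q, k, v, cu, cu, max_len, max_len)
print(out.shape)  # (total_tokens, heads, head_dim)
```

Attention and MLP latency can then be reported separately by wrapping each submodule in the same event-based timing loop shown in the first sketch.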