Benchmarking Methodology
Cross-source consensus on Benchmarking Methodology from 1 source and 5 claims.
Highlighted claims
- The evaluation combines isolated attention-kernel benchmarks with complete pruned-ViT inference pipelines. — Dispatch-Aware Ragged Attention for Pruned Vision Transformers
- Timings are measured on an NVIDIA A100-SXM4-40GB using PyTorch 2.8, Triton 3.4, and flash-attn v2.7; see the timing sketch after this list. — Dispatch-Aware Ragged Attention for Pruned Vision Transformers
- The primary isolated-kernel setting uses DeiT-Base with 12 attention heads and head dimension 64. — Dispatch-Aware Ragged Attention for Pruned Vision Transformers
- Input tokens are real ImageNet features extracted through the first four DeiT layers before Threshold-L2 pruning is applied; see the pruning sketch after this list. — Dispatch-Aware Ragged Attention for Pruned Vision Transformers
- The study recommends benchmarking token pruning against a ragged variable-length execution baseline and reporting attention and MLP latency separately; see the variable-length sketch after this list. — Dispatch-Aware Ragged Attention for Pruned Vision Transformers
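A minimal sketch of the CUDA-event timing loop such isolated-kernel measurements typically use, at the stated DeiT-Base attention shape (12 heads, head dimension 64). The batch size, token count, and warmup/iteration counts below are illustrative assumptions, not values from the paper:

```python
import torch
import torch.nn.functional as F

# Isolated attention-kernel timing sketch at the stated DeiT-Base shape
# (12 heads, head dim 64). Batch size and iteration counts are assumptions.
B, H, N, D = 64, 12, 197, 64  # 197 = 196 patch tokens + [CLS] for DeiT-Base
q = torch.randn(B, H, N, D, device="cuda", dtype=torch.float16)
k = torch.randn_like(q)
v = torch.randn_like(q)

def bench(fn, warmup=25, iters=100):
    for _ in range(warmup):                  # absorb compile/caching effects
        fn()
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    start.record()
    for _ in range(iters):
        fn()
    end.record()
    torch.cuda.synchronize()                 # wait for all queued kernels
    return start.elapsed_time(end) / iters   # mean latency in milliseconds

ms = bench(lambda: F.scaled_dot_product_attention(q, k, v))
print(f"attention: {ms:.3f} ms/iter")
```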
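The claims name Threshold-L2 pruning without defining it; a common reading, assumed in this sketch, keeps tokens whose feature L2 norm exceeds a threshold, which yields a ragged per-image token count. The threshold calibration and [CLS]-token handling here are assumptions, not details from the paper:

```python
import torch

def threshold_l2_prune(x: torch.Tensor, tau: float):
    """Keep tokens whose feature L2 norm exceeds tau.

    One plausible reading of Threshold-L2 pruning; the paper's exact
    criterion may differ. x: (B, N, C) token features. Returns a list of
    per-image tensors, since each image keeps a different token count.
    """
    norms = x.norm(p=2, dim=-1)        # (B, N) per-token L2 norms
    keep = norms > tau                 # ragged boolean mask
    keep[:, 0] = True                  # always keep [CLS] (assumption)
    return [xi[mi] for xi, mi in zip(x, keep)]

# Features after the first four DeiT layers would be fed in here.
x = torch.randn(8, 197, 768)           # DeiT-Base embedding dim is 768
pruned = threshold_l2_prune(x, tau=x.norm(dim=-1).median().item())
print([t.shape[0] for t in pruned])    # ragged per-image token counts
```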
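For the recommended ragged variable-length baseline, one plausible setup packs unequal per-image sequences into a single flat tensor plus cumulative sequence lengths, as flash-attn v2's public flash_attn_varlen_func expects; the sequence lengths below are illustrative, not from the paper:

```python
import torch
from flash_attn import flash_attn_varlen_func  # flash-attn v2 public API

# Ragged variable-length attention sketch: unequal per-image token counts
# are packed into one (total_tokens, heads, head_dim) tensor, delimited by
# cumulative sequence lengths. Lengths here are illustrative.
H, D = 12, 64
seqlens = [120, 87, 150, 197]
cu = torch.cumsum(torch.tensor([0] + seqlens), 0).to("cuda", torch.int32)
total, max_len = sum(seqlens), max(seqlens)
q = torch.randn(total, H, D, device="cuda", dtype=torch.float16)
k, v = torch.randn_like(q), torch.randn_like(q)

out = flash_attn_varlen_func(q, k, v, cu, cu, max_len, max_len)
print(out.shape)  # (total_tokens, heads, head_dim)
```

Attention and MLP latency can then be reported separately by wrapping each submodule in the same event-based timing loop shown in the first sketch.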