N-vium
Cross-source consensus on N-vium from 1 source and 4 claims.
Highlighted claims
- N-vium formulates a decoder transformer as a trained mixture over depth exits, rather than treating intermediate exits as approximations to the final layer. — N-vium: Mixture-of-Exits Transformer for Accelerated Exact Generation
- N-vium augments a decoder-only transformer with multiple exits placed at the junctions between equal-sized layer blocks. — N-vium: Mixture-of-Exits Transformer for Accelerated Exact Generation
- A standard transformer is a special case of N-vium when all probability mass is placed on the final head. — N-vium: Mixture-of-Exits Transformer for Accelerated Exact Generation
- Routers and adapters add modest parameter overhead in the reported experiments. — N-vium: Mixture-of-Exits Transformer for Accelerated Exact Generation
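The claims above can be illustrated with a small sketch. It assumes that "mixture over depth exits" means the model's next-token distribution is a convex combination of per-exit distributions, p(y) = Σ_k π_k · p_k(y). It also assumes that exits sit after each of K equal-sized layer blocks, with the final head counted as the last exit. The function names `exit_layers` and `mixture_of_exits` are hypothetical and are not taken from the paper. In a trained model the weights π would come from a router; here they are passed in directly, which makes the special case easy to check: putting all mass on the final head recovers a standard transformer's output.

```python
import numpy as np

def exit_layers(num_layers: int, num_exits: int) -> list[int]:
    """Hypothetical helper: return the layer after which each exit sits,
    assuming the stack is split into num_exits equal blocks and the
    final head is the last exit."""
    if num_layers % num_exits != 0:
        raise ValueError("num_layers must divide evenly into num_exits blocks")
    block = num_layers // num_exits
    return [block * (k + 1) for k in range(num_exits)]

def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    """Numerically stable softmax along the given axis."""
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def mixture_of_exits(exit_logits: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """Assumed mixture: sum_k weights[k] * softmax(exit_logits[k]).

    exit_logits: (K, V) next-token logits from each of K exit heads.
    weights:     (K,) mixture weights (in N-vium these would come from a
                 trained router; here they are supplied directly).
    """
    if np.any(weights < 0) or not np.isclose(weights.sum(), 1.0):
        raise ValueError("weights must be a probability distribution")
    per_exit = softmax(exit_logits, axis=-1)  # (K, V) distribution per exit
    return weights @ per_exit                 # (V,) mixed distribution

# Usage: 24 layers split into 4 blocks gives exits after layers 6, 12, 18, 24.
print(exit_layers(24, 4))  # [6, 12, 18, 24]

rng = np.random.default_rng(0)
logits = rng.normal(size=(4, 10))  # 4 exits, vocabulary of 10 tokens
final_only = np.array([0.0, 0.0, 0.0, 1.0])
# Special case: all probability mass on the final head equals the plain model.
assert np.allclose(mixture_of_exits(logits, final_only), softmax(logits[-1]))
```

Passing the weights explicitly, instead of computing them with a softmax router, is a deliberate choice here. A softmax can never produce an exact one-hot vector, and the special case needs exactly that.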