Model Architecture

Cross-source consensus on Model Architecture from 1 source and 5 claims.

How it works

The default audio backbone was Whisper Small with mean temporal pooling. — Voice Biomarkers for Depression and Anxiety
The core architecture used frozen pretrained backbones with trainable LoRA adaptation modules. — Voice Biomarkers for Depression and Anxiety
Training used randomly selected 30-second speech segments because Whisper Small's input window (its effective receptive field) is 30 seconds long. — Voice Biomarkers for Depression and Anxiety
Whisper Small was the strongest audio backbone in the early experiments, under the initial architecture and dataset configuration. — Voice Biomarkers for Depression and Anxiety
Broad ASR pretraining across many speakers supported downstream mental health classification. — Voice Biomarkers for Depression and Anxiety
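Mean temporal pooling, named above as part of the default audio setup, collapses the backbone's frame-by-frame outputs into one fixed-length vector that a classifier can consume. The sketch below is a minimal, dependency-free illustration, not the source's code: `mean_temporal_pool` and its optional `valid_frames` argument (used here to skip zero-padded frames at the end of a clip) are hypothetical names. For Whisper Small, the encoder emits 1,500 frames of 768-dimensional hidden states per 30-second window, so the input would be a 1,500 × 768 list and the output a 768-dimensional vector.

```python
from typing import List, Optional


def mean_temporal_pool(frames: List[List[float]], valid_frames: Optional[int] = None) -> List[float]:
    """Average a (T x D) sequence of frame embeddings into one D-dim vector.

    valid_frames is a hypothetical option: when given, only the first N
    frames are averaged, so trailing padding does not dilute the mean.
    """
    if not frames:
        raise ValueError("need at least one frame")
    n = len(frames) if valid_frames is None else valid_frames
    if not 1 <= n <= len(frames):
        raise ValueError("valid_frames must be between 1 and len(frames)")
    dim = len(frames[0])
    sums = [0.0] * dim
    for frame in frames[:n]:
        if len(frame) != dim:
            raise ValueError("all frames must have the same dimension")
        for d, value in enumerate(frame):
            sums[d] += value
    return [s / n for s in sums]
```

For example, `mean_temporal_pool([[1.0, 2.0], [3.0, 4.0]])` returns `[2.0, 3.0]`. Mean pooling ignores frame order, which keeps the head simple but discards temporal structure within the segment.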
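The frozen-backbone-plus-LoRA design described above follows the standard LoRA formulation: a pretrained weight matrix W stays fixed and the layer learns a low-rank update, computing W x + (alpha / r) · B A x, where A is r × d_in, B is d_out × r, and r is much smaller than either dimension. The sketch below shows one such adapted linear layer in plain Python. It illustrates the general technique, not the source's implementation; the class name `LoRALinear`, the rank, `alpha`, and the initialization scale are all assumptions. B starts at zero, so before training the adapted layer reproduces the frozen backbone exactly.

```python
import random
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def matvec(m: Matrix, x: Vector) -> Vector:
    """Multiply a (rows x cols) matrix by a length-cols vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in m]


class LoRALinear:
    """Hypothetical LoRA-adapted linear layer: y = W x + (alpha / r) * B (A x).

    W is copied and never modified (the frozen backbone weight); only
    lora_a and lora_b would receive updates in a real training loop.
    """

    def __init__(self, frozen_weight: Matrix, rank: int, alpha: float = 1.0, seed: int = 0) -> None:
        if rank < 1:
            raise ValueError("rank must be >= 1")
        self.weight = [row[:] for row in frozen_weight]  # frozen copy
        d_out, d_in = len(frozen_weight), len(frozen_weight[0])
        rng = random.Random(seed)
        # Assumed init: small Gaussian A, zero B, so the adapter starts as a no-op.
        self.lora_a = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(rank)]
        self.lora_b = [[0.0] * rank for _ in range(d_out)]
        self.scale = alpha / rank

    def trainable_parameter_count(self) -> int:
        """Number of adapter weights (A and B); W is excluded because it is frozen."""
        return sum(len(row) for row in self.lora_a) + sum(len(row) for row in self.lora_b)

    def forward(self, x: Vector) -> Vector:
        base = matvec(self.weight, x)                         # frozen path
        delta = matvec(self.lora_b, matvec(self.lora_a, x))   # low-rank trainable path
        return [b + self.scale * d for b, d in zip(base, delta)]
```

As an illustrative count, adapting one 768 × 768 projection (the hidden width of Whisper Small) with a hypothetical rank of 8 trains 2 × 8 × 768 = 12,288 parameters, while the 589,824 weights of W stay frozen.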
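The random 30-second segment selection mentioned above can be sketched as a crop over the raw waveform. Whisper models consume 16 kHz audio, so a 30-second window is 480,000 samples. The function name `random_segment` is hypothetical, and two behaviors are assumptions rather than details from the source: recordings of 30 seconds or less are zero-padded to the full window (mirroring how Whisper pads its own input), and a fresh crop is drawn each time a recording is sampled.

```python
import random
from typing import List, Optional

SAMPLE_RATE = 16_000                              # Whisper models expect 16 kHz audio
SEGMENT_SECONDS = 30                              # Whisper's fixed input window
SEGMENT_SAMPLES = SAMPLE_RATE * SEGMENT_SECONDS   # 480,000 samples


def random_segment(waveform: List[float], rng: Optional[random.Random] = None) -> List[float]:
    """Return one randomly positioned 30-second crop of a mono waveform.

    Recordings at or below 30 s are returned whole and zero-padded to the
    window length (an assumption; the source does not say how short clips
    were handled).
    """
    if rng is None:
        rng = random.Random()
    if len(waveform) <= SEGMENT_SAMPLES:
        return waveform + [0.0] * (SEGMENT_SAMPLES - len(waveform))
    # randint is inclusive, so the largest start still leaves a full 30 s window.
    start = rng.randint(0, len(waveform) - SEGMENT_SAMPLES)
    return waveform[start:start + SEGMENT_SAMPLES]
```

Drawing a new crop each epoch lets a long recording contribute different 30-second views over training, which acts as light data augmentation while keeping every input within the backbone's window.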