Human Evaluation Results
Cross-source consensus on Human Evaluation Results from 1 source and 5 claims.
Highlighted claims
- Multimodal AMIE outperformed PCPs in top-k diagnostic accuracy for k from 1 through 10. — Advancing conversational diagnostic AI with multimodal reasoning
- Multimodal AMIE was rated similar to or higher than PCPs across history-taking, diagnostic accuracy, management reasoning, communication, empathy, and multimodal handling. — Advancing conversational diagnostic AI with multimodal reasoning
- Specialists rated AMIE higher overall across diagnosis, management, history-taking, multimodal interpretation, artifact-grounded reasoning, and artifact-related questions. — Advancing conversational diagnostic AI with multimodal reasoning
- Low artifact quality reduced top-3 diagnostic accuracy for both AMIE and PCPs, but AMIE had a smaller drop. — Advancing conversational diagnostic AI with multimodal reasoning
- Patient-actors rated AMIE similar to or higher than PCPs on patient-centric consultation metrics. — Advancing conversational diagnostic AI with multimodal reasoning
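
For readers unfamiliar with the top-k diagnostic accuracy metric named in the claims above, the following is a minimal sketch of how such a metric is commonly computed. The function name `top_k_accuracy`, the example cases, and the use of exact string matching to decide whether a diagnosis is correct are all assumptions made for illustration; the source does not describe its matching procedure here, and none of the numbers below come from the study.

```python
from typing import Sequence


def top_k_accuracy(
    ranked_ddx: Sequence[Sequence[str]],
    ground_truth: Sequence[str],
    k: int,
) -> float:
    """Fraction of cases whose true diagnosis appears in the first k
    entries of the ranked differential diagnosis (DDx) list.

    Assumption: a diagnosis counts as a match only on exact string
    equality, which is a simplification chosen for this sketch.
    """
    if len(ranked_ddx) != len(ground_truth):
        raise ValueError("one ground-truth label is required per case")
    if not ground_truth:
        raise ValueError("at least one case is required")
    if k < 1:
        raise ValueError("k must be at least 1")

    hits = sum(
        1 for ddx, truth in zip(ranked_ddx, ground_truth) if truth in ddx[:k]
    )
    return hits / len(ground_truth)


# Hypothetical ranked differentials for three cases (illustrative only).
example_ddx = [
    ["eczema", "psoriasis", "tinea corporis"],
    ["psoriasis", "eczema"],
    ["melanoma", "nevus", "seborrheic keratosis"],
]
example_truth = ["eczema", "eczema", "basal cell carcinoma"]

# Top-k accuracy can only stay the same or rise as k grows.
for k in (1, 2, 3):
    print(f"top-{k} accuracy: {top_k_accuracy(example_ddx, example_truth, k):.2f}")
```

Reporting the metric over a range of k, as the first claim does for k from 1 through 10, shows whether one group's advantage holds both for the leading diagnosis and for the wider differential.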