Pretraining
Cross-source consensus on Pretraining from 1 source and 7 claims.
Highlighted claims
- The pretraining corpus contained 391,280,332 human single-cell measurements, 544 proteins, and 22 cell types. — scpFormer: A Foundation Model for Unified Representation and Integration of the Single-Cell Proteomics
- Protein symbols were standardized to canonical UniProt identifiers, and canonical amino-acid sequences were retrieved from UniProt. — scpFormer: A Foundation Model for Unified Representation and Integration of the Single-Cell Proteomics
- Raw expression matrices were transformed with log(1 + x) or arcsinh and then min-max normalized into the continuous range [0, 10]. — scpFormer: A Foundation Model for Unified Representation and Integration of the Single-Cell Proteomics
- A context-target attention mask forced conditional imputation from observed proteins by preventing target proteins from attending to other target proteins except themselves. — scpFormer: A Foundation Model for Unified Representation and Integration of the Single-Cell Proteomics
- Training jointly optimized expression self-decoder, global expression decoder, and joint decoder losses. — scpFormer: A Foundation Model for Unified Representation and Integration of the Single-Cell Proteomics
- Pretraining simulated targeted proteomics panel expansion by splitting each cell's proteins into visible context and target imputation sets. — scpFormer: A Foundation Model for Unified Representation and Integration of the Single-Cell Proteomics
- The corpus combined antibody-based platforms with mass-spectrometry-based platforms to balance cell coverage and proteomic depth. — scpFormer: A Foundation Model for Unified Representation and Integration of the Single-Cell Proteomics
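The preprocessing claim above (log(1 + x) or arcsinh transform followed by min-max scaling into [0, 10]) can be sketched as below. This is a minimal illustration, not the paper's code; the function name, the `transform` switch, and per-matrix (rather than per-protein) scaling are assumptions.

```python
import numpy as np

def normalize_expression(raw, lo=0.0, hi=10.0, transform="log1p"):
    """Variance-stabilize raw expression values, then min-max scale
    them into the continuous range [lo, hi] (default [0, 10])."""
    x = np.log1p(raw) if transform == "log1p" else np.arcsinh(raw)
    x_min, x_max = x.min(), x.max()
    if x_max == x_min:
        # Constant input: map every value to the lower bound.
        return np.full_like(np.asarray(x, dtype=float), lo)
    return lo + (hi - lo) * (x - x_min) / (x_max - x_min)
```

Whether scaling statistics are computed per matrix, per protein, or per dataset is not stated in the claim; the sketch uses a single global min and max for simplicity.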
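The panel-expansion simulation (splitting each cell's observed proteins into a visible context set and a held-out target set) might look like the following sketch. The split ratio, shuffling strategy, and function name are assumptions, not details from the source.

```python
import random

def split_panel(observed_proteins, context_frac=0.7, seed=None):
    """Simulate targeted-panel expansion for one cell: randomly split its
    observed proteins into a visible context set and a target set whose
    values the model must impute. `context_frac` is an assumed knob."""
    rng = random.Random(seed)
    proteins = list(observed_proteins)
    rng.shuffle(proteins)
    k = max(1, int(len(proteins) * context_frac))
    return proteins[:k], proteins[k:]
```

During pretraining such a split would be re-drawn per cell (or per batch), so the model sees many different context/target panel combinations.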
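The context-target attention mask described above (target proteins may attend to context proteins and to themselves, but not to other targets) can be expressed as a boolean matrix. This sketch implements only the rule stated in the claim; how the mask interacts with the rest of the attention computation is an assumption.

```python
import numpy as np

def context_target_mask(is_target):
    """Build an attention mask where entry [i, j] is True when token i
    may attend to token j. Target->target edges are blocked, except
    for self-attention on the diagonal; all other edges are allowed."""
    is_target = np.asarray(is_target, dtype=bool)
    n = is_target.size
    allowed = np.ones((n, n), dtype=bool)
    # Block every target-to-target edge ...
    allowed[np.ix_(is_target, is_target)] = False
    # ... then re-allow each token to attend to itself.
    np.fill_diagonal(allowed, True)
    return allowed
```

Blocking target-to-target attention forces each target protein's representation to be conditioned on the observed context proteins, which is what makes the imputation conditional.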
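The joint training objective (expression self-decoder, global expression decoder, and joint decoder losses optimized together) is, in the simplest reading, a weighted sum. The equal default weights below are an assumption; the source does not state how the three terms are balanced.

```python
def total_loss(self_decoder_loss, global_decoder_loss, joint_decoder_loss,
               weights=(1.0, 1.0, 1.0)):
    """Combine the three pretraining objectives into one scalar loss.
    Weights are hypothetical; the actual balancing is not specified."""
    w_self, w_global, w_joint = weights
    return (w_self * self_decoder_loss
            + w_global * global_decoder_loss
            + w_joint * joint_decoder_loss)
```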