Pretraining
Cross-source consensus on Pretraining from 1 source and 7 claims.
Highlighted claims
- The pretraining corpus contained 391,280,332 human single-cell measurements, 544 proteins, and 22 cell types. — scpFormer: A Foundation Model for Unified Representation and Integration of the Single-Cell Proteomics
- Protein symbols were standardized to canonical UniProt identifiers, and canonical amino-acid sequences were retrieved from UniProt. — scpFormer: A Foundation Model for Unified Representation and Integration of the Single-Cell Proteomics
- Raw expression matrices were transformed with log(1 + x) or arcsinh and then min-max normalized into the continuous range [0, 10]. — scpFormer: A Foundation Model for Unified Representation and Integration of the Single-Cell Proteomics
- A context-target attention mask forced conditional imputation from observed proteins by preventing target proteins from attending to other target proteins except themselves. — scpFormer: A Foundation Model for Unified Representation and Integration of the Single-Cell Proteomics
- Training jointly optimized expression self-decoder, global expression decoder, and joint decoder losses. — scpFormer: A Foundation Model for Unified Representation and Integration of the Single-Cell Proteomics
- Pretraining simulated targeted proteomics panel expansion by splitting each cell's proteins into visible context and target imputation sets. — scpFormer: A Foundation Model for Unified Representation and Integration of the Single-Cell Proteomics
- The corpus combined antibody-based platforms with mass-spectrometry-based platforms to balance cell coverage and proteomic depth. — scpFormer: A Foundation Model for Unified Representation and Integration of the Single-Cell Proteomics
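The preprocessing claim above (log(1 + x) or arcsinh transform followed by min-max scaling into [0, 10]) can be sketched as below. This is a minimal illustration, not the paper's code; the function name, the `transform` switch, and per-matrix (rather than per-protein) scaling are assumptions.

```python
import numpy as np

def normalize_expression(raw, lo=0.0, hi=10.0, transform="log1p"):
    """Variance-stabilize raw expression values, then min-max scale
    them into the continuous range [lo, hi] (default [0, 10])."""
    x = np.log1p(raw) if transform == "log1p" else np.arcsinh(raw)
    x_min, x_max = x.min(), x.max()
    if x_max == x_min:
        # Constant input: map every value to the lower bound.
        return np.full_like(np.asarray(x, dtype=float), lo)
    return lo + (hi - lo) * (x - x_min) / (x_max - x_min)
```

Whether scaling statistics are computed per matrix, per protein, or per dataset is not stated in the claim; the sketch uses a single global min and max for simplicity.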
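The panel-expansion simulation (splitting each cell's observed proteins into a visible context set and a held-out target set) might look like the following sketch. The split ratio, shuffling strategy, and function name are assumptions, not details from the source.

```python
import random

def split_panel(observed_proteins, context_frac=0.7, seed=None):
    """Simulate targeted-panel expansion for one cell: randomly split its
    observed proteins into a visible context set and a target set whose
    values the model must impute. `context_frac` is an assumed knob."""
    rng = random.Random(seed)
    proteins = list(observed_proteins)
    rng.shuffle(proteins)
    k = max(1, int(len(proteins) * context_frac))
    return proteins[:k], proteins[k:]
```

During pretraining such a split would be re-drawn per cell (or per batch), so the model sees many different context/target panel combinations.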
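The context-target attention mask described above (target proteins may attend to context proteins and to themselves, but not to other targets) can be expressed as a boolean matrix. This sketch implements only the rule stated in the claim; how the mask interacts with the rest of the attention computation is an assumption.

```python
import numpy as np

def context_target_mask(is_target):
    """Build an attention mask where entry [i, j] is True when token i
    may attend to token j. Target->target edges are blocked, except
    for self-attention on the diagonal; all other edges are allowed."""
    is_target = np.asarray(is_target, dtype=bool)
    n = is_target.size
    allowed = np.ones((n, n), dtype=bool)
    # Block every target-to-target edge ...
    allowed[np.ix_(is_target, is_target)] = False
    # ... then re-allow each token to attend to itself.
    np.fill_diagonal(allowed, True)
    return allowed
```

Blocking target-to-target attention forces each target protein's representation to be conditioned on the observed context proteins, which is what makes the imputation conditional.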
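The joint training objective (expression self-decoder, global expression decoder, and joint decoder losses optimized together) is, in the simplest reading, a weighted sum. The equal default weights below are an assumption; the source does not state how the three terms are balanced.

```python
def total_loss(self_decoder_loss, global_decoder_loss, joint_decoder_loss,
               weights=(1.0, 1.0, 1.0)):
    """Combine the three pretraining objectives into one scalar loss.
    Weights are hypothetical; the actual balancing is not specified."""
    w_self, w_global, w_joint = weights
    return (w_self * self_decoder_loss
            + w_global * global_decoder_loss
            + w_joint * joint_decoder_loss)
```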