Written by Ivo Kwee
Multi-omics biomarker identification is crucial for advancing biomedical research and personalized medicine. Computational genomic approaches are increasingly used to screen large biological datasets and identify features capable of classifying or predicting phenotypes.
While these approaches have traditionally focused on single-omics data, integrating multi-omics data enhances biomarker selection by capturing additional layers of biological variation. Matrix factorization methods have emerged as a powerful tool for multi-omics analysis, as they can learn latent factors that capture significant heterogeneity across different data types.
However, each of the current multi-omics factorization algorithms has its own strengths and weaknesses. Distinct approaches may result in different biomarker sets, so relying on a single method risks losing potentially valuable information. At the same time, reconciling the biomarker sets identified by distinct methods is difficult and error-prone.
In this blog post we’ll address this problem by using a combinatorial approach to compare multi-omics factorization methods for robust biomarker identification.
We used a TCGA multi-omics breast cancer dataset of 150 samples comprising transcriptomics, proteomics, and microRNA profiles.
We used this dataset to:
All methods were able to predict a set of biomarkers. However, as anticipated, the biomarker features selected by different methods often did not overlap.
It remains challenging to determine the optimal criteria for selecting the most appropriate factorization approach. For example, distinct data types or data modalities may require tailored approaches.
We compared the methods by correlating their factor loadings (weights) and clustering them in a heatmap (see Figure 1).
Figure 1. Clustered correlation heatmap of multi-omics factorization methods. For each method, we take the factor most strongly correlated with the phenotype Her2 and measure the correlation between the ranked weights of these factors. Methods that cluster together have similar factor weights.
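As a rough illustration of the comparison behind Figure 1, the sketch below computes a rank (Spearman-style) correlation matrix between the feature weights of several methods. It is a minimal example on synthetic data, not the code used for the figure: the method names, the `method_rank_correlation` helper, and the toy weights are all assumptions made for illustration.

```python
import numpy as np

def rank_vector(x):
    """Return 0-based ranks of x (ties broken by position; fine for continuous weights)."""
    order = np.argsort(x)
    ranks = np.empty(len(x), dtype=float)
    ranks[order] = np.arange(len(x))
    return ranks

def method_rank_correlation(weights):
    """weights: dict mapping a (hypothetical) method name to a 1-D array of
    feature weights, all in the same feature order.
    Returns the method names and the rank correlation matrix between methods."""
    names = list(weights)
    ranked = np.vstack(
        [rank_vector(np.asarray(weights[m], dtype=float)) for m in names]
    )
    # Pearson correlation of ranks is the Spearman correlation (when there are no ties)
    return names, np.corrcoef(ranked)

# Synthetic weights: methods A and B share a signal, method C is unrelated
rng = np.random.default_rng(0)
base = rng.normal(size=200)
toy_weights = {
    "method_A": base + 0.1 * rng.normal(size=200),
    "method_B": base + 0.1 * rng.normal(size=200),
    "method_C": rng.normal(size=200),
}
names, corr = method_rank_correlation(toy_weights)
print(names)
print(np.round(corr, 2))
```

A matrix like `corr` could then be passed to a hierarchical clustering routine (for example, using `1 - corr` as a distance) to order the rows and columns of a heatmap.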
We found that PCA, MOFA, and NMF2 are more similar to each other than to the other methods. This can be explained by algorithmic similarity. For instance, both PCA and MOFA attempt to capture as much variance as possible in a small set of components or factors, each an approximate linear combination of the original variables from each data modality.
Also, as expected, the canonical correlation analysis (CCA) methods, including SGCCA, RGCCA, and SGCCDA, were highly correlated with each other. We also found that DIABLO, a widely used supervised factorization method, was highly correlated with SGCCDA.
Correlation analysis also revealed significant divergence between methods. For instance, both PCA and MOFA were only weakly correlated with MCIA (multiple co-inertia analysis).
We then computed a variable importance score for each method. An aggregated score was calculated as the cumulative rank of the variable importances across the different algorithms (see Figure 2).
To define a robust set of biomarkers, we selected the best predictive features as those with the highest cumulative ranks.
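The cumulative rank idea can be sketched in a few lines. This is a minimal example under stated assumptions: the feature names, method names, importance values, and the `cumulative_rank` helper are hypothetical, and the exact scoring used in the analysis may differ.

```python
import numpy as np

def cumulative_rank(importances):
    """importances: dict mapping a (hypothetical) method name to a list of
    importance scores, one per feature, in the same feature order.
    Each method's scores are converted to ranks (1 = least important),
    and the ranks are summed across methods. Higher totals mean a feature
    is consistently ranked as important."""
    total = None
    for scores in importances.values():
        scores = np.asarray(scores, dtype=float)
        order = np.argsort(scores)                    # ascending importance
        ranks = np.empty(len(scores))
        ranks[order] = np.arange(1, len(scores) + 1)  # 1..n
        total = ranks if total is None else total + ranks
    return total

# Toy example with four made-up features and three made-up methods
features = ["gene_a", "gene_b", "protein_c", "mir_d"]
toy_importance = {
    "method_A": [0.9, 0.1, 0.5, 0.2],
    "method_B": [0.8, 0.3, 0.1, 0.2],
    "method_C": [0.7, 0.2, 0.6, 0.1],
}
total = cumulative_rank(toy_importance)

# Select the two features with the highest cumulative rank
top = [features[i] for i in np.argsort(-total)[:2]]
print(dict(zip(features, total.tolist())))
print(top)
```

Working on ranks rather than raw scores keeps methods with different importance scales from dominating the aggregate.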
Finally, the factor-trait correlation matrices showed that the methods differ substantially in their support, or effective dimensionality (see Figure 3).
Combining biomarker scores from multiple multi-omics integration methods yields more robust biomarkers and reduces the risk of losing information by relying on a single method.
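For readers who want to see what a factor-trait correlation matrix looks like in practice, here is a small sketch on synthetic data. The `factor_trait_correlation` helper, the number of factors, and the binary trait are assumptions for illustration only and do not reproduce Figure 3.

```python
import numpy as np

def factor_trait_correlation(factors, traits):
    """factors: (samples x factors) matrix of factor scores.
    traits:  (samples x traits) matrix of numeric or 0/1-encoded traits.
    Returns a (factors x traits) matrix of Pearson correlations."""
    F = np.asarray(factors, dtype=float)
    T = np.asarray(traits, dtype=float)
    Fz = (F - F.mean(axis=0)) / F.std(axis=0)  # z-score each factor
    Tz = (T - T.mean(axis=0)) / T.std(axis=0)  # z-score each trait
    return Fz.T @ Tz / F.shape[0]

# Synthetic example: 150 samples, 3 factors, one binary trait driven by factor 1
rng = np.random.default_rng(1)
toy_factors = rng.normal(size=(150, 3))
toy_trait = (toy_factors[:, 1] > 0).astype(float).reshape(-1, 1)

R = factor_trait_correlation(toy_factors, toy_trait)

# Index of the factor most strongly correlated with the trait
best_factor = int(np.argmax(np.abs(R[:, 0])))
print(np.round(R, 2))
print(best_factor)
```

The same matrix can be used to pick, for each method, the factor most strongly correlated with a phenotype of interest before comparing factor weights across methods.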
Omics Playground’s new multi-omics beta features combine RNA, protein expression, metabolomics, and integrated pathways, empowering you to transform complex multi-omics data into actionable insights.
Using multiple methods, including MOFA, MixOmics and Deep Learning, Omics Playground ensures a comprehensive and robust analysis of your multi-omics datasets.
The multi-omics features are currently available for testing. All you have to do is log in to your account or sign up for a trial. Once in, select “Omics Playground v4 (beta)” and start uploading your data for multi-omics data analysis!
Work in biotech or pharma? Contact us here to learn more.

Ivo Kwee holds a BSc degree in Engineering Physics, an MEng in Applied Physics and a PhD in Medical Physics. He has over 16 years of experience in bioinformatics and is currently CTO and co-founder of BigOmics Analytics, where he contributes to the mission of creating the best self-service analytics platform that enables life scientists to analyze their omics data.
