Published on June 3rd, 2024
Written by Axel Martinelli
In this biomarker data analysis tutorial, we will guide you through performing biomarker analysis using the Omics Playground platform. The dataset we will be using comes from an article by Logie et al. (2021), which investigates the therapeutic efficacy of withaferin A, a phytochemical kinase inhibitor, compared to the clinically approved BTK inhibitor ibrutinib.
For a general introduction to biomarker identification and the computational methods used for detection, read our blog post: “How to Find Biomarkers: Definition, Examples, and Computational Methods for Detection”.
The biomarker data analysis module in the Omics Playground platform is used for discovering biomarkers based on protein or gene expression levels.
The module, called “Find Biomarkers” in Omics Playground, can be found under Menu > Expression and consists of two main tabs: Feature Selection and Feature-set Ranking.
When you first access the Biomarkers tab, no plots will be displayed automatically. To generate the plots, you’ll need to configure your settings and click ‘Compute’.
Begin by navigating to the settings bar on the right-hand side of your dashboard. Here, you’ll find three fields specific to the Biomarkers tab: “Predicted target”, a sample filter, and “Feature set”.
In the “Predicted target” field, you can select the phenotype you are interested in. In Figure 1, we selected the combination of phenotype and treatment, so that each group is defined by both a phenotype and a treatment status.
You also have the option to filter samples (Figure 2), so that only a subset of the sample groups, rather than all of them, is included in the analysis.
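To make the combined phenotype-and-treatment target and the sample filter more concrete, here is a minimal Python sketch of what they amount to on a table of sample annotations. The sample IDs, column names and labels are hypothetical, and this is not how Omics Playground implements these settings internally; it only illustrates joining two annotations into one group label and then keeping a subset of groups.

```python
# Hypothetical sample annotations (IDs and labels are illustrative only).
annotations = {
    "S1": {"phenotype": "susceptible", "treatment": "untreated"},
    "S2": {"phenotype": "susceptible", "treatment": "treated"},
    "S3": {"phenotype": "resistant", "treatment": "untreated"},
    "S4": {"phenotype": "resistant", "treatment": "treated"},
}


def cross_labels(annotations, *columns):
    """Join several annotation columns into one combined group label per sample."""
    return {
        sample: "_".join(str(fields[col]) for col in columns)
        for sample, fields in annotations.items()
    }


def filter_samples(labels, keep):
    """Keep only the samples whose combined group label is in `keep`."""
    return {sample: group for sample, group in labels.items() if group in keep}


groups = cross_labels(annotations, "phenotype", "treatment")
resistant_only = filter_samples(groups, {"resistant_untreated", "resistant_treated"})
```

Crossing two two-level annotations yields four groups; the filter then simply drops the samples whose group you did not select.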
After selecting the phenotypes, you can select whether you want to focus on all the genes or just a specific gene family in the “Feature set” field.
The ‘Feature set’ setting is ‘all’ by default, but you can restrict the calculations to a specific gene family or provide a custom list of genes if you already have candidate biomarkers that you want the platform to focus on. In that case, click on <custom> and paste the list of gene symbols into the dialogue box that appears (Figure 3).
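If your candidate list comes from a spreadsheet or a paper, it may contain mixed separators, duplicates or inconsistent capitalization. The small helper below is a hypothetical convenience for tidying such a list into one symbol per line before pasting it. The input format accepted by the <custom> dialogue box is an assumption here, so check what the platform expects. Uppercasing follows the human (HGNC) convention and should be switched off for species such as mouse.

```python
import re


def clean_gene_list(text, upper=True):
    """Split pasted text on commas, semicolons or whitespace, dropping blanks and duplicates."""
    seen = set()
    genes = []
    for token in re.split(r"[,;\s]+", text):
        if not token:  # leading/trailing separators produce empty tokens
            continue
        symbol = token.upper() if upper else token
        if symbol not in seen:  # keep the first occurrence and preserve order
            seen.add(symbol)
            genes.append(symbol)
    return genes


# One symbol per line, ready to paste.
print("\n".join(clean_gene_list("hspa6, CDKN2A;HSPA6\n  hspa1a")))
```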
Once you have specified your settings, click on the ‘Compute’ button. After the computation is completed, the platform will generate four plots in the “Feature Selection” tab from which you can start your analysis.
In the Feature Selection tab, you will find four plots (Figure 4): Variable importance, Biomarker Expression, Heatmap, and Decision Tree.
The dashboard layout gives you the freedom to explore the data from any starting point you prefer. In this example, we’ll begin somewhat counterintuitively with the Decision Tree, located at the bottom right of the tab.
The Decision tree presents a classification solution based on the most probable biomarkers.
In this instance, the platform utilizes two genes, Heat Shock Protein A6 (HSPA6) and CDKN2A, to distinguish between four distinct phenotypic groups within the dataset.
These four groups combine treatment status (treated or untreated) with susceptibility (susceptible or resistant), as shown in Figure 5.
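The tutorial does not describe the exact tree algorithm the platform uses, so the following is only a generic, CART-style sketch in plain Python. At each node it tries every gene and every threshold, keeps the split with the lowest weighted Gini impurity, and recurses until nodes are pure or a maximum depth is reached. GENE_A, GENE_B and their expression values are made up for illustration; they are not values from the Logie et al. dataset.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means a pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(expr, labels):
    """Return (gene, threshold, impurity) of the split with the lowest weighted Gini.

    expr maps gene -> expression values, in the same sample order as labels.
    gene is None when no split improves on the unsplit node.
    """
    n = len(labels)
    best = (None, None, gini(labels))
    for gene, values in expr.items():
        distinct = sorted(set(values))
        # Candidate thresholds: midpoints between consecutive distinct values.
        for lo, hi in zip(distinct, distinct[1:]):
            t = (lo + hi) / 2
            left = [lab for v, lab in zip(values, labels) if v <= t]
            right = [lab for v, lab in zip(values, labels) if v > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best[2]:
                best = (gene, t, score)
    return best


def build_tree(expr, labels, depth=0, max_depth=2):
    """Recursively split until nodes are pure or max_depth is reached."""
    gene, t, _ = best_split(expr, labels)
    if gene is None or depth == max_depth:
        return Counter(labels).most_common(1)[0][0]  # leaf: majority label

    def subset(keep):
        idx = [i for i, v in enumerate(expr[gene]) if keep(v)]
        return {g: [vals[i] for i in idx] for g, vals in expr.items()}, [labels[i] for i in idx]

    return {
        "gene": gene,
        "threshold": t,
        "left": build_tree(*subset(lambda v: v <= t), depth + 1, max_depth),
        "right": build_tree(*subset(lambda v: v > t), depth + 1, max_depth),
    }


def predict(tree, sample):
    """Follow the splits for one sample given as a dict gene -> expression value."""
    while isinstance(tree, dict):
        branch = "left" if sample[tree["gene"]] <= tree["threshold"] else "right"
        tree = tree[branch]
    return tree


# Made-up values: GENE_A tracks treatment, GENE_B tracks resistance.
labels = ["S_untr", "S_untr", "S_trt", "S_trt", "R_untr", "R_untr", "R_trt", "R_trt"]
expr = {
    "GENE_A": [1.0, 1.2, 5.0, 5.3, 1.1, 0.9, 4.8, 5.1],
    "GENE_B": [2.0, 2.1, 2.2, 1.9, 6.0, 6.2, 5.9, 6.1],
}
tree = build_tree(expr, labels)
```

With two informative genes, a depth-2 tree is enough to separate four groups, which matches the overall shape of the tree in Figure 5. Real implementations add safeguards such as minimum node sizes or pruning to avoid overfitting small sample groups.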
The Biomarker Expression plot, located in the top-right corner of your dashboard, displays expression levels across different phenotypic groups.
In our example (Figure 6), you can see eight box plots representing the most likely biomarkers. The top two biomarkers correspond to the HSPA6 and CDKN2A genes, which were used to generate the decision tree shown in Figure 5.
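Each box in a box plot is usually drawn from a five-number summary of the values in one group. The sketch below computes that summary per group for one gene with Python’s standard library. The group labels and values are hypothetical, and the exact quartile and whisker conventions used by the platform’s plots are an assumption, not something stated in this tutorial.

```python
from statistics import median, quantiles


def box_stats(values):
    """Five-number summary (min, Q1, median, Q3, max) behind one box in a box plot."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return {"min": min(values), "q1": q1, "median": median(values), "q3": q3, "max": max(values)}


def group_box_stats(expression, groups):
    """Summarize one gene's expression separately for each phenotypic group."""
    by_group = {}
    for value, group in zip(expression, groups):
        by_group.setdefault(group, []).append(value)
    return {group: box_stats(vals) for group, vals in by_group.items()}


# Hypothetical values for one gene across two groups.
stats = group_box_stats([1, 2, 3, 4, 5, 10, 20], ["untreated"] * 5 + ["treated"] * 2)
```

Each group needs at least two samples, since `statistics.quantiles` rejects shorter inputs in most Python versions.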
Right next to the Decision Tree plot, you will find the Heatmap, which displays the most prominent potential biomarkers and their expression levels across all samples, categorized by phenotypic group.
In this view, you can see the two genes that were used to generate the decision tree, highlighted by asterisks on the platform (see Figure 7).
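Heatmaps of expression data commonly display row-standardized values (z-scores), so that genes with very different absolute expression can share one color scale. Whether the platform scales its heatmap exactly this way is an assumption. The sketch only shows the z-score step plus a hypothetical helper that adds an asterisk to the genes used by the decision tree, similar to the highlighting in Figure 7.

```python
from statistics import mean, stdev


def zscore_rows(matrix):
    """Center and scale each gene (row) across samples: (value - mean) / sd."""
    scaled = {}
    for gene, values in matrix.items():
        mu, sd = mean(values), stdev(values)
        # A constant row has sd == 0; show it as all zeros instead of dividing by zero.
        scaled[gene] = [(v - mu) / sd if sd > 0 else 0.0 for v in values]
    return scaled


def row_labels(genes, tree_genes):
    """Hypothetical helper: mark genes used by the decision tree with an asterisk."""
    return [gene + "*" if gene in tree_genes else gene for gene in genes]


matrix = {"GENE_A": [1.0, 2.0, 3.0], "GENE_C": [4.0, 4.0, 4.0]}
scaled = zscore_rows(matrix)
labels = row_labels(list(matrix), {"GENE_A"})
```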
Finally, we have the Variable importance plot, which is the most informative plot from a bioinformatics point of view (Figure 8).
This plot combines the results of six different machine learning algorithms and two other statistical tests to produce cumulative scores of variable importance.
The algorithms include LASSO, elastic net, random forest, and extreme gradient boosting (XGBoost). To learn more about the methods used, consult the Biomarkers module documentation.
In our example, the two most prominent biomarkers across the combined approaches are HSPA6 and CDKN2A. Notably, HSPA6 was identified as a biomarker by all eight approaches.
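The exact way the platform combines its methods is described in the Biomarkers module documentation, not here. As an illustration of the general idea only, the sketch below assumes each method returns an importance score per gene, rescales each method’s scores by its maximum absolute value, sums them into a cumulative score, and counts how many methods place a gene in their top k. Method names, gene names and scores are hypothetical.

```python
def cumulative_importance(scores_by_method, top_k=3):
    """Combine per-method importance scores into one ranking.

    Each method's absolute scores are divided by that method's maximum, then
    summed across methods; votes count how many methods put a gene in their top k.
    Returns (gene, cumulative_score, votes) tuples, highest score first.
    """
    total, votes = {}, {}
    for scores in scores_by_method.values():
        top = max(abs(s) for s in scores.values()) or 1.0  # avoid dividing by zero
        ranked = sorted(scores, key=lambda g: abs(scores[g]), reverse=True)[:top_k]
        for gene, s in scores.items():
            total[gene] = total.get(gene, 0.0) + abs(s) / top
            votes[gene] = votes.get(gene, 0) + (gene in ranked)
    return sorted(((g, total[g], votes[g]) for g in total), key=lambda row: row[1], reverse=True)


# Hypothetical scores from three methods for three genes.
scores_by_method = {
    "lasso": {"GENE_A": 0.9, "GENE_B": -0.4, "GENE_C": 0.0},
    "random_forest": {"GENE_A": 12.0, "GENE_B": 8.0, "GENE_C": 1.0},
    "t_test": {"GENE_A": 5.0, "GENE_B": 1.0, "GENE_C": 2.0},
}
ranking = cumulative_importance(scores_by_method, top_k=1)
```

Taking absolute values lets signed outputs, such as LASSO coefficients, be combined with non-negative ones like random-forest importances; a gene that LASSO drops simply contributes zero from that method.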
The second tab in our biomarkers analysis module is the Feature-set Ranking tab (Figure 9).
In this tab, genes are categorized by gene families, and the platform assesses their discriminatory power in distinguishing various phenotypic groups within each phenotype. This allows us to determine which feature set (or gene family) best explains the variance in the data.
Users can choose between three different methods for the calculation of the plot:
To choose your preferred method, you can click on the menu at the top of the plot and select the one you’re most interested in (Figure 10).
For this example, we used the correlation method, and you can see that heat shock proteins rank highest (Figure 11). This is largely due to their effectiveness in discriminating between treatment groups and between the groups defined by the intersection of phenotype and treatment.
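The tutorial does not spell out how the correlation method scores a gene family. One plausible reading, assumed here purely for illustration, is to correlate each gene in a family with a 0/1 indicator for one phenotypic group and average the absolute correlations over the family. The gene sets, gene names and values below are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either input is constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def rank_feature_sets(expr, gene_sets, groups, target):
    """Score each gene set by the mean |correlation| of its genes with membership of `target`."""
    indicator = [1.0 if g == target else 0.0 for g in groups]  # one-vs-rest 0/1 vector
    scores = {}
    for name, genes in gene_sets.items():
        present = [g for g in genes if g in expr]  # ignore genes absent from the data
        if present:
            scores[name] = sum(abs(pearson(expr[g], indicator)) for g in present) / len(present)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical data: four samples, two control and two treated.
expr = {
    "GENE_H1": [1.0, 1.0, 5.0, 5.0],
    "GENE_H2": [2.0, 2.0, 6.0, 6.0],
    "GENE_N1": [3.0, 4.0, 3.0, 4.0],
}
gene_sets = {"heat_shock": ["GENE_H1", "GENE_H2"], "nuclear_receptors": ["GENE_N1", "GENE_N2"]}
ranking = rank_feature_sets(expr, gene_sets, ["control", "control", "treated", "treated"], "treated")
```

Under this assumed scoring, a family whose genes separate the groups cleanly scores close to 1, while a family unrelated to the grouping scores near 0.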
You can also see fairly high scores for the cluster and cell cycle phenotypes, which the platform generates automatically for every uploaded dataset.
Crucially, heat shock proteins do not have strong discriminatory power for the glucocorticoid-resistant phenotype. If you are more interested in this phenotype, you might consider looking at other families, such as nuclear receptors or chemokines.
If you’d like to see a full analysis of this dataset, you can read our re-analysis.
Upload your dataset and start exploring biomarker analysis with Omics Playground today!

Axel Martinelli’s academic background is in molecular biology and parasitology. He earned a Ph.D. on the genetics of strain-specific immunity against malaria infections and a master’s degree in bioinformatics with specialization in the analysis of omics data. During his postdoctoral career, he worked on genomics and transcriptomics studies and is currently the head of biology at Bigomics Analytics.
