How to Perform Differential Gene Expression Analysis

Steps in analyses of gene expression data and popular differential gene expression analysis methods

Published on January 30th, 2024 
by Antonino Zito, PhD



What is Differential Gene Expression Analysis?

Differential gene expression analysis aims to detect features (here, genes) whose expression levels differ substantially between conditions. Differential gene expression testing is a key component of the discovery process in biological research.

It gives scientists a powerful tool for uncovering the molecular factors and mechanisms underlying biology in health and disease.

At BigOmics, we employ differential gene expression testing at a large scale to identify putative disease-associated features and genes perturbed by drug treatments. In this blog post, we provide a basic overview of differential gene expression methods as an integral part of a common bioinformatic pipeline [Figure 1].

Gene Expression Data Analysis Steps

Figure 1. Example of a common bioinformatic workflow. Typically, analyses of gene expression data involve multiple steps, including quality control and normalization, statistical analyses to identify genes differentially expressed between groups, and downstream validation and functional analyses. All of these steps can be conveniently carried out with the R language for statistical computing.

In Figure 1 above, we show the typical steps in analyses of gene expression data:

1. Data collection. Microarrays and RNA-seq are the most widely used techniques for measuring the expression levels of thousands of genes per sample.

2. Data processing. Raw gene expression data usually need to be cleaned to remove noise as well as technical and unwanted biological effects. This also involves normalization, a mathematical adjustment that makes expression levels comparable across samples so that groups can be validly compared. Curious to learn more about normalization and why it’s so important? Read our tech blog dedicated to normalization.

3. Differential gene expression. Differentially expressed genes (DEGs) are genes exhibiting significant changes in expression between experimental groups. DEGs can be identified with various statistical methods that assess both the magnitude and the statistical significance of the difference between groups (i.e., ‘fold-change’ and ‘p-value’, respectively) [Figure 2].

Figure 2. Example comparison of gene expression levels between two groups. A) Expression levels in Group 1 and Group 2 are highly similar; B) expression levels are slightly higher in Group 2 than in Group 1, yielding a moderate fold-change and P-value < 0.05; C) expression levels are substantially higher in Group 2 than in Group 1, yielding a high fold-change and a highly significant P-value; D) expression levels are slightly higher in Group 2 than in Group 1, consistently across a large number of samples, yielding a moderate fold-change but a highly significant P-value. The red line denotes the average gene expression level.

4. Functional analyses of the discovery set. Differential gene expression analysis may yield a list of DEGs that collectively define the ‘discovery set’. Researchers search the discovery set for genes already known to be associated with the phenotype or condition of interest, as well as for potentially new associations. New associations should ideally be validated with molecular biology assays (e.g., PCR). In addition, functional enrichment analyses are often conducted to gain insight into the possible biological roles of the discovery set at the cellular level. For instance, it’s useful to assess whether the genes are enriched for known biological functions and pathways, or for disease-associated features.
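Steps 2 and 3 can be sketched in a few lines of Python with NumPy and SciPy. This is a minimal illustration under stated assumptions, not the Omics Playground implementation: the counts are simulated (hypothetical), normalization is a simple counts-per-million (CPM) scaling followed by log2(CPM + 1), and significance comes from a per-gene Welch t-test. Dedicated tools use more elaborate normalizations (e.g., TMM in edgeR, median-of-ratios size factors in DESeq2) and count-based models. The helper names `cpm_log2` and `gene_stats` are hypothetical.

```python
import numpy as np
from scipy import stats

def cpm_log2(counts, prior=1.0):
    """Scale each sample (column) to counts per million, then take log2(CPM + prior)."""
    lib_sizes = counts.sum(axis=0)            # total reads per sample
    cpm = counts / lib_sizes * 1e6            # removes differences in sequencing depth
    return np.log2(cpm + prior)               # prior avoids log2(0)

def gene_stats(log_expr, group):
    """Per-gene log2 fold change (group 1 vs group 0) and Welch t-test p-value."""
    g0 = log_expr[:, group == 0]
    g1 = log_expr[:, group == 1]
    log2fc = g1.mean(axis=1) - g0.mean(axis=1)    # difference of mean log2 expression
    _, pvals = stats.ttest_ind(g1, g0, axis=1, equal_var=False)
    return log2fc, pvals

# Simulated (hypothetical) data: 1,000 genes x 12 samples, 6 per group.
rng = np.random.default_rng(0)
group = np.repeat([0, 1], 6)
base = rng.uniform(50, 500, size=1000)        # baseline mean count per gene
base[0] = 300.0                               # fix gene 0's baseline for a stable example
mu = np.tile(base[:, None], (1, 12))
mu[0, group == 1] *= 4                        # gene 0: 4-fold up in group 1 (log2FC ~ 2)
depth = rng.uniform(0.8, 1.2, size=12)        # unequal sequencing depth per sample
counts = rng.poisson(mu * depth)

log_expr = cpm_log2(counts)
log2fc, pvals = gene_stats(log_expr, group)
print(f"gene 0: log2FC = {log2fc[0]:.2f}, p = {pvals[0]:.2e}")
print(f"unchanged genes with p < 0.05: {np.sum(pvals[1:] < 0.05)} of 999")
```

The second line of output shows why step 3 needs care: even among genes that do not change, roughly 5% pass p < 0.05 by chance, so results over thousands of genes are usually corrected for multiple testing.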

Popular Methods for Differential Gene Expression Analysis

With advances in bioinformatics and computational biology, numerous methods for differential gene expression testing have been developed. Each method has its own strengths and weaknesses, and its applicability depends on the data type, the size of the dataset, the availability of replicates, and the type of test needed to address the research question.

At BigOmics, we take great care to conduct differential gene expression analysis properly. Below we list some of the most popular methods for differential gene expression testing, all available in Omics Playground.

  • T-test: It’s the simplest statistical test for comparing the average expression levels of two groups.
  • DESeq2: It fits negative binomial generalized linear models, with shrinkage estimation of dispersions, and tests for significant changes between groups using the Wald test or the likelihood ratio test.
  • edgeR: It fits negative binomial models with empirical Bayes estimation of dispersion parameters, and assesses differential expression using likelihood ratio tests or quasi-likelihood F-tests.
  • limma: It fits linear models to each gene and uses empirical Bayes moderated t- and F-statistics to measure gene expression differences between groups.
  • limma-voom: It transforms counts to log counts per million, estimates the mean-variance relationship to compute a precision weight for each observation, and then applies limma’s linear models to measure gene expression differences between groups.
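Of the methods listed, only the t-test is simple enough to reproduce in a few lines; DESeq2, edgeR and limma are R/Bioconductor packages and are not re-implemented here. The Python sketch below runs a t-test on every gene of a simulated (hypothetical) dataset and then applies the Benjamini-Hochberg (BH) procedure, one common way to obtain false discovery rate (FDR) adjusted p-values, or q-values. It is an illustration only, not necessarily the correction Omics Playground applies; `bh_adjust` is a hypothetical helper.

```python
import numpy as np
from scipy import stats

def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted p-values (q-values)."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)                                  # rank p-values ascending
    ranked = p[order] * m / np.arange(1, m + 1)            # p_(i) * m / i
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]     # make q-values monotone
    q = np.empty(m)
    q[order] = np.clip(ranked, 0.0, 1.0)                   # back to original gene order
    return q

# Simulated (hypothetical) log2 expression: 2,000 genes x 16 samples, 8 per group.
# The first 100 genes are shifted up by 2 log2 units in group 1.
rng = np.random.default_rng(1)
group = np.repeat([0, 1], 8)
log_expr = rng.normal(loc=8.0, scale=0.5, size=(2000, 16))
log_expr[:100, group == 1] += 2.0
is_true_deg = np.zeros(2000, dtype=bool)
is_true_deg[:100] = True

# Gene-wise two-sample t-test, then BH correction across all genes.
_, pvals = stats.ttest_ind(log_expr[:, group == 1], log_expr[:, group == 0], axis=1)
qvals = bh_adjust(pvals)

raw_hits = pvals < 0.05
bh_hits = qvals < 0.05
print("raw p < 0.05:", raw_hits.sum(), "false positives:", (raw_hits & ~is_true_deg).sum())
print("BH q < 0.05:", bh_hits.sum(), "false positives:", (bh_hits & ~is_true_deg).sum())
```

Without correction, around 5% of the 1,900 unchanged genes pass p < 0.05; after BH, false positives make up only a small fraction of the reported genes. Recent SciPy versions also provide `scipy.stats.false_discovery_control` for the same adjustment.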

Aside from these highly popular methods, numerous others have been developed. Researchers often evaluate different methods and select the appropriate one based on their needs.

Omics Playground is equipped with 9 distinct differential gene expression methods, covering a wide range of experimental designs. It’s our priority to offer researchers of any background a broad choice of options for studying their data in detail, quickly, and without writing any code. Our differential gene expression workflow is accompanied by extensive visualizations, including Volcano plots, box and bar plots, and heatmaps, as well as functional enrichment testing of biological pathways.
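Functional enrichment of a discovery set is often assessed with an over-representation test: given all tested genes, a pathway, and the DEGs, how surprising is the overlap between DEGs and pathway? The sketch below uses the hypergeometric distribution from SciPy (`scipy.stats.hypergeom`). It is one standard approach, shown for illustration; the gene identifiers, set sizes, and the helper `ora_test` are hypothetical, and the sketch does not describe the enrichment methods used inside Omics Playground.

```python
from scipy.stats import hypergeom

def ora_test(degs, pathway, universe):
    """Over-representation test: P(overlap >= k) under the hypergeometric distribution."""
    universe = set(universe)
    degs = set(degs) & universe          # restrict both sets to the tested genes
    pathway = set(pathway) & universe
    M, n, N = len(universe), len(pathway), len(degs)
    k = len(degs & pathway)              # DEGs that fall in the pathway
    expected = N * n / M                 # overlap expected by chance alone
    p = hypergeom.sf(k - 1, M, n, N)     # sf(k - 1) = P(X >= k)
    return k, expected, p

# Hypothetical example: 20,000 tested genes, a 200-gene pathway, and 500 DEGs,
# 20 of which fall in the pathway.
universe = [f"gene{i}" for i in range(20000)]
pathway = universe[:200]
degs = universe[:20] + universe[1000:1480]
k, expected, p = ora_test(degs, pathway, universe)
print(f"overlap = {k}, expected = {expected:.1f}, fold enrichment = {k / expected:.1f}, p = {p:.2e}")
```

Here 20 overlapping genes are observed where about 5 would be expected by chance. When many pathways are tested, their p-values should themselves be corrected for multiple testing.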

Differential Gene Expression Analysis in Omics Playground

Video 1. Expression analysis options in Omics Playground.

As mentioned, Omics Playground is equipped with 9 differential gene expression methods. Users just need to upload their data and select which statistical methods to use [Figure 3].

Figure 3. The red box contains the differential gene expression analysis algorithms available in Omics Playground.

The Omics Playground platform can integrate results from multiple differential gene expression methods, giving researchers more robust results. For example, in a typical differential gene expression analysis it reports several statistics, including (i) a ‘meta q-value’ for each gene, obtained by combining the p-values of the individual DGE methods; and (ii) a ‘star classification’ indicating how many of the chosen statistical methods identify a gene as dysregulated [Figure 4].
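The text does not specify how the p-values are combined, so the following Python sketch is only an illustration under two assumptions: the meta p-value uses Fisher’s method (X = -2 Σ ln p, compared with a chi-square distribution with 2k degrees of freedom for k methods), followed by a Benjamini-Hochberg correction to give a meta q-value; and a gene’s ‘stars’ count the methods in which its BH-adjusted q-value is below 0.05. The p-values and the helpers `bh_adjust`, `fisher_meta_p`, and `star_count` are hypothetical. Note that Fisher’s method assumes independent p-values, which is not strictly true when several methods are run on the same data.

```python
import numpy as np
from scipy import stats

def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted p-values (q-values)."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    ranked = p[order] * m / np.arange(1, m + 1)
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]   # make q-values monotone
    q = np.empty(m)
    q[order] = np.clip(ranked, 0.0, 1.0)
    return q

def fisher_meta_p(p_matrix):
    """Fisher's method per gene (row): X = -2 * sum(ln p) ~ chi-square with 2k df."""
    p = np.asarray(p_matrix, dtype=float)
    x = -2.0 * np.log(p).sum(axis=1)
    return stats.chi2.sf(x, df=2 * p.shape[1])

def star_count(p_matrix, fdr=0.05):
    """Number of methods (columns) in which each gene has BH-adjusted q < fdr."""
    q = np.column_stack([bh_adjust(col) for col in np.asarray(p_matrix).T])
    return (q < fdr).sum(axis=1)

# Hypothetical p-values: 4 genes (rows) x 3 methods (columns).
p = np.array([
    [1e-6, 1e-5, 1e-4],
    [0.001, 0.2, 0.03],
    [0.5, 0.7, 0.9],
    [0.04, 0.3, 0.6],
])
meta_q = bh_adjust(fisher_meta_p(p))
stars = star_count(p)   # stars: [3, 1, 0, 0]
for i in range(len(p)):
    print(f"gene {i}: meta q = {meta_q[i]:.2e}, stars = {stars[i]}")
```

Gene 0 is significant in all three methods, gene 1 in only one, which is exactly the kind of distinction a star-style summary helps surface.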


Researchers can refine the discovery set by adjusting statistical thresholds and see how many significantly dysregulated genes are detected at each threshold. They can also select individual genes, visualize their expression profiles across experimental groups, and extract tables reporting the pathways and gene sets in which each gene is annotated [Figure 4].
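Counting DEGs at different thresholds amounts to filtering on two criteria at once, an FDR (q-value) cutoff and a minimum absolute log2 fold change, which correspond to the two axes of a Volcano plot. The Python sketch below shows this under simple assumptions: the six genes and their statistics are hypothetical, and `count_degs` is an illustrative helper, not an Omics Playground function.

```python
import numpy as np

def count_degs(log2fc, qvals, fdr=0.05, lfc=1.0):
    """Count up- and down-regulated genes with q < fdr and |log2FC| >= lfc."""
    log2fc = np.asarray(log2fc)
    qvals = np.asarray(qvals)
    sig = (qvals < fdr) & (np.abs(log2fc) >= lfc)
    up = int(np.sum(sig & (log2fc > 0)))
    down = int(np.sum(sig & (log2fc < 0)))
    return up, down

# Hypothetical per-gene results for six genes.
log2fc = np.array([2.5, -1.8, 0.4, 1.2, -0.2, 3.1])
qvals = np.array([0.001, 0.02, 0.01, 0.08, 0.5, 0.04])

# Volcano-plot coordinates: x = log2FC, y = -log10(q).
volcano_y = -np.log10(qvals)

for fdr in (0.01, 0.05, 0.1):
    for lfc in (0.0, 1.0):
        up, down = count_degs(log2fc, qvals, fdr=fdr, lfc=lfc)
        print(f"FDR < {fdr}, |log2FC| >= {lfc}: {up} up, {down} down")
```

Tightening either threshold can only shrink the discovery set; the fold-change cutoff removes genes such as gene 2, which is statistically significant but changes only modestly.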

Figure 4. The Omics Playground tab for differential gene expression testing, showing results from a typical analysis: Volcano and MA plots, bar plots for a selected gene, and tables of gene statistics and gene sets. Users can select the contrasts to analyze and set FDR and logFC thresholds, as shown on the right-hand side.

Conclusion

Omics Playground makes bioinformatics analysis accessible to everyone, regardless of their programming skills. It also supports bioinformaticians who want to delegate routine omics data analysis to biologists, a win-win scenario for both parties.

Perform differential gene expression analysis interactively with a free trial of Omics Playground

About the Author

Antonino Zito

Antonino is a senior bioinformatics engineer at BigOmics with a strong background in bioinformatics and biostatistics. With a PhD in genetics and bioinformatics and an MSc in biotechnology, he has made significant contributions to computational analysis in numerous projects during his previous research at Harvard Medical School and King’s College London.