Published on October 2, 2025
Last updated on February 23, 2026
by Antonino Zito 10 min read
Weighted gene co-expression network analysis (WGCNA) is a powerful all-in-one analysis method that allows biologists to understand the transcriptome-wide relationships of all genes in a system rather than each gene in isolation.
With WGCNA, researchers can identify clusters of genes (called modules) that share correlated expression patterns and explore how these clusters relate to one another. Importantly, WGCNA also provides data on the association between modules and external traits, such as recorded sample phenotypes. Identification of gene correlation networks has high biological relevance as genes within the same module could share regulatory mechanisms and be functionally related within a molecular pathway at the cellular and inter-cellular level (1-4).
Ultimately, WGCNA could inform on candidate biomarkers and druggable features for therapeutics. Although WGCNA has mostly been applied to transcriptomic data, its principles are suited to other omics, such as methylation data.
So, how does WGCNA work in practice? What insights can you expect from the approach in your next RNA-seq study, and what are its limitations?
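WGCNA is a systems biology approach that researchers use to analyze complex data patterns in large numbers of samples (1). WGCNA is split into four main sequential analytical components:
Construction of a weighted gene co-expression network
Identification of gene modules
Correlation of modules with one another and with external traits
Identification of hub genes within modules of interest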
WGCNA determines these outcomes from pairwise correlations between genes or modules in a guilt-by-association approach, where information about a gene is gained from its close neighbors in the network. However, there are many options and outputs, which could seem overwhelming to biologists with limited knowledge of the analyses performed and the plots generated.
Here, we break down the WGCNA method, the plots, and their interpretation into bite-size pieces for biologists with limited bioinformatic expertise.
Typically, WGCNA begins with a matrix of data that features the gene expression of each sample.
The method then measures pairwise correlations between genes across all samples.
The correlation score of each gene pair indicates the similarity of their expression pattern and could suggest their potential functional relationship.
The ‘weighted’ aspect of WGCNA aims to amplify the differences between strong and weak correlations by raising the correlation to a power defined by the user (don’t worry, the Omics Playground takes care of selecting the most appropriate power value for the user!). A high correlation indicates the genes are strongly connected, whereas a low correlation suggests a weak connection.
These magnified weighted correlation values then make it easier to identify groups of genes with similar behaviors in the subsequent step.
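To make the weighting step concrete, here is a minimal Python sketch on made-up data. The toy expression values, the gene roles, and the power of 6 are assumptions chosen for illustration; this is not how Omics Playground or the WGCNA R package is implemented.

```python
import numpy as np

# Toy expression matrix: rows are samples, columns are genes (hypothetical values).
rng = np.random.default_rng(0)
n_samples = 20
driver = rng.normal(size=n_samples)
expr = np.column_stack([
    driver + 0.1 * rng.normal(size=n_samples),  # gene A follows the driver
    driver + 0.1 * rng.normal(size=n_samples),  # gene B follows the driver
    rng.normal(size=n_samples),                 # gene C is unrelated
])

# Pairwise Pearson correlation between genes (columns).
cor = np.corrcoef(expr, rowvar=False)

# "Weighted" step: raise the absolute correlation to a power.
power = 6  # assumed value, for illustration only
adjacency = np.abs(cor) ** power

print(np.round(cor, 2))
print(np.round(adjacency, 2))
```

Because every absolute correlation is at most 1, raising it to a power shrinks weak correlations far more than strong ones, which is what widens the gap between them.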
The user must then select the type of network required. Don’t worry if that sounds technical: platforms like Omics Playground automatically select the most appropriate power value for you.
The result is a network plot where genes are depicted by circles (known as nodes), and the strength of the weighted correlation coefficient is shown by the thickness of a line (or edge) connecting two genes.
In the plot below, the thick green line between SVOP and AMER3 indicates a strong correlation and, therefore, potential association between these two genes (Fig. 1).
Next, WGCNA uses the network’s weighted correlation coefficient information to place genes exhibiting significantly similar expression profiles into groups called modules.
If genes have similar correlations with many shared neighbors in the network or have a large overlap of their network neighbors, the genes likely have similar expression patterns and can be grouped into the same module.
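The idea of shared network neighbors is usually formalized as the topological overlap measure (TOM). The sketch below implements the standard unsigned TOM formula with NumPy. The function name and the four-gene adjacency matrix are hypothetical and only meant to illustrate the calculation.

```python
import numpy as np

def topological_overlap(adjacency):
    """Topological overlap for a symmetric adjacency matrix with entries in [0, 1]."""
    a = np.array(adjacency, dtype=float)
    np.fill_diagonal(a, 0.0)            # ignore self-connections
    shared = a @ a                      # connection strength through shared neighbors
    k = a.sum(axis=1)                   # total connectivity of each gene
    k_min = np.minimum.outer(k, k)      # smaller connectivity of each gene pair
    tom = (shared + a) / (k_min + 1.0 - a)
    np.fill_diagonal(tom, 1.0)
    return tom

# Hypothetical network: genes 0-2 are tightly connected, gene 3 is peripheral.
adjacency = np.array([
    [1.0, 0.8, 0.8, 0.1],
    [0.8, 1.0, 0.8, 0.1],
    [0.8, 0.8, 1.0, 0.1],
    [0.1, 0.1, 0.1, 1.0],
])
tom = topological_overlap(adjacency)
print(np.round(tom, 3))
```

Genes 0 and 1 are directly connected and also share gene 2 as a strong neighbor, so their overlap is much higher than the overlap between gene 0 and the peripheral gene 3.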
To determine modules, hierarchical clustering is performed on the gene correlation network data. A dendrogram is generated where each branch identifies a specific module (Fig. 2).
Methods like dynamic tree cut can be employed to determine discrete modules containing genes with similar expression patterns. Each module is assigned a distinct ID and color.
Be cautious when setting parameters
The way you “cut” the dendrogram influences the size and number of modules and, if done incorrectly, the clustering can become misleading and reduce biological accuracy.
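The effect of the cut can be seen in a small sketch. Note the substitution: SciPy does not ship dynamic tree cut, so this example uses a plain fixed-height cut (`fcluster`) instead. The six-gene similarity matrix and the cut heights are made up for illustration.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform

# Hypothetical similarity matrix: genes 0-2 and genes 3-5 form two tight groups.
similarity = np.full((6, 6), 0.05)
similarity[:3, :3] = 0.9
similarity[3:, 3:] = 0.8
np.fill_diagonal(similarity, 1.0)

# Hierarchical clustering works on dissimilarities (1 - similarity).
dissimilarity = 1.0 - similarity
tree = linkage(squareform(dissimilarity), method="average")

def cut_modules(tree, height):
    """Assign module labels with a fixed-height cut (a stand-in for dynamic tree cut)."""
    return fcluster(tree, t=height, criterion="distance")

modules_low = cut_modules(tree, 0.05)   # cut too low: every gene is its own module
modules_mid = cut_modules(tree, 0.5)    # sensible cut: two modules
modules_high = cut_modules(tree, 0.99)  # cut too high: everything merges

for name, labels in [("low", modules_low), ("mid", modules_mid), ("high", modules_high)]:
    print(name, labels, "->", len(set(labels)), "module(s)")
```

The same tree yields six, two, or one module depending only on where it is cut, which is why this parameter deserves care.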
These modules serve as the foundation for the next step, where WGCNA examines how gene clusters correlate with phenotypic traits.
Once modules are defined using the dendrogram, the output must be simplified to a single representative expression profile per module (one value per sample), called the module eigengene. The eigengene is the first principal component of the module’s expression data and represents the overall module expression.
As the module eigengene characterizes each module as a singular entity, it enables us to perform correlation analysis between modules to find those with similar expression behaviors or to determine how each module correlates with phenotypes.
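As a rough illustration, an eigengene can be computed as the first principal component of the standardized expression matrix of one module. The function name and the toy data below are assumptions, and flipping the sign to follow the average expression is one common convention rather than a requirement.

```python
import numpy as np

def module_eigengene(expr_module):
    """First principal component of a samples-by-genes matrix for one module."""
    x = np.asarray(expr_module, dtype=float)
    # Standardize each gene so highly expressed genes do not dominate.
    x = (x - x.mean(axis=0)) / x.std(axis=0, ddof=1)
    u, s, vt = np.linalg.svd(x, full_matrices=False)
    eigengene = u[:, 0]  # one value per sample
    # The sign of a principal component is arbitrary; align it with the mean profile.
    if np.corrcoef(eigengene, x.mean(axis=1))[0, 1] < 0:
        eigengene = -eigengene
    return eigengene

# Hypothetical module: three genes that follow the same trend across 10 samples.
rng = np.random.default_rng(1)
signal = np.linspace(-1.0, 1.0, 10)
expr_module = np.column_stack([
    1.0 * signal + 0.05 * rng.normal(size=10),
    2.0 * signal + 0.05 * rng.normal(size=10),
    0.5 * signal + 0.05 * rng.normal(size=10),
])
eigengene = module_eigengene(expr_module)
print(np.round(eigengene, 2))
```

The result has one value per sample, so it can be correlated with other eigengenes or with sample traits just like a single gene.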
For instance, our example below shows that the module eigengenes of ME1 and ME4 are highly correlated, suggesting their biological function or sample types of origin might be related (Fig. 3A).
To determine whether these two modules do have similar biological roles, we can next measure the degree to which each module’s eigengene correlates with different patient traits, sample types, or disease outcomes. These biological variables could include a patient’s age, gender, or weight, outcomes like remission or patient death, or whether samples originate from healthy or diseased patients or from different organs or tumor locations. Researchers have used this approach to identify key modules in many diseases, such as glioblastoma, breast cancer, and colorectal cancer (2, 3, 4).
In our example, ME1 and ME4 both highly correlate to healthy samples, whereas ME2 and ME3 are highly correlated with glioblastoma samples (Fig. 3B). This suggests that the genes contained in these modules could play a role in glioblastoma.
We can get an overview of this possibility by performing gene ontology analysis to determine modules enriched for genes associated with particular pathways or functions.
But, from this, we don’t yet know which genes within each module might be the most important.
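The module-trait step described above boils down to correlating each eigengene with each trait. Here is a minimal sketch with invented eigengene values and a binary healthy/disease trait; the module names `ME_a` and `ME_b` are hypothetical and do not correspond to the figures.

```python
import numpy as np

# Hypothetical trait: 0 = healthy, 1 = disease, for 8 samples.
is_disease = np.array([0, 0, 0, 0, 1, 1, 1, 1], dtype=float)

# Hypothetical eigengenes: rows are samples, columns are modules.
eigengenes = np.column_stack([
    [-0.4, -0.3, -0.35, -0.3, 0.3, 0.4, 0.35, 0.3],  # ME_a: higher in disease
    [0.1, -0.2, 0.3, -0.1, -0.2, 0.2, -0.3, 0.2],    # ME_b: no clear pattern
])

def module_trait_correlation(eigengenes, trait):
    """Pearson correlation of each module eigengene (column) with one trait."""
    return np.array([
        np.corrcoef(eigengenes[:, m], trait)[0, 1]
        for m in range(eigengenes.shape[1])
    ])

cors = module_trait_correlation(eigengenes, is_disease)
for name, r in zip(["ME_a", "ME_b"], cors):
    print(f"{name}: r = {r:.2f}")
```

In this toy example only `ME_a` tracks the disease label, so it would be the module to carry forward. A real analysis would also report a p-value for each correlation.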
Finally, once we have identified modules of interest, we can delve deeper into each module to find genes that might be key factors for a particular trait or could influence other genes in that module. Each module may contain many genes; it is essential to identify so-called ‘hub genes’ that can be ideal candidates for further study.
Hub genes are identified as the most highly connected genes within a module and, expectedly, the most strongly correlated with the phenotype of interest. The expression of a gene is also used to calculate the ‘module membership’, which measures the degree to which a gene’s expression profile correlates with a particular module within the expression network. Module membership is therefore a useful tool for prioritizing genes for further study.
If the correlation is high, the gene is likely representative of the overall expression of the module as a whole and is well connected in the network. Similarly, the high correlation of this gene to the trait of interest further strengthens its likelihood as an important driver in that module.
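A simple way to picture this prioritization is to compute, for every gene in a module, its correlation with the module eigengene (module membership) and with the trait (often called gene significance), then rank the genes. The gene names and values below are invented. Ranking by module membership alone is a simplification of hub gene selection, which can also use network connectivity.

```python
import numpy as np

def column_correlations(matrix, vector):
    """Pearson correlation of each column of `matrix` with `vector`."""
    m = matrix - matrix.mean(axis=0)
    v = vector - vector.mean()
    return (m.T @ v) / np.sqrt((m ** 2).sum(axis=0) * (v ** 2).sum())

# Hypothetical module eigengene and trait for 8 samples.
eigengene = np.array([-0.4, -0.3, -0.35, -0.3, 0.3, 0.4, 0.35, 0.3])
trait = np.array([0, 0, 0, 0, 1, 1, 1, 1], dtype=float)

# Hypothetical expression of three genes (columns) in the same samples.
genes = ["HUB", "MID", "OUT"]
wobble = np.array([0.3, -0.3, 0.3, -0.3, 0.3, -0.3, 0.3, -0.3])
expr = np.column_stack([
    10.0 + 5.0 * eigengene,                        # tracks the eigengene closely
    eigengene + wobble,                            # tracks it loosely
    [0.1, -0.2, 0.3, -0.1, -0.2, 0.2, -0.3, 0.2],  # barely related
])

module_membership = column_correlations(expr, eigengene)
gene_significance = column_correlations(expr, trait)

# Rank genes by absolute module membership, strongest first.
order = np.argsort(-np.abs(module_membership))
ranked = [genes[i] for i in order]
for i in order:
    print(f"{genes[i]}: MM = {module_membership[i]:.2f}, GS = {gene_significance[i]:.2f}")
```

Genes that score high on both measures, like the invented `HUB` gene here, are the ones usually shortlisted for follow-up.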
While it is a powerful approach, many parameters in WGCNA can present problems to the user if not applied correctly.
For instance, before generating the correlation networks, users must choose, among many other options:
Network type (signed vs. unsigned)
Correlation method (Pearson, Spearman, or others)
Soft-thresholding power values to weight the correlations
Cut-offs for defining modules
The wide swathe of options and parameters needed to conduct an end-to-end WGCNA could make the analyses highly error-prone. In fact, selecting an inappropriate method, parameter, or threshold for the type or spread of your data could lead to misinterpretation of correlations where outliers aren’t treated correctly, networks that aren’t biologically realistic, and ultimately inaccurate conclusions that could hinder future research.
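To see why one of these choices matters, the sketch below compares the usual unsigned and signed adjacency definitions on a small, invented correlation matrix. The function name and the power of 6 are assumptions for illustration only.

```python
import numpy as np

def adjacency_from_correlation(cor, power=6, network_type="unsigned"):
    """Turn a correlation matrix into a weighted adjacency matrix."""
    cor = np.asarray(cor, dtype=float)
    if network_type == "unsigned":
        # Strong negative and strong positive correlations are treated alike.
        return np.abs(cor) ** power
    if network_type == "signed":
        # Negative correlations are pushed toward zero connection strength.
        return ((1.0 + cor) / 2.0) ** power
    raise ValueError(f"unknown network type: {network_type}")

# Hypothetical correlations: genes 0 and 1 move together, gene 2 moves opposite.
cor = np.array([
    [1.0, 0.9, -0.9],
    [0.9, 1.0, -0.8],
    [-0.9, -0.8, 1.0],
])
unsigned = adjacency_from_correlation(cor, network_type="unsigned")
signed = adjacency_from_correlation(cor, network_type="signed")
print(np.round(unsigned, 3))
print(np.round(signed, 3))
```

In the unsigned network the anti-correlated gene 2 is as strongly connected to gene 0 as gene 1 is, whereas in the signed network that connection almost disappears. The same data can therefore produce different modules depending on this single setting.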
Another major problem many biologists face is that WGCNA is available predominantly for those with coding knowledge. For biologists unfamiliar with programming languages, this barrier can often seem insurmountable, regardless of the vast biological insights possible with WGCNA.
To overcome this hurdle, our Omics Playground is designed as a point-and-click entry point that allows scientists with no knowledge of coding to explore all the features and parameters of WGCNA in an interactive and efficient way.
To view the WGCNA module in Omics Playground, you need to select the option in the computation settings when you first upload your dataset. Below you can find a step-by-step guide to enable the module and access it when your dataset is loaded.
If you’re new to Omics Playground you can easily access the platform by registering for a trial account. The platform is designed to be as user-friendly as possible. However, to ensure a successful first upload, we recommend reading the guidelines on data preparation before proceeding.
If you already have an account, simply log in using your credentials.
When you upload a new dataset, you will be prompted to follow these steps:
Once you’ve ensured that all your options are selected, click ‘Compute’ and wait for the dataset to be ready.
For more information on how to upload your data to Omics Playground, feel free to check our step-by-step uploading guide.
When your dataset is ready, you’ll receive both an email and an in-app notification. Once loaded, you can find the WGCNA module under Menu > SystemsBio > WGCNA (see Figure 5).
You can then select different methods, parameters, and thresholds to fully explore your data using the settings menu on the right-hand side of your dashboard (Figure 6).
Now that you can easily access the WGCNA module, you can start exploring the complex relationships within your dataset and uncovering new biological insights.
This is where the magic happens – genes are grouped into modules.
(a) Gene dendrogram and modules – Shows a tree-like diagram where similar genes are clustered together. The colored bar at the bottom shows which module each gene belongs to (turquoise, blue, brown, etc.).
(b) Scale independence and mean connectivity – Technical plots that help determine if the analysis parameters are appropriate. You don’t need to interpret these in detail; they’re mainly for quality control.
(c) TOM heatmap – A red-orange colored heatmap showing how strongly genes are connected to each other. Darker red squares indicate genes that are highly correlated.
(d) Feature UMAP – A scatter plot where each dot represents a gene, colored by its module assignment. Genes in the same module cluster together spatially.
(e) Module size – A bar chart showing how many genes are in each module. This helps you understand the scale of each module.
An “eigengene” represents the average expression pattern of all genes in a module. It’s like a summary or representative profile for that module.
(a) Module-Trait relationships – Shows which modules are associated with your experimental conditions (e.g., disease vs. control, different treatments). Modules with strong correlations to your traits are the most biologically relevant.
(b) Sample dendrogram + eigengenes – Shows how your samples cluster together, with module eigengene patterns displayed as heatmaps below.
(c) Sample dendrogram + traits – Similar to (b) but showing your experimental traits instead of eigengenes.
(d) Eigengene correlation heatmap – Shows which module eigengenes are correlated with each other. This can reveal higher-order relationships between modules.
(e) Eigengene dendrogram – Shows how module eigengenes cluster together based on their similarity.
(f) Module graph – A network visualization showing relationships between modules.
This tab lets you dive deep into individual modules to identify “hub genes” – the most important genes driving each module’s pattern.
(a) Summary – Basic information about the selected module.
(b) Trait correlation – Shows how strongly the selected module’s eigengene correlates with different experimental traits. Higher bars indicate stronger associations.
(c) Circle network of hub genes – A network diagram showing the most connected genes (hub genes) in the module. These are often the most biologically important genes.
(d) Significance table – Lists genes in the module ranked by their importance scores. Higher scores indicate genes more central to the module.
(e) Gene significance – Small scatter plots showing the relationship between module membership and gene-trait correlation for individual genes.
This tab tells you what biological functions or pathways are enriched in each module, helping you understand what the module actually does.
(a) Geneset heatmap – Shows which gene sets (pathways, GO terms, etc.) are enriched in the module. Red indicates strong enrichment.
(b) Gene heatmap – Shows expression patterns of genes in the most enriched gene sets.
(c) Enrichment scores – A table listing the most significantly enriched biological terms, ranked by score. This is where you find out if your module is involved in “immune response,” “cell cycle,” etc.
(d) Gene frequency – A bar chart showing which genes appear most frequently in the enriched gene sets.
1. Start with the WGCNA tab – Look at how many modules were detected and their sizes. Modules with 50-500 genes are usually most interpretable.
2. Go to Eigengenes tab – Check the Module-Trait relationships heatmap. Identify modules that show strong correlations (positive or negative) with your experimental conditions.
3. Select interesting modules – Focus on modules that correlate with your traits of interest (from step 2).
4. Go to Modules tab – For each interesting module, examine the hub genes in the network and check the significance table to identify key driver genes.
5. Go to Enrichment tab – Look at what biological functions are enriched in your module. This tells you what biological process this gene group is involved in.
6. Biological interpretation – Combine the information: “Module X (e.g., MEblue) contains 200 genes that are upregulated in disease samples. The hub genes include IFIT2, IFIT3, and HERC5. This module is enriched for interferon response and immune activation.”
In addition to WGCNA, Omics Playground offers a wide range of other analytical tools and modules, allowing you to continue your research and dive deeper into your data. Start exploring today and discover the many ways Omics Playground can enhance your biological discoveries.

Antonino is a senior bioinformatics engineer at BigOmics with a strong background in bioinformatics and biostatistics. With a PhD in genetics and bioinformatics and an MSc in biotechnology, he has made significant contributions to computational analysis in numerous projects during his previous research at Harvard Medical School and King’s College London.
