Finding Potential Biomarkers And Therapeutic Drugs With Coronavirus Datasets

We recently launched the Viromics Playground, an open-access online version of the Omics Playground platform that includes over 200 viral infection transcriptomes. I thought it might be a good idea to give it a spin with some of the publicly available coronavirus datasets that have been pre-loaded. In particular, I wanted to see if I could quickly gain some insights via the platform on potential biomarkers of early infection and also a list of drugs with therapeutic potential.

I started with a Middle Eastern coronavirus dataset from a published study, mainly because it includes several early infection time points that could yield useful biomarkers. This species is related to, but distinct from, the Middle East Respiratory Syndrome (MERS) coronavirus.

The dataset consists of 32 RNA-seq samples collected up to 24h post-infection in Calu-3 cells (a human lung cancer cell line), together with their respective uninfected controls. As observed in the paper, viral infection induced little differential gene expression in host cells before 18h post-infection. I generated a PCA plot, which indicated that only the late infection samples (EMC_18h and EMC_24h) clustered separately (Figure 1).

However, a closer look via a clustered heatmap, besides confirming the structure observed in the PCA plot, revealed the early expression of a few genes at 12h and, in the case of SSX2, as early as 7h (Figure 2).

Figure 1. Clustering of the samples using a PCA plot. Uninfected (Mock) samples form a cluster (in blue) with infected samples (EMC) collected before 18h post-infection. Samples collected 18h and 24h post-infection form a separate cluster (in red).
Figure 2. A clustered heatmap shows the clear distinction between samples collected 18h and 24h after infection and the rest of the samples. However, genes expressed earlier on in the infection can be observed. SSX2 (highlighted in a black box), in particular, is strongly expressed as soon as 7h post-infection.

I could quickly confirm this by looking at the SSX2 read counts across sample groups, with low expression in the control groups and the 3h infection group (Figure 3).

Figure 3. Average read counts for gene SSX2 by sample groups calculated over 2 or 3 samples. Infected samples after 7h or more (in the red boxes) display significantly increased gene expression.
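For readers who prefer to check this outside the platform, the per-group averaging behind a plot like Figure 3 can be sketched in a few lines of Python. This is only an illustration: the function name, the sample names and the counts below are all made up for the example and are not values from the dataset or code from the platform.

```python
from collections import defaultdict
from statistics import mean


def mean_counts_by_group(counts, groups):
    """Average one gene's read counts within each sample group."""
    by_group = defaultdict(list)
    for sample, value in counts.items():
        by_group[groups[sample]].append(value)
    return {group: mean(values) for group, values in by_group.items()}


# Made-up counts for illustration only; NOT values from the dataset.
toy_counts = {"mock_7h_1": 4, "mock_7h_2": 6, "emc_7h_1": 90, "emc_7h_2": 110}
# Hypothetical mapping from sample name to sample group.
toy_groups = {
    "mock_7h_1": "Mock_7h",
    "mock_7h_2": "Mock_7h",
    "emc_7h_1": "EMC_7h",
    "emc_7h_2": "EMC_7h",
}

group_means = mean_counts_by_group(toy_counts, toy_groups)
print(group_means)
```

With only 2 or 3 samples per group, as here, a group mean is a rough summary, so it is worth looking at the individual replicates as well.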

SSX2 could thus be an early indicator of infection. To test this, I used the biomarker analysis module (under the “signature” tab) implemented in the platform. I selected “treatment” (i.e. whether a sample is infected or uninfected) as my target variable in order to discriminate between infection states.

An important caveat is that there are many possible decision trees, so each time the program is run, a different tree will be generated. However, in most of the trees I generated, SSX2 occupied the top-most node and appeared to distinguish late-infection samples from early infection (3h and 0h) and uninfected samples, as seen in one of the possible trees (Figure 4).

Refining the treatment phenotype assigned to the input samples, so that it captures both infection status and post-infection time point at once, could further improve reliability.

Figure 4. Decision tree drawn by the biomarker analysis module. The tree shows a correlation between SSX2 expression levels and distinction between late infection and early infection/uninfected samples.
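To make the idea of a "top-most node" more concrete, here is a minimal sketch of how a single split on one gene can be scored. It searches for the expression threshold that best separates two classes by Gini impurity, which is a common criterion for decision trees. I am assuming nothing about how the platform's module builds its trees; the function names and the toy expression values are hypothetical and chosen only for illustration.

```python
def gini(labels):
    """Gini impurity of a list of class labels (0.0 means a pure group)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))


def best_split(values, labels):
    """Return (threshold, weighted impurity) for the best single split."""
    pairs = sorted(zip(values, labels))
    best = (None, float("inf"))
    for i in range(1, len(pairs)):
        low, high = pairs[i - 1][0], pairs[i][0]
        if low == high:
            continue  # no threshold fits between identical values
        threshold = (low + high) / 2
        left = [lab for v, lab in pairs if v <= threshold]
        right = [lab for v, lab in pairs if v > threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(pairs)
        if score < best[1]:
            best = (threshold, score)
    return best


# Made-up expression values for one gene; NOT values from the dataset.
toy_expression = [3, 5, 4, 80, 120, 95]
toy_labels = ["uninfected"] * 3 + ["infected"] * 3

threshold, impurity = best_split(toy_expression, toy_labels)
print(threshold, impurity)
```

A gene whose best split leaves both sides pure (impurity 0.0) is a natural candidate for the root of a tree. When several genes split the samples almost equally well, small changes in the procedure can swap which one ends up on top.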

In the original study, the authors had indicated various kinase inhibitors (such as SB-203580 and LY-294002) as negative regulators of genes overexpressed during early infection stages. I thus used the gene expression profiles of the 12h contrast between infected and uninfected samples to replicate their finding using the “Drug connectivity” module in the platform.

Among the drug modes of action that most inhibited the viral expression profile, several kinase inhibitors appeared, such as p38 MAPK, PI3K, mTOR and MEK inhibitors (Figure 5), consistent with the main findings of the original research. Furthermore, both SB-203580 and LY-294002 showed up as significant inhibitors of the viral profile at 12h (Figure 6).

Figure 5. Most enriched modes of action of the drugs displaying positively and negatively correlated signatures with the 12h infection phenotype.
Figure 6. GSEA plots showing the statistically significant correlation between the 12h gene expression signature and the SB-203580 and LY-294002 signatures.
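The intuition behind this kind of connectivity analysis is that a drug whose expression signature is anti-correlated with the infection signature may counteract it. The platform reports this through GSEA plots. As a much simpler stand-in, and not the platform's actual method, the sketch below computes a Spearman rank correlation between two fold-change profiles measured over the same genes. All function names and numbers are hypothetical and serve only as illustration.

```python
def ranks(values):
    """Rank values from 1 to n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    sd_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    return cov / (sd_x * sd_y)


def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(ranks(x), ranks(y))


# Made-up log fold changes over the same five genes; NOT real signatures.
infection_signature = [2.1, 1.5, 0.3, -0.8, -1.9]
drug_signature = [-1.7, -1.2, 0.1, 0.9, 2.0]

rho = spearman(infection_signature, drug_signature)
print(round(rho, 3))
```

A strongly negative value would flag the drug as a candidate that opposes the infection profile. An enrichment-based method like GSEA additionally focuses on the most up- and down-regulated genes instead of weighting every gene equally.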

I also wanted to identify some potential inhibitors against the late stage (24h) infection profile. Again, kinase inhibitors showed up prominently in my list of potential antiviral drugs. One of the prominent hits was trametinib (Figure 7), a MEK inhibitor used in cancer therapy.

Thanks to the activation matrix produced by the platform, I could see that trametinib was potentially effective even against early stages of the infection (at least from 7h onwards) (Figure 8).

Figure 7. GSEA plot showing a statistically significant negative correlation between the 24h gene expression signature and the trametinib signature.
Figure 8. Activation Matrix of enriched drug signatures against the 24h contrast signature. Trametinib is highlighted in yellow. Negative correlation is seen in all time periods except 0h and 3h.

A published study showed that trametinib was a strong inhibitor of MERS infections in vitro, confirming the antiviral potential of the drug.


Read the second part of the analysis.