How to Upload Your RNA-Seq and Proteomics Data to Omics Playground

Step-by-step tutorial

Last Updated on December 10th, 2024
Published on February 5th, 2024

Introduction

In this step-by-step guide, we will walk you through the process of uploading your omics data to the Omics Playground platform. Whether you are a new or experienced user, this guide will provide you with important details to keep in mind at every step of the process.

Learn the essential steps to upload your data to Omics Playground and access a wide range of analysis modules for your research.

What is Omics Playground?

Omics Playground is a cloud-based platform that allows researchers to analyze and visualize RNA-Seq, proteomics and metabolomics (beta) data interactively. The platform is designed to support both bioinformaticians and biologists, making it the ideal tool for individuals and teams who need to efficiently store, analyze, and share their omics datasets.

You can find more information about Omics Playground here: Omics Playground – Bioinformatics software.

How to upload your data in Omics Playground

Video 1. This video provides a quick walkthrough on how to upload data into the platform. Each part of the process shown here is explained in detail in the sections of the guide below.

Before starting the upload process, prepare and format your input files correctly: properly formatted files are critical for a smooth, error-free upload. You will need the following files:

Count/expression file: “counts.csv”
Samples file: “samples.csv”
Comparisons file: “comparisons.csv” (optional)

The comparisons file is not mandatory, as you can use the platform’s interface to select your contrasts interactively. For datasets that require complex intersections between phenotypes, we recommend preparing a separate comparisons file with a script rather than with spreadsheet software.

To learn more about how to prepare your data for Omics Playground, you can watch our data preparation video tutorial (10:33 min) and read our dedicated omics data preparation guide.
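As a rough illustration of a check you could run locally before uploading, the sketch below compares the sample names in a counts table with those in a samples table. The file layout is an assumption, not something this guide specifies: counts are taken to have one feature per row and one sample per column, and samples one sample per row with the name in the first column. Please confirm the actual layout in the data preparation guide. The function names and example data are hypothetical.

```python
import csv
import io


def read_table(text):
    """Parse CSV text into (header, rows), dropping fully empty lines."""
    rows = [r for r in csv.reader(io.StringIO(text)) if any(c.strip() for c in r)]
    return rows[0], rows[1:]


def check_sample_names(counts_text, samples_text):
    """Compare sample names in the counts header with those in the samples table.

    Assumed layout (not confirmed by this guide): counts have one feature
    per row and one sample per column; samples have one sample per row,
    with the sample name in the first column.
    """
    counts_header, _ = read_table(counts_text)
    _, sample_rows = read_table(samples_text)
    in_counts = counts_header[1:]  # skip the feature-name column
    in_samples = [row[0] for row in sample_rows]
    return {
        "missing_from_samples": sorted(set(in_counts) - set(in_samples)),
        "missing_from_counts": sorted(set(in_samples) - set(in_counts)),
        "duplicated": sorted({s for s in in_counts if in_counts.count(s) > 1}),
    }


# Hypothetical example data; S3 and S4 are deliberately mismatched.
counts_csv = "gene,S1,S2,S3\nTP53,120,98,143\nMYC,15,22,9\n"
samples_csv = "sample,treatment\nS1,control\nS2,drug\nS4,drug\n"
report = check_sample_names(counts_csv, samples_csv)
print(report)
```

With real files, you would pass the contents of counts.csv and samples.csv (for example via open(...).read()) instead of the inline strings.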

Once you have confirmed that your data is correctly formatted, you are ready to upload it to the platform. For this, simply log in to your account and click on ‘Upload New Data’ (Figure 1).

First, you’ll be asked to select the data type and organism for your analysis.

The platform currently supports several data types: proteomics, RNA-Seq, mRNA-Seq, scRNA-Seq, and metabolomics (beta).

For organisms, Omics Playground has recently expanded its support to include over 290 species, covering most animals, plants, fungi, and more. To check if your organism is supported, simply type its name into the search bar.

After selecting the data type and organism, a prompt will guide you through the data upload process (Figure 2).

At each step, you’ll have access to documentation for additional details, along with a chat feature where you can ask questions or request support as needed.

Here is a quick summary of the steps involved:

Upload your counts/abundance table.
Upload your samples.
Select your comparisons.
Select Quality Control/Batch Correction (QC/BC) options.
Name your dataset and select any additional computation options, if needed.
Click ‘Compute’ and enjoy exploring your data.

Step 1: Upload your counts, abundance or concentration

At this stage, you can select your abundance, counts or concentration file (Figure 2) and the platform will check for any formatting errors (Figure 3).

Figure 2. Upload prompt for abundance, counts or concentration files in Omics Playground. Two buttons sit at the bottom. 'Load Example Data' loads an example dataset so you can walk through the upload process and see how it works; do not click 'Compute' unless you want this data added to your dataset library. 'Read Documentation' links to the platform's documentation, where you can learn more about Omics Playground and each of its analysis modules.

Please ensure that your data is correctly formatted and that no inputs are missing to avoid encountering issues. If you need to revise your file, you can simply click the ‘Cancel’ button and upload your corrected abundance/counts.

For further guidance on what errors to avoid, refer to our video guide and documentation. If you still encounter issues, contact our team via the support chat at the bottom right corner. We’ll help you out.

If your data passes the checks, as shown in Figure 3, you can proceed to the next step.

Step 2: Upload your samples

Similar to the previous step, you only need to select or drag and drop your samples file. This file will undergo the same checks as before. If you are satisfied with the results, click ‘Next’ to proceed.

Step 3: Create Comparisons

You can either upload a comparison file or create your comparisons interactively.

When you click on ‘Upload Comparisons’, you’ll be prompted to upload a file, just as you did for counts and samples. Unless you have a complex dataset, we strongly recommend creating your comparisons interactively in this tab instead.

If you decide to upload a comparisons file, you’ll see your predefined comparisons populate at the bottom of the screen. You can then delete the unnecessary ones or add more if you need to.
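For datasets that need intersected phenotypes, a comparisons table could be generated with a short script along the following lines. This is only a sketch under stated assumptions: the layout used here (one row per sample, one column per comparison named 'A_vs_B', cells holding the sample's group or left empty) is assumed for illustration and should be checked against the data preparation guide. The sample annotation, phenotype names and helper functions are all hypothetical.

```python
import csv
import io

# Hypothetical sample annotation: two phenotypes per sample.
samples = {
    "S1": {"treatment": "ctrl", "time": "24h"},
    "S2": {"treatment": "ctrl", "time": "24h"},
    "S3": {"treatment": "drug", "time": "24h"},
    "S4": {"treatment": "drug", "time": "24h"},
    "S5": {"treatment": "drug", "time": "48h"},
}


def intersect(sample, phenotypes):
    """Join several phenotype values into one group label, e.g. 'drug.24h'."""
    return ".".join(samples[sample][p] for p in phenotypes)


def build_comparisons(pairs, phenotypes):
    """Build table rows: one row per sample, one column per 'A_vs_B' pair.

    A cell holds the sample's intersected group if that group takes part
    in the comparison, and is left empty otherwise (assumed layout).
    """
    names = [f"{a}_vs_{b}" for a, b in pairs]
    rows = [["sample"] + names]
    for s in samples:
        group = intersect(s, phenotypes)
        rows.append([s] + [group if group in (a, b) else "" for a, b in pairs])
    return rows


rows = build_comparisons([("drug.24h", "ctrl.24h")], ["treatment", "time"])
buffer = io.StringIO()
csv.writer(buffer, lineterminator="\n").writerows(rows)
comparisons_csv = buffer.getvalue()
print(comparisons_csv)
```

Generating the table from the sample annotation keeps group labels consistent across comparisons, which is harder to guarantee when editing cells by hand in a spreadsheet.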

If you choose to prepare comparisons using the platform’s interface, you can select your comparisons interactively through a simple click-and-drag approach or by using the ‘Auto-detect Comparisons’ button (see Video 1). You can also intersect different phenotypes by selecting two or more simultaneously in the “Phenotype” drop-down menu on the top right of the screen.

Under the ‘Comparison name’ box, the platform suggests a name for your new comparison. You can edit the name by clicking on the box. When doing so, remember to keep the “_vs_” element in the comparison name!
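If you prepare comparison names outside the platform, a small local check like the one below can flag names that have lost the “_vs_” element. It is a minimal sketch: the guide only states that “_vs_” must be kept, so the stricter rule used here (exactly one separator with non-empty text on both sides) is an assumption, and the function names are hypothetical.

```python
def split_comparison(name):
    """Split 'A_vs_B' into ('A', 'B'); raise ValueError if malformed."""
    parts = name.split("_vs_")
    if len(parts) != 2 or not all(p.strip() for p in parts):
        raise ValueError(f"comparison name must look like 'A_vs_B': {name!r}")
    return parts[0], parts[1]


def invalid_names(names):
    """Return the names that lack a single, well-formed '_vs_' separator."""
    bad = []
    for name in names:
        try:
            split_comparison(name)
        except ValueError:
            bad.append(name)
    return bad


names = ["treated_vs_control", "drug.24h_vs_ctrl.24h", "treated-control", "_vs_control"]
print(invalid_names(names))
```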

Video 1. How to select comparisons using the platform’s interface. Drag and drop to create comparisons from scratch or click on ‘Auto-detect comparisons’.

Step 4: Select Quality Control/Batch Correction (QC/BC) options

Once you’ve selected your comparisons, you may choose additional quality control (QC) or batch correction (BC) options if needed (Figure 5). This step is optional, as the platform will automatically apply default settings based on best practices.

Quality Control and Batch Correction options available in Omics Playground

For advanced users, there is the option to customize the following QC/BC settings:

Missing Values Management: This setting is especially helpful for proteomics datasets. If missing values are detected, you’ll see them in the top left corner of the QC/BC board. You can then choose to skip imputation, treat zero values as “NA” or opt to impute missing values using the SVDimpute method. You can learn more about the latter in our guide on Imputation of Missing Values in Proteomics.
Normalization: If your data is already normalized, simply uncheck the ‘Normalize data’ box to skip this step. Otherwise, you can normalize the data using one of the available normalization methods:
- For proteomics data: maxMedian, maxSum, or reference.
- For RNA-Seq data: counts per million (CPM), CPM+Quantile normalization, maxMedian, maxSum, or reference.
Outlier Removal: To automatically detect and remove outlier samples, check the ‘Remove outliers’ box. Detected outliers will be listed in the box at the bottom left of the QC/BC board.
Batch-Effect Correction: Batch effects can be removed by selecting the appropriate option. The methods currently available are the following: ComBat, Limma, Nearest-Pair Matching (NPmatch), Surrogate Variable Analysis (SVA) and Remove Unwanted Variation (RUV). More details on each are available in the platform documentation.

The available options will vary depending on the data type you selected during the upload process.
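To give a feel for what one of the normalization options does, here is a minimal sketch of counts per million (CPM): each sample's counts are divided by that sample's total and multiplied by one million. It illustrates the general definition only; any additional steps the platform applies (such as a log transform or quantile normalization) are not covered, and the function name and example data are hypothetical.

```python
def counts_per_million(counts):
    """Scale each sample's counts so that they sum to one million.

    `counts` maps sample name -> {feature: raw count}.
    """
    cpm = {}
    for sample, features in counts.items():
        library_size = sum(features.values())
        if library_size == 0:
            raise ValueError(f"sample {sample!r} has no counts")
        cpm[sample] = {f: c / library_size * 1_000_000 for f, c in features.items()}
    return cpm


# Hypothetical data: S2 was sequenced ten times less deeply than S1.
raw = {
    "S1": {"TP53": 300, "MYC": 100, "ACTB": 600},
    "S2": {"TP53": 30, "MYC": 10, "ACTB": 60},
}
cpm = counts_per_million(raw)
print(cpm["S1"]["TP53"], cpm["S2"]["TP53"])
```

In this example the two samples differ tenfold in sequencing depth but have the same relative composition, so their CPM values agree after scaling.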

Default Quality Control and Batch Correction options applied in Omics Playground

If you choose not to adjust any QC/BC settings, the platform will apply the following defaults:

Proteomics data: The data will be normalized using maxMedian, missing values (if any) will be imputed with SVDimpute, and no outlier removal or batch correction will be applied.
RNA-Seq and other data types: The data will be normalized using CPM+Quantile normalization, with no outlier removal or batch correction applied.

Step 5: Name your dataset and select any additional computation options (if needed)

After you’ve uploaded the input files and selected your comparisons, you can move to the Compute step where you have to:

Name your dataset.
Give a short description of your dataset. You can use this space to document any specific settings you’ve selected during the data upload process, as shown in Video 2.

Make sure to double-check that the information you provide above is correct before you click Compute, as you will not be able to change it later.

Optionally, you can review the computation options that will be applied to your data. The default options work well for the majority of datasets and casual users, but you can customize your analysis by expanding the ‘Computation options’ section and selecting the options you need (Video 2). Below, you will find a guide to the different options provided.

Video 2. Naming your dataset and selecting the computation options in Omics Playground. In this final step of the data upload process, you’ll be prompted to name your dataset and choose the appropriate organism and data type. Additionally, you can access various computational options and customize the default settings to suit your needs.

BigOmics Tip: For datasets uploaded on or after November 22, 2024, you can now conveniently view both the QC/BC settings and the Computation options selected during the upload process. Simply click on the loaded dataset name in the top bar of your dashboard, and a pop-up box will appear, displaying all the details you provided during the upload. See the example below:

If you’d like to retain any additional information for future reference, we recommend adding a note in the dataset description when prompted before clicking “Compute.”

Guide to Computation Options in Omics Playground

In the first box on the right (‘Feature filtering’), you can select options such as whether to append symbol ID, remove not-expressed features, or include non-protein-coding genes (such as lncRNA) in the analysis.

The next box (‘Gene tests’) allows you to choose the statistical approaches for differential gene or protein expression analysis.

The default choices will change based on the data type you selected at the beginning of the data upload. Specifically:

Proteomics: ttest, ttest.welch
RNA-Seq and other data types: ttest, ttest.welch, voom.limma, trend.limma, notrend.limma, deseq2.wald, deseq2.lrt, edger.qlf, edger.lrt.
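For readers unfamiliar with the first two entries, the sketch below contrasts a pooled-variance (Student's) t statistic with Welch's version, which does not assume equal variances. Reading ‘ttest’ and ‘ttest.welch’ as these two classical tests is an assumption based on their names; the platform's own implementation is not described in this guide. Only the statistic and degrees of freedom are computed here, not the p-value, and the example values are hypothetical.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Return (t statistic, Welch-Satterthwaite degrees of freedom)."""
    va, vb = variance(a) / len(a), variance(b) / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


def student_t(a, b):
    """Return (t statistic, degrees of freedom) assuming equal variances."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    t = (mean(a) - mean(b)) / math.sqrt(pooled * (1 / na + 1 / nb))
    return t, na + nb - 2


# Hypothetical log-expression values for one gene in two groups.
treated = [8.1, 8.4, 7.9, 8.6]
control = [7.0, 7.2, 6.9, 7.1]
print(student_t(treated, control))
print(welch_t(treated, control))
```

With equal group sizes the two t statistics coincide, and the difference shows up in the degrees of freedom: Welch's test uses fewer when the group variances differ, which makes it more conservative.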

The middle box (‘Enrichment methods’) is where you can select the enrichment methods for Enrichment analysis.

The default choices will vary based on whether you’re uploading a transcriptomics or proteomics dataset (fgsea, fisher). If you have a particularly large dataset with multiple pairwise comparisons and want to speed up processing (especially when working with scRNA-seq datasets), you might want to replace gsva with spearman, camera or fry.

Alternatively, ssgsea can provide more accurate results but will also significantly slow down run time. We recommend using it only for relatively small datasets and avoiding it for scRNA-seq studies.

If you want to learn more about enrichment analysis methods you can read our guide to enrichment analysis.
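As a small illustration of the idea behind the ‘fisher’ option, the sketch below computes a one-sided Fisher's exact test for the overlap between a gene set and a list of significant genes, using the hypergeometric upper tail. It shows the textbook test only; how the platform selects significant genes and defines the gene universe is not described in this guide, and the function name and example data are hypothetical.

```python
from math import comb


def enrichment_p_value(universe, gene_set, hits):
    """One-sided Fisher's exact test (hypergeometric upper tail).

    universe: all genes measured; gene_set: genes in the pathway;
    hits: genes called significant. Returns P(overlap >= observed).
    """
    universe = set(universe)
    gene_set = set(gene_set) & universe
    hits = set(hits) & universe
    n, k, m = len(universe), len(gene_set), len(hits)
    observed = len(gene_set & hits)
    total = comb(n, m)  # ways to draw m significant genes from n
    tail = sum(
        comb(k, x) * comb(n - k, m - x)
        for x in range(observed, min(k, m) + 1)
    )
    return tail / total


# Hypothetical example: 20 measured genes, a 5-gene pathway, 4 significant genes.
universe = [f"g{i}" for i in range(20)]
pathway = ["g0", "g1", "g2", "g3", "g4"]
significant = ["g0", "g1", "g2", "g10"]
p = enrichment_p_value(universe, pathway, significant)
print(round(p, 4))
```

Here three of the four significant genes fall in the pathway, and the p-value is the probability of seeing an overlap of three or more by chance.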

The box ‘Extra Analysis’ contains advanced analysis methods.

By default, they are all active, and we recommend running them unless you have large scRNA-seq datasets and want to improve run time.

In that case, you can disable analysis types that may not be applicable to your study (e.g. drug connectivity).

The final box on the right allows users to add a custom gene sets file in GMT format.

This is particularly useful if you want to add gene sets that are not present in the platform, such as KEGG gene sets.

It is also very relevant if you are working with non-vertebrate species and want to include gene sets or pathways unique to your species of interest.
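A GMT file is plain text with one gene set per line: the set name, a description, and then the member genes, all separated by tabs. The sketch below writes and re-reads such text from a Python dictionary. The set names and genes are hypothetical, and any platform-specific requirements (naming conventions, identifier types, size limits) are not covered here and should be checked in the documentation.

```python
def to_gmt(gene_sets, description="custom"):
    """Format {set name: [genes]} as GMT text: name<TAB>description<TAB>genes."""
    lines = []
    for name, genes in gene_sets.items():
        unique = list(dict.fromkeys(genes))  # drop duplicates, keep order
        lines.append("\t".join([name, description] + unique))
    return "\n".join(lines) + "\n"


def from_gmt(text):
    """Parse GMT text back into {set name: [genes]}."""
    sets = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        name, _description, *genes = line.split("\t")
        sets[name] = genes
    return sets


# Hypothetical custom gene sets; note the repeated HSPA1A entry.
custom_sets = {
    "MY_STRESS_RESPONSE": ["HSPA1A", "HSPA1B", "DNAJB1", "HSPA1A"],
    "MY_CELL_CYCLE": ["CDK1", "CCNB1", "MKI67"],
}
gmt_text = to_gmt(custom_sets)
print(gmt_text)
```

Reading the text back with from_gmt is a quick way to confirm that every set kept its genes before saving the result as a .gmt file.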

Ensure you select all the relevant tests and methods for your analysis before clicking ‘Compute’, as you will not be able to change this once the dataset is uploaded.

Step 6: Click Compute and enjoy exploring your data

Once you’re done naming your dataset and selecting any additional computation methods, click Compute and let Omics Playground do its magic!

The first computation of a new dataset typically takes around 10 to 30 minutes, depending on the size of the dataset. It runs in the background, and once it is complete you’ll receive an email notification letting you know your dataset is ready for exploration.

In subsequent logins, reloading the same dataset will only take a few minutes.

Should you encounter any challenges throughout this process, our support team is readily available to assist you. Simply use the chat feature located in the bottom right corner of the platform to send your message, and our team will promptly respond to your inquiries.