Achieving End-to-End Omics Data Analysis and Facilitating Interactive Results Sharing in Sequencing Units

Published on March 28th, 2024
by Murat Akhmedov



Introduction

I recently had the opportunity to interview Jonathan Landry, Bioinformatician at EMBL’s GeneCore facility.

Jonathan completed his PhD at EMBL on the genomic and transcriptomic landscape of HeLa cells. Prior to that, he was a research scientist at Cambridge University. He then continued as a bioinformatician at EMBL’s genome sequencing center, GeneCore for short.

Watch the video to see the full interview or continue reading to learn more about Jonathan’s role at EMBL and his take on end-to-end data analysis in Sequencing Units.

Interview

Would you like to provide more information about your background and your current role?

Jonathan: I’m a bioinformatician at the EMBL genomic core facility. I was trained as a molecular biologist in France before realizing that to work with omics data and analyze results I would need a bit more than Excel.

Then I moved to EBI in Cambridge (UK) to finish my Master’s, and I stayed there for a year and a half.

I joined EMBL in Heidelberg for my PhD, during which I worked with genome assemblies and expression profiles of HeLa cell lines. After that, I joined the Genomic Core facility to support users on their projects, either by helping them analyze their data or by training them to analyze it themselves.

That’s what I’ve been doing for the past nine years and then I met you on my way and we discussed what you are doing and that’s how I’m here today.

We would like to understand a bit more how you operate at GeneCore. What would the journey look like for end users if they wanted to sequence at GeneCore?

Jonathan: We are a sequencing facility so our main mission is to produce sequencing data. 

Research groups are approaching us with a specific biological question and they think that sequencing might help to better understand their system or to answer their question. 

We then sit together and try to design the experiment. Sometimes they have a precise idea, sometimes they need our help. They either send ready-to-run sequencing libraries that they prepared on their own or send nucleic acids that we can prepare for sequencing.

We have different methodologies and different platforms, so we can really adapt the sequencing strategy to their specific questions.

We then perform the sequencing, do some QC, and we might receive requests to help analyze the data.

As you can imagine, the range of applications can be quite vast, and the range of biological questions that might be asked also varies widely. So depending on the resources that we have, we can address those requests and establish collaborations, which come in different flavors of course.

It could be a very standard approach where we do have a tool in place. Otherwise, we also can invest a bit of time to help them find a better approach to get insight into their question.

It’s very interesting. In terms of experiments, what kind of data types or experiments were your users mostly generating or running in the past years?

Jonathan: We do have quite a lot of genome assemblies in terms of type of application. We also have a lot of transcriptomics studies.

In the last five years, we have seen an increase in single-cell (droplet-based), SMART-Seq, and other kinds of approaches. We see an appetite for such studies, so we are processing more and more of these kinds of samples.

Then we have ChIP-Seq and Methyl-Seq, so it’s not restricted to the couple of methodologies I just cited. It’s very vast.

EMBL also recently announced this new program that will focus on the ecology and metagenomics of different environmental samples and things like that, so it’s really diverse.

I also heard that your lab is establishing Olink pipelines, right?

Jonathan: Yes, we do have some pilot experiments on Olink. I’m not in charge of this specifically but there are some people in the lab collaborating with GSK that are establishing the methods and they want to see if this provides some kind of value to their project.

If we define omics data analysis in terms of three pillars (primary, secondary, and tertiary), primary focuses on data acquisition, storage, and management; secondary involves pipeline or workflow automation, including tasks such as mapping, alignment, quantification, and quality control; and tertiary covers data discovery through exploration, visualization, and all downstream analysis.

With these three layers of data analysis in mind, what level of data analysis would you provide to your end users?

Jonathan: We provide primary and secondary analysis on a regular basis. As I was saying before, we are a sequencing facility, so we distribute the sequencing reads at the end, and this comes with an automatic pipeline that looks at the quality of those sequences. If we see something weird, we can still discuss with the user whether it’s relevant or not. We also provide mapping and quantification for transcriptomics.

For the last pillar, the discovery or interpretation of results, we have a limited number of bioinformaticians in the lab, so we don’t have enough resources to address all the requests that we receive. We try to answer most of them and also to provide a solution if we don’t have enough resources on our side.

One solution used to be training, in which we hosted the scientists or biologists in-house for a certain time and went through the analysis with them. We would try to give them some basic knowledge on how they could start to analyze their data, and then they could build on that later.

We try to automate the way we do standard analysis, and recently we started to use your platform (Omics Playground) more and more for bulk RNA-Seq analysis. We load the data there and then share the project directly with the end user to enable them to browse their data. We then guide them through the platform, and if they have questions they can always come back to us.

Happy to hear that you’re using Omics Playground for the tertiary analysis aspect. As you mentioned, most sequencing units, as well as CROs, mainly provide primary and secondary analysis, but they might lack the bandwidth to support tertiary analysis.

What are the implications or the impact of not being able to provide tertiary analysis for sequencing units?

Jonathan: The value that we have is that we can quickly assess if an experiment worked.

Biologists usually have an idea of what they expect. If they did a certain specific experiment, they have identified some genes and so they want to have quick feedback on whether it is working or not. We can quickly assess that.

They send us the library, we sequence it, and with their collaboration, we provide them with some ideas or interpretations of the results, which they appreciate. They have one partner with whom they can talk through the entire project. Since we are close to the lab, we know what happened there and we can quickly communicate with the wet lab people if they observe issues with the library preparation, sample quality, and things like that.

So we’re very much like one team that tries to take the project and bring it to some stage. It’s helpful for the biologist and the research group and it’s a collaborative effort as you can imagine.

Having this diversity of biological questions and also methodologies, we really need some interaction for us to understand what they are trying to do, and how we can best assess their questions and potentially answer them.

We can’t be experts on all things though. So they drive the project and we are here to support them in their journey.

So what would be your take on providing an end-to-end solution for sequencing units in terms of primary, secondary and tertiary analysis?

Jonathan: What we like very much is to be involved as early as possible in the project, especially in the design and analysis stages.

I think that designing the right experiment is crucial to answer the research question.

Sometimes it happens that there are biases in terms of design or in terms of the methodology that was used and therefore the experiment doesn’t fully answer the question that they were trying to ask.

So being involved as early as possible is key and we’re trying to document what we are doing. We are not only providing plots, we are also trying to explain how we do that and why we do that. In this way, users become familiar with concepts and can potentially recreate those plots on their own.

We are sharing all the reports including scripts and figures that we are producing with versioning. It’s kind of robust and recently we started to use your platform (Omics Playground) to provide users with the ability to interact directly with the data.

This was one of the main features that was quite good for us. Once the data is in the platform and we have guided users through the modules and what your tool can do, they can browse the data themselves.

They can interact with the platform, refine the plots, change the genes, etc. As you know, they are the experts; they have the expertise and know what they want to look for.

This makes it very appealing for us as a sequencing facility.

Pleasure to hear that. You mentioned that you’re using Omics Playground, especially for tertiary analysis and to share final results and data with your end users, right?

Jonathan: Yes, we have some kind of understanding of what they want to do.

I usually check a couple of things on the experiment, quality-wise and also if the controls are really controls, if the replicates are really replicates, and so on. We then try to extract some meaningful insight but one question brings another one and you’re building your story.

You start with an idea and then you build on it. For the sequencing facility and me personally, it’s very nice because curiosity kicks in and then you are also involved in the project. 

On the management side though, it’s very difficult to predict how long we can support a project because we have very creative collaborators.

There are always new ideas and it’s good that at some point, with the help of your platform, they become players in data analysis so they can look into it by themselves.

Before Omics Playground, how were you sharing your results with the end user?

Jonathan: They have access to the raw data, of course, but also to any intermediate data that we consider relevant, such as alignments and all kinds of tables or graphics and plots that we generate.

We also provide them with all the code that we use, with versioning. We’re simply using R Markdown and then sharing reports. We then go through them during a review session, either remotely or in person, to try to build the story around that interaction. This is the time to discuss the results and support them in the research project.

With static reports and, for instance, static Excel sheets, interactivity for the end user becomes questionable, right?

Jonathan: Yes, and that’s what I was mentioning before.

After an initial session with them on your platform, where we share the experiment through the data-sharing functionality, we can simply browse the data together, show them how this actually works, and then they can independently check other things or refine the analysis, to make it more powerful and insightful.

This is a tricky question. Being a computational biologist yourself, how would you define the difference between a computational biologist and a bioinformatician?

Jonathan: Well, that’s a good one. I actually never thought about it.

A computational biologist is likely to be someone who is more trained in biology, more used to using software to analyze data, and who would invest less time in developing their own tools. They would probably use what is already available in the literature or as existing tools.

Bioinformaticians have more background in developing their own new approaches to computational analysis. For me, this could be the slight difference between them.

I think that they are both rare and also quite important.

Nowadays, generating sequencing data is no longer a problem. We have a lot of platforms that are performing quite well and are very powerful and reliable.

One of the crucial bottlenecks is making sense of this data. I mean, in my time (and hopefully I’m not that old), manipulating this data and these types of volumes was, I think, a necessity.

Today, the new generation is actually used to it and is receiving education on these topics so that they are more familiar with the computer and know how to manipulate this data to extract meaningful information.

As a computational biologist at EMBL, do you have any tips and advice for other computational biologists or bioinformaticians supporting sequencing units elsewhere?

Jonathan: Well, I was referring to it a bit earlier on: try to be involved as early as possible in the project.

Sometimes it’s crucial to have enough replicates, to have enough depth of sequencing. Simple designs are usually the best to answer a specific question. Therefore, being part of this prior conversation is important.

Also, what makes EMBL a great place to work is the environment. We can interact with a lot of people with different fields of expertise, different approaches, and different backgrounds from which we can learn a lot.

So as soon as my knowledge in one field or another is limited, I can just knock on the next door and have someone explain to me what’s going on. This is really important and the reason why I like working at EMBL. There are quite good people around me.

How do you see the future of bioinformatics? What kind of challenges are we going to have or are we heading to?

Jonathan: I think one aspect is probably the reproducibility and robustness of the analysis. We need to be transparent, document what we do and be consistent about it.

I think we have great tools now that do that and we just need to focus on using them.

In terms of approaches, we have an increased number of projects with single-cell applications and it is now moving towards spatial transcriptomics. We are currently focusing heavily on this type of approach.

I think there is great potential to dissect more and more every cell and every kind of sub-region of a tissue or a specific biological complex.

This is for the very near future.

In the long term, there is more and more robotics like AI and deep learning all around us, and this too needs to be integrated in a sensible way.

About the Author

Murodzhon Akhmedov

Murat Akhmedov holds a doctorate from the Institute of Oncology Research (IOR) and the Dalle Molle Institute for Artificial Intelligence (IDSIA) in Switzerland. His research focus is on the application of graph algorithms and artificial intelligence in cancer systems biology. He is currently the CEO and co-founder of BigOmics Analytics. 
