Big Data meets biology. With the advent of high-throughput ‘omics’ technologies, life scientists are starting to grapple with massive data sets. As Big Omics data continues to grow, biologists are encountering challenges in handling, processing and analyzing information that were once the domain of astronomers and high-energy physicists.
Bioinformatics is becoming more and more intertwined with the work of the biologist. Whether they are analyzing their own data or keeping up with new technologies, it’s an exciting time for researchers who can take advantage of the Big Omics data at hand to improve their research.
One emerging trend is the rise of the DIY computational biologist: a biologist who has learned to use available software to analyze their own data. While this person will not replace the expert bioinformatician, they will take on an ever-expanding role in the coming years.
The Current State of Bioinformatics Tools
Current bioinformatics tools consist of a myriad of free software packages and a jungle of websites that provide specific bioinformatics services. For anyone wanting to analyze omics data, wading through this bazaar of options requires expert knowledge of which software package to choose or which website to use.
Traditionally, bioinformaticians would cook up scripts and juggle files between websites to analyze the data, then pass these scripts on to biologists who wanted to analyze their own data. The problem is that most biologists still need considerable time to learn how to run the scripts, let alone modify them when something needs to be changed.
The Rise of Self-Service Bioinformatics
As omics technologies have become affordable and public data has become plentiful, the bottleneck is no longer the data but its analysis.
After years of biologists being dependent on busy, understaffed IT teams for their data analysis needs, a shift is occurring. Tired of the inflexibility and slow turnaround, and eager to understand their data better, more and more biologists are turning away from traditional bioinformatics support and toward self-service analytics. By blending their intuition with self-service analysis tools, they can make better use of the data and gain deeper insights.
Organizations that have relied heavily on in-house programming are realizing they need a better bioinformatics platform — one that doesn’t lead to a patchwork of scripts that is increasingly messy, unreliable and unsustainable. New self-service bioinformatics platforms are essential to organizations that seek to place Big Omics data at the core of their research efforts.
Not only does self-service bioinformatics free up bioinformaticians to focus on more strategic work than reporting, but it also enables biologists to access the data they want when they need it. This democratization of omics data across an organization opens up new opportunities that simply wouldn’t be possible with traditional bioinformatics tools. The rise of self-service bioinformatics is changing the way research institutes approach hiring and perform their biological data analysis. As Big Omics data continues to grow and more bioinformatics is needed, self-service computational biologists are set to outnumber bioinformaticians in the future.
Challenges in Achieving Success as the Big Omics Data Stacks Up
As the “data explosion” in Big Omics takes place, the greatest threat for research organizations is that of becoming data-rich and insight-poor:
They accumulate vast amounts of omics data that they have no idea what to do with, and no hope of learning anything useful from. Of course, the problem becomes even bigger when we take into account the predicted growth in the data companies will produce. If a company is already struggling to store and analyze its own data now, it will be drowning in data in the next few years.
To add to the problem, omics data has a lifespan. Technologies evolve and data becomes outdated. Often, a data set is used only once for a publication and then forgotten. But “old data”, when properly processed, can be reused and become important for meta-analysis.
Onboarding and Training
Data is only as good as people’s ability to understand it. Training, support and onboarding processes are therefore needed to create a data-driven culture and to spread Big Data through all levels, from research to decision making. No matter how easy to implement or easy to use the new self-service bioinformatics platform is, getting users enthusiastic and ready to use the software should be your primary focus.
Self-Service Bioinformatics And The Illusion Of Self-Sufficiency
In the traditional service model, bioinformaticians write the analysis reports. In the self-service model, they leave the reporting (tertiary analysis) to the biologists and concentrate on data processing (primary analysis) and statistical analysis (secondary analysis). Self-service bioinformatics platforms are not going to magically transform all of your biologists into “citizen” bioinformaticians or render the bioinformaticians redundant. Platforms still need to be further developed and managed over time.
The danger of software is that users often blindly trust the tool because they believe that ‘computers do not make mistakes’. But they forget that software is written by humans and can have bugs. Moreover, each computational problem needs to be tackled with the correct tool, and choosing it often requires solid statistical and bioinformatics knowledge.
Finally, scripts are often reused on new data without scrutinizing the quality of that data. The quality of the results depends on the quality of the input data: garbage in, garbage out. The bioinformatics staff also has the responsibility to ensure that input data, analysis methods and output reports adhere to institutional standards. We foresee such “governed self-service platforms” as the way to improve and scale up bioinformatics in the era of Big Omics data-driven biological research.