The least you could say about the research data generated by our VIB colleagues is that it involves complex information which requires specialized data applications. However, more and more scientists are now combining this complexity with large data volumes. The result is big data, and that is where our VIB Bioinformatics Core (formerly BITS), led by Alexander Botzki, comes in.
Because of the heterogeneous structure of bioinformatics at VIB, data analysis is handled mainly in our bioinformatics labs or by embedded bioinformatics facilities at the various VIB centers. The goal of Alex and his team is to set up a more integrated bioinformatics community, together with Lennart Martens of the VIB-UGent Center for Medical Biotechnology, to better help VIB wet and dry lab scientists to convert data into insightful biological knowledge.
Alex, where is the VIB Bioinformatics Core now in terms of big data?
Alex: “As a core facility, our main focus is to deliver high-quality services to support our scientists. It goes without saying that we have to continuously adapt our training program and other services. Big data is a relative newcomer here, but we have already taken important steps towards our objective in this field. For example, we have launched a number of training courses to encourage our people to feel confident embracing new big data technologies in their research.”
Could you give some examples of these training courses?
Alex: “In the last three years, we have introduced several courses. For example, in next-generation sequencing, we debuted courses on RNA-Seq for differential expression and NGS analysis, as well as on DESeq2, edgeR, and GATK. To make life easier for our trainees, our goal is to provide a walkthrough example of an RNA-Seq differential expression workflow on a web-based platform like GenePattern.
Such a tool provides VIB scientists with powerful computing resources and the ease of a graphical interface at the same time. This bulk RNA-Seq pipeline developed by Guy Bottu from our core will be complemented with pipelines for single cell transcriptome analyses in the very near future. In collaboration with the Stein Aerts (VIB-KU Leuven Center for Brain and Disease Research) and Yvan Saeys (VIB-UGent Center for Inflammation Research) labs, representative datasets to develop this workflow have been created on the 10x Genomics® Chromium™ instrument introduced by our Tech Watch team. We are also planning a 5-day summer school in 2018 to cover this rapidly developing research area.”
In 3D electron microscopy, big data is a real issue. What steps has VIB already taken in this field?
Alex: “Nowadays, electron microscopy datasets of around 500 gigabytes are the norm. Traditional software can’t really handle this volume of data, so the analysis is slow. That is why Frank Vernaillen from our team is developing user-friendly and efficient software tailored to 3D SEM data processing tasks such as image registration, image denoising and segmentation. This improves (semi-)automatic segmentation accuracy and leads to cleaner images for morphological analysis. We rolled out the tool as an analysis plugin within the popular Fiji/ImageJ tool. In collaboration with the VIB Bioimaging Core, the Yvan Saeys lab and the IPI lab from UGent, version 1 has just been released, and we’re now putting new projects together into which we can incorporate the new tool.”
‘Omics’-based research lines are generating massive data as well. What are VIB’s plans in this respect?
Alex: “The main challenge when it comes to the floods of multi-omics data has clearly shifted from data acquisition to analysis. Currently, we are still lacking one of the foundations of the data ecosystem: the ability to annotate experimental raw data with consistent and sufficient metadata. As a result, integrative analyses remain difficult. The first step has already been taken: together with Lennart Martens, we have proposed a common format to store metadata. Next, we will need to coordinate guidelines and best practices, and provide tools and training to implement this consistently across VIB’s Core Facilities and relevant research groups.”
To what extent will this boost VIB’s already strong position in the omics field?
Alex: “The integration and analysis of pooled omics datasets is not only interesting from a purely scientific point of view; it also comes with clear valorization potential. Unlocking this wealth of information and converting it into knowledge will set the scene for new start-ups in the bioinformatics area. I’m convinced that VIB will play a strategic role in this evolution: our current bioinformatics groups already boast highly complementary, world-class expertise in each of the relevant omics fields.”
Big data evolutions will also force scientists to hone specific skills. Are there any programs being set up to help us develop new competencies?
Alex: “Staying up-to-date is indeed crucial. That is why we are in the process of building the Bioinformatics and Computational Biology Community (CBBI). And because data science and analytics skills will probably become more important in the long run, we have been preselected for an international project called ‘HELIS Academy’ alongside other research institutes. Together with the Dutch Techcentre for Life Sciences and the Eindhoven University of Technology, we plan to co-develop a data science and analysis curriculum. If we are successful in the second round, a 15-day course will be set up which allows scientists and bioinformaticians from VIB and beyond to obtain the appropriate skill sets for big data in life science. In addition, this curriculum should allow interns in the biotech sector to enter the job market more easily.”
On a European scale, ELIXIR was recently launched. How does our Bioinformatics Core fit into that initiative?
Alex: “It’s definitely closely intertwined. As the ELIXIR Training Coordinator, I’m working together with our VIB Bioinformatics trainers Janick Mathys and Christof De Bo, and ELIXIR Belgium to build a Belgian bioinformatics training community. We are also organizing training courses related to ELIXIR focus areas. For example, we’re developing an introductory course on computational skills needed for data management and analysis. We have also included the training materials and events of our Belgian ELIXIR partners in the online ELIXIR TeSS platform.”
To conclude, how can life sciences harness the power of big data and focus their investments on high-impact returns?
Alex: “It’s a big leap from real-world evidence to life science data, because diverse data sets must be combined in a well-integrated data and analytics ecosystem. Our scientific frontrunners play an essential role in this story, as they are the ones who help define the possibilities of big data. As a research institute, we will have to position ourselves like life science businesses. This means embracing the opportunities of big data as a key differentiator to solve fundamental scientific questions to the benefit of society.”
Discover TeSS, ELIXIR’s online bioinformatics training platform, via www.bits.vib.be/elixir