Big Data and information infrastructure in basic and translational research
Molecular biology is now a leading example of a data-intensive science, with both pragmatic and theoretical challenges raised by the volume and dimensionality of the data. These changes are present in both “large scale” consortia science and small scale science, and now span a broad range of applications, from human health through to agriculture and ecosystems. All of molecular life science is feeling this effect. As molecular techniques, from genomics through transcriptomics to metabolomics, drop in price and turnaround time, there is a wealth of opportunity for clinical research and, in some cases, active changes to clinical practice even at this early stage. The development of this work requires interdisciplinary teams spanning basic research, bioinformatics and clinical expertise. This shift in modality creates new opportunities and brings accompanying challenges. In particular, there is a continued need for a robust information infrastructure for molecular biology and clinical research. This need ranges from the physical aspects of dealing with data volume through to the more statistically challenging aspects of interpreting the data. A particular problem is finding causal relationships among the large number of correlations in the data; genetic data are particularly useful in resolving these issues. A further challenge is creating and maintaining an appropriate information infrastructure for this work. The ELIXIR ESFRI project is the European structure to develop and sustain this infrastructure, with EMBL-EBI being a key node. I will describe the rationale behind ELIXIR, its current working practice and some future thoughts on the development of information infrastructure in the life sciences.