Ibidas: Querying Flexible Data Structures to Explore Heterogeneous Bioinformatics Data

  • Marc Hulsman
  • Jan J. Bot
  • Arjen P. de Vries
  • Marcel J. T. Reinders
Conference paper

DOI: 10.1007/978-3-642-39437-9_2

Part of the Lecture Notes in Computer Science book series (LNCS, volume 7970)
Cite this paper as:
Hulsman M., Bot J.J., de Vries A.P., Reinders M.J.T. (2013) Ibidas: Querying Flexible Data Structures to Explore Heterogeneous Bioinformatics Data. In: Baker C.J.O., Butler G., Jurisica I. (eds) Data Integration in the Life Sciences. DILS 2013. Lecture Notes in Computer Science, vol 7970. Springer, Berlin, Heidelberg

Abstract

Nowadays, bioinformatics requires the handling of large and diverse datasets. Analyzing this data often demands significant custom scripting, as code reuse is limited by differences in input/output formats across data sources and algorithms. This recurring need to write data-handling code significantly hinders fast data exploration.

We argue that this problem cannot be solved by data integration and standardization alone. We propose that the integration-analysis chain misses a link: a query solution that can operate on diversely structured data throughout the whole bioinformatics workflow, rather than just on data available in the data sources. We describe how a simple concept (shared 'dimensions') allows such a query language to be constructed, enabling it to handle flat, nested and multi-dimensional data. As a result, one can operate in a unified way on the outputs of algorithms and the contents of files and databases, directly structuring the data in a format suitable for further analysis. These ideas have been implemented in a prototype system called Ibidas. To retain flexibility, it is directly integrated into a scripting language. We show how this framework enables the reuse of common data operations in different problem settings, and for different data interfaces, thereby speeding up data exploration.
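
The abstract does not spell out how shared dimensions work, so the following is only a minimal sketch of one possible reading, written in plain Python. It is not the Ibidas API: the `Field` class, the `apply` function, the prefix-alignment rule and all sample values are assumptions made up for illustration. The idea shown is that each nesting level of a value carries a dimension name, and an operation pairs up elements along the dimensions both operands share while broadcasting over the rest, so the same operation works on scalar, flat and nested data.

```python
from dataclasses import dataclass
from typing import Any, Callable, Tuple


@dataclass(frozen=True)
class Field:
    """Hypothetical container: data plus one dimension name per nesting level."""
    dims: Tuple[str, ...]  # () for a scalar, ("genes",) for a flat column, ...
    data: Any              # nested lists, one level of nesting per dimension


def apply(op: Callable[[Any, Any], Any], left: Field, right: Field) -> Field:
    """Apply op elementwise, pairing elements along shared dimensions.

    Simplifying assumption of this sketch: the dimensions of one operand
    must be a prefix of the dimensions of the other.
    """
    if len(left.dims) <= len(right.dims):
        short, long_ = left, right
    else:
        short, long_ = right, left
    if long_.dims[:len(short.dims)] != short.dims:
        raise ValueError(f"dimensions do not align: {left.dims} vs {right.dims}")

    def walk(l: Any, r: Any, l_depth: int, r_depth: int) -> Any:
        if l_depth == 0 and r_depth == 0:
            return op(l, r)                       # both sides are single elements
        if l_depth > 0 and r_depth > 0:
            # Shared dimension: step into both operands together.
            return [walk(a, b, l_depth - 1, r_depth - 1) for a, b in zip(l, r)]
        if l_depth > 0:
            # Only the left operand has this dimension: repeat the right value.
            return [walk(a, r, l_depth - 1, 0) for a in l]
        # Only the right operand has this dimension: repeat the left value.
        return [walk(l, b, 0, r_depth - 1) for b in r]

    result = walk(left.data, right.data, len(left.dims), len(right.dims))
    return Field(long_.dims, result)


# Made-up sample data: a flat column and a nested column sharing "genes".
gene_names = Field(("genes",), ["geneA", "geneB"])
annotations = Field(("genes", "terms"), [["t1"], ["t2", "t3"]])
expression = Field(("genes",), [1.5, 3.0])
scale = Field((), 2)

# Flat with nested: each gene name is paired with each of its own terms.
pairs = apply(lambda g, t: (g, t), gene_names, annotations)

# Scalar with flat: the scalar is repeated along "genes".
scaled = apply(lambda s, x: s * x, scale, expression)
```

In this sketch, combining `gene_names` with `annotations` needs no explicit loop over genes, because both fields declare the `genes` dimension. Combining fields with unrelated dimensions raises a `ValueError` instead of silently pairing elements by position.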


Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Marc Hulsman (1)
  • Jan J. Bot (1)
  • Arjen P. de Vries (1, 3)
  • Marcel J. T. Reinders (1, 2)
  1. Delft Bioinformatics Lab, Delft University of Technology, The Netherlands
  2. Netherlands Bioinformatics Centre (NBIC), The Netherlands
  3. Centrum Wiskunde & Informatica (CWI), The Netherlands
