Ibidas: Querying Flexible Data Structures to Explore Heterogeneous Bioinformatics Data

  • Conference paper
Data Integration in the Life Sciences (DILS 2013)

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 7970)

Abstract

Bioinformatics now requires the handling of large and diverse datasets. Analyzing this data often demands significant custom scripting, as code reuse is limited by differences in input/output formats across data sources and algorithms. This recurring need to write data-handling code significantly hinders fast data exploration.

We argue that this problem cannot be solved by data integration and standardization alone. We propose that the integration-analysis chain misses a link: a query solution that can operate on diversely structured data throughout the whole bioinformatics workflow, rather than only on data available in the data sources. We describe how a simple concept (shared 'dimensions') allows such a query language to be constructed, enabling it to handle flat, nested and multi-dimensional data. As a result, one can operate in a unified way on the outputs of algorithms and the contents of files and databases, directly structuring the data in a format suitable for further analysis. These ideas have been implemented in a prototype system called Ibidas. To retain flexibility, it is directly integrated into a scripting language. We show how this framework enables the reuse of common data operations in different problem settings, and for different data interfaces, thereby speeding up data exploration.
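
To make the idea of shared dimensions more concrete, the following is a minimal Python sketch of how one operation might work uniformly on flat and nested data. It is not the Ibidas API: the `Field` class, the `combine` function, and the gene/term values are all hypothetical and exist only for illustration. The assumption is that each value carries the names of the dimensions it is nested along, and that two values can be combined when one dimension list is a prefix of the other.

```python
from dataclasses import dataclass
from typing import Any, Tuple


@dataclass
class Field:
    """A value plus the names of the dimensions it is nested along.

    Hypothetical illustration only; not part of Ibidas.
    """
    dims: Tuple[str, ...]
    data: Any  # nested lists, one nesting level per entry in dims


def _apply(op, a, b, a_depth, b_depth):
    # Both operands still have a dimension left: it is shared, so zip along it.
    if a_depth > 0 and b_depth > 0:
        if len(a) != len(b):
            raise ValueError("shared dimension has mismatched lengths")
        return [_apply(op, x, y, a_depth - 1, b_depth - 1) for x, y in zip(a, b)]
    # Only one operand is still nested: broadcast the other one over it.
    if a_depth > 0:
        return [_apply(op, x, b, a_depth - 1, 0) for x in a]
    if b_depth > 0:
        return [_apply(op, a, y, 0, b_depth - 1) for y in b]
    return op(a, b)


def combine(op, a: Field, b: Field) -> Field:
    """Apply op element-wise, aligning a and b on the dimensions they share."""
    shorter, longer = sorted((a.dims, b.dims), key=len)
    if longer[:len(shorter)] != shorter:
        raise ValueError(f"dimensions are not shared: {a.dims} vs {b.dims}")
    return Field(longer, _apply(op, a.data, b.data, len(a.dims), len(b.dims)))


# Flat data: one name per gene (illustrative values).
names = Field(("genes",), ["BRCA1", "TP53"])
# Nested data: a variable-length list of terms per gene (illustrative values).
terms = Field(("genes", "terms"), [["DNA repair", "cell cycle"], ["apoptosis"]])

# The shared "genes" dimension lets each name be paired with its own terms.
pairs = combine(lambda name, term: (name, term), names, terms)
print(pairs.dims)
print(pairs.data)
```

In this sketch the same `combine` call also accepts a scalar (a `Field` with an empty `dims` tuple) or two fields with identical dimensions, which is the sense in which a single operation can cover flat, nested and multi-dimensional inputs.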

Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Hulsman, M., Bot, J.J., de Vries, A.P., Reinders, M.J.T. (2013). Ibidas: Querying Flexible Data Structures to Explore Heterogeneous Bioinformatics Data. In: Baker, C.J.O., Butler, G., Jurisica, I. (eds) Data Integration in the Life Sciences. DILS 2013. Lecture Notes in Computer Science, vol 7970. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-39437-9_2

  • DOI: https://doi.org/10.1007/978-3-642-39437-9_2

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-39436-2

  • Online ISBN: 978-3-642-39437-9

  • eBook Packages: Computer Science, Computer Science (R0)
