An Asset Management Approach to Continuous Integration of Heterogeneous Biomedical Data

  • Robert E. Schuler
  • Carl Kesselman
  • Karl Czajkowski
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8574)


Increasingly, advances in biomedical research are the result of combining and analyzing heterogeneous data types from different sources, spanning genomic, proteomic, imaging, and clinical data. Yet despite the proliferation of data-driven methods, tools to support the integration and management of large collections of data for purposes of data-driven discovery are scarce, leaving scientists with ad hoc and inefficient processes. The scientific process could benefit significantly from lightweight methods for data integration that allow for exploratory, incrementally refined integration of heterogeneous data. In this paper, we address this problem by introducing a new asset-management-based approach designed to support continuous integration of biomedical data. We describe the system and our experiences using it in the context of several scientific applications.


Asset Management, Biomedical Data, Continuous Integration, Institutional Repository, Variable Call Format
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Robert E. Schuler (1)
  • Carl Kesselman (1)
  • Karl Czajkowski (1)
  1. Information Sciences Institute, University of Southern California, USA
