A Secured Collaborative Model for Data Integration in Life Sciences

Part of the book series: Lecture Notes in Computer Science (TLDKS, volume 6990)

Abstract

Life sciences research extensively and routinely uses external online databases, tools and applications to implement computational pipelines. These applications are among the truly distributed and highly collaborative global systems in existence. Because the resources these applications use are designed to serve individual users, they adopt an all-or-nothing model in which users must accept the entire response even when only a fraction of it is relevant. In computational pipelines involving several databases and complex repeated operations, the cost of unnecessary data transmission and computation can be significant enough to reduce productivity and make the applications sluggish. Because these resources are autonomous and do not accept user instructions or queries, users cannot customize their behavior to reduce network latency, wasteful computation or unnecessary data transmission. Such a resource utilization and sharing model is wasteful and expensive. In this paper, we propose a new collaborative data integration and computational pipeline execution model for systems biology research. We show that in the envisioned model, arbitrary sites are able to accept user constraints and limited processing instructions to avoid wasteful computation, resulting in improved overall efficiency. We also demonstrate that the proposed collaborative model neither breaches site security nor infringes upon site autonomy.

Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Jamil, H. (2011). A Secured Collaborative Model for Data Integration in Life Sciences. In: Hameurlain, A., Küng, J., Wagner, R., Böhm, C., Eder, J., Plant, C. (eds) Transactions on Large-Scale Data- and Knowledge-Centered Systems IV. Lecture Notes in Computer Science, vol 6990. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-23740-9_8

  • DOI: https://doi.org/10.1007/978-3-642-23740-9_8

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-23739-3

  • Online ISBN: 978-3-642-23740-9

  • eBook Packages: Computer Science (R0)
