A Secured Collaborative Model for Data Integration in Life Sciences

  • Hasan Jamil
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6990)

Abstract

Life sciences research extensively and routinely uses external online databases, tools, and applications to implement computational pipelines. These applications are among the truly distributed and highly collaborative global systems in existence. Because the resources these applications use are designed to serve individual users, they adopt an all-or-nothing model in which users must accept the entire response even when only a fraction of it is relevant. In computational pipelines involving several databases and complex repeated operations, the cost of unnecessary data transmission and computation can be significant enough to reduce productivity and make the applications sluggish. Because these resources are autonomous and do not accept user instructions or queries, users cannot customize their behavior to reduce network latency, wasteful computation, or unnecessary data transmission. Such a resource utilization and sharing model is wasteful and expensive. In this paper, we propose a new collaborative data integration and computational pipeline execution model for systems biology research. We show that in our envisioned model, arbitrary sites are able to accept user constraints and limited processing instructions to avoid wasteful computation, which improves overall efficiency. We also demonstrate that the proposed collaborative model does not breach a site's security or infringe upon its autonomy.
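The contrast drawn in the abstract, between accepting an entire response and letting a site apply user constraints itself, can be illustrated with a minimal sketch. This is not the paper's model or language: the `Site` class, its `fetch_all` and `fetch_where` methods, the `allowed_fields` policy, and the toy records are all hypothetical names and values introduced here for illustration only.

```python
from dataclasses import dataclass
from typing import Dict, List

Record = Dict[str, object]


@dataclass
class Site:
    """Hypothetical autonomous data source (an assumption, not from the paper)."""
    records: List[Record]
    # Fields on which this site chooses to accept equality constraints.
    # The site sets this policy itself, so its autonomy is preserved.
    allowed_fields: frozenset

    def fetch_all(self) -> List[Record]:
        """All-or-nothing access: the full response is shipped to the caller."""
        return list(self.records)

    def fetch_where(self, constraints: Dict[str, object]) -> List[Record]:
        """Collaborative access: apply permitted constraints at the site."""
        rejected = set(constraints) - self.allowed_fields
        if rejected:
            # The site refuses anything outside its own policy.
            raise PermissionError(f"constraints not accepted on: {sorted(rejected)}")
        return [
            r for r in self.records
            if all(r.get(k) == v for k, v in constraints.items())
        ]


# Toy records with made-up values, used only to exercise the sketch.
site = Site(
    records=[
        {"id": "r1", "organism": "human", "kind": "protein"},
        {"id": "r2", "organism": "mouse", "kind": "protein"},
        {"id": "r3", "organism": "human", "kind": "gene"},
        {"id": "r4", "organism": "yeast", "kind": "gene"},
    ],
    allowed_fields=frozenset({"organism", "kind"}),
)

# All-or-nothing: every record is transferred, then filtered by the client.
everything = site.fetch_all()
client_side = [r for r in everything if r["organism"] == "human"]

# Collaborative: the constraint travels to the site; only matches come back.
pushed = site.fetch_where({"organism": "human"})

print(len(everything), "records transferred without constraints")
print(len(pushed), "records transferred with the constraint applied at the site")
```

Both routes yield the same relevant records; the difference lies in how much data leaves the site. The policy check in `fetch_where` is one simple way to picture a site accepting limited instructions while keeping control over what it permits.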

Keywords

Data Integration; Call Statement; Query Plan; Computational Pipeline; Parameter List
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Hasan Jamil, Department of Computer Science, Wayne State University, USA
