Chapter 15: Search Computing and the Life Sciences

  • Marco Masseroli
  • Norman W. Paton
  • Irena Spasić
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5950)

Abstract

Search Computing has been proposed to support the integration of the results of search engines with other data and computational resources. A key feature of the resulting integration platform is direct support for multi-domain ordered data, reflecting the fact that search engines produce ranked outputs, which should be taken into account when the results of several requests are combined. In the life sciences, there are many different types of ranked data. For example, ranked data may represent many different phenomena, including physical ordering within a genome, algorithmically assigned scores that represent levels of sequence similarity, and experimentally measured values such as expression levels. This chapter explores the extent to which the search computing functionalities designed for use with search engine results may be applicable for different forms of ranked data that are encountered when carrying out data integration in the life sciences. This is done by classifying different types of ranked data in the life sciences, providing examples of different types of ranking and ranking integration needs in the life sciences, identifying issues in the integration of such ranked data, and discussing techniques for drawing conclusions from diverse rankings.

Keywords

search computing bioinformatics data integration ranked data 

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

References

  1. 1.
    Stead, D., Paton, N.W., Missier, P., Embury, S.M., Hedeler, C., Jin, B., Brown, A.J.P., Preece, A.D.: Information quality in proteomics. Brief. Bioinform. 9(2), 174–188 (2008)CrossRefGoogle Scholar
  2. 2.
    Parkinson, H., Sarkans, U., Shojatalab, M., Abeygunawardena, N., Contrino, S., Coulson, R., Farne, A., Lara, G.G., Holloway, E., Kapushesky, M., Lilja, P., Mukherjee, G., Oezcimen, A., Rayner, T., Rocca-Serra, P., Sharma, A., Sansone, S., Brazma, A.: ArrayExpress–a public repository for microarray gene expression data at the EBI. Nucleic Acids Res. 33(Database issue), D553-D555 (2005)CrossRefGoogle Scholar
  3. 3.
    Galperin, M.Y., Cochrane, G.R.: Nucleic Acids Research annual database issue and the NAR online molecular biology database collection in 2009. Nucleic Acids Res. 37(Database issue), D1–D4 (2009)CrossRefGoogle Scholar
  4. 4.
    Krallinger, M., Valencia, A., Hirschman, L.: Linking genes to literature: text mining, information extraction, and retrieval applications for biology. Genome Biol. 9(suppl. 2), S8 (2008)CrossRefGoogle Scholar
  5. 5.
    Spasic, I., Ananiadou, S., McNaught, J., Kumar, A.: Text mining and ontologies in biomedicine: making sense of raw text. Brief. Bioinform. 6(3), 239–251 (2005)CrossRefGoogle Scholar
  6. 6.
    Braga, D., Ceri, S., Daniel, F., Martinenghi, D.: Mashing up search services. IEEE Internet Comput. 12(5), 16–23 (2008)CrossRefGoogle Scholar
  7. 7.
    Hernandez, T., Kambhampati, S.: Integration of biological sources: current systems and challenges ahead. SIGMOD Record 33(3), 51–60 (2004)CrossRefGoogle Scholar
  8. 8.
    Masseroli, M., Ceri, S., Campi, A.: Integration and mining of genomic annotations: experiences and perspectives in GFINDer data warehousing. In: Paton, N.W., Missier, P., Hedeler, C. (eds.) DILS 2009. LNCS (LNBI), vol. 5647, pp. 88–95. Springer, Heidelberg (2009)CrossRefGoogle Scholar
  9. 9.
    Hull, D., Wolstencroft, K., Stevens, R., Goble, C.A., Pocock, M.R., Li, P., Oinn, T.: Taverna: a tool for building and running workflows of services. Nucleic Acids Res. 34, 729–732 (2006)CrossRefGoogle Scholar
  10. 10.
    Goble, C.A., Stevens, R., Ng, G., Bechhofer, S., Paton, N.W., Baker, P.G., Peim, M., Brass, A.: Transparent access to multiple bioinformatics information sources. IBM Systems Journal 40(2), 534–551 (2001)CrossRefGoogle Scholar
  11. 11.
    Dwork, C., Kumar, R., Naor, M., Sivakumar, D.: Rank aggregation methods for the web. In: Proceedings of the 10th International World Wide Web Conference, WWW 2001, pp. 613–622. ACM Press, New York (2001)Google Scholar
  12. 12.
    Edgar, R., Domravech, M., Lash, A.E.: Gene Expression Omnibus: NCBI gene expression and hybridization array data repository. Nucleic Acids Res. 30(1), 207–210 (2002)CrossRefGoogle Scholar
  13. 13.
    Jones, P., Côté, R.G., Martens, L., Quinn, A.F., Taylor, C.F., Derache, W., Hermjakob, H., Apweiler, R.: PRIDE: a public repository of protein and peptide identifications for the proteomics community. Nucleic Acids Res. 34(Database Issue), D659–D663 (2006)CrossRefGoogle Scholar
  14. 14.
    Olken, F.: Graph data management for molecular biology. OMICS: A Journal of Integr. Biol. 7(1), 75–78 (2003)CrossRefGoogle Scholar
  15. 15.
    Castrillo, J.I., Zeef, L.A., Hoyle, D.C., Zhang, N., Hayes, A., Gardner, D.C., Cornell, M.J., Petty, J., Hakes, L., Wardleworth, L., Rash, B., Brown, M., Dunn, W.B., Broadhurst, D., O’Donoghue, K., Hester, S.S., Dunkley, T.P., Hart, S.R., Swainston, N., Li, P., Gaskell, S.J., Paton, N.W., Lilley, K.S., Kell, D.B., Oliver, S.G.: Growth control of the eukaryote cell: a systems biology study in yeast. J. Biol. 6(2), 4 (2007)CrossRefGoogle Scholar
  16. 16.
    Altschul, S.F., Gish, W., Miller, W., Myers, E.W., Lipman, D.J.: Basic Local Alignment Search Tool. J. Mol. Biol. 215(3), 403–410 (1990)CrossRefGoogle Scholar
  17. 17.
    Salton, G., Buckley, C.: Term-weighting approaches in automatic text retrieval. Inf. Process Manag. 24(5), 513–523 (1988)CrossRefGoogle Scholar
  18. 18.
    Leitner, F., Krallinger, M., Rodriguez-Penagos, C., Hakenberg, J., Plake, C., Kuo, C.J., Hsu, C.N., Tsai, R.T., Hung, H.C., Lau, W.W., Johnson, C.A., Saetre, R., Yoshida, K., Chen, Y.H., Kim, S., Shin, S.Y., Zhang, B.T., Baumgartner Jr., W.A., Hunter, L., Haddow, B., Matthews, M., Wang, X., Ruch, P., Ehrler, F., Ozgür, A., Erkan, G., Radev, D.R., Krauthammer, M., Luong, T., Hoffmann, R., Sander, C., Valencia, A.: Introducing meta-services for biomedical information extraction. Genome Biol. 9(suppl. 2), S6 (2008)CrossRefGoogle Scholar
  19. 19.
    Goble, C.A., Belhajjame, K., Tanoh, F., Bhagat, J., Wolstencroft, K., Stevens, R., Pettifer, S., Nzuobontane, E., McWilliam, H., Laurent, T., Lopez, R.: BioCatalogue: a curated Web Service registry for the Life Science community. In: ISMB/ECCB 2009. Technology Track: TT40 (2009)Google Scholar
  20. 20.
    Louie, B., Mork, P., Martin-Sanchez, F., Halevy, A., Tarczy-Hornoch, P.: Data integration and genomic medicine. J. Biomed. Inform. 40(1), 5–16 (2007)CrossRefGoogle Scholar
  21. 21.
    Pihur, V., Datta, S., Datta, S.: Weighted rank aggregation of cluster validation measures: a Monte Carlo cross-entropy approach. Bioinformatics 23(13), 1607–1615 (2007)CrossRefGoogle Scholar
  22. 22.
    DeConde, R., Hawley, S., Falcon, S., Clegg, N., Knudsen, B., Etzioni, R.: Combining results of microarray experiments: a rank aggregation approach. Stat. Appl. Genet. Mol. Biol. 5, Article 15 (2006)Google Scholar
  23. 23.
    Pihur, V., Datta, S., Datta, S.: RankAggreg, an R package for weighted rank aggregation. BMC Bioinformatics 10, 62 (2009)CrossRefGoogle Scholar
  24. 24.
    Fagin, R., Kumar, R., Sivakumar, D.: Comparing top k lists. SIAM J. Discrete Math. 17(1), 134–160 (2003)MathSciNetCrossRefMATHGoogle Scholar
  25. 25.
    Börzsönyi, S., Kossmann, D., Stocker, K.: The Skyline operator. In: Proceedings 17th International Conference on Data Engineering, ICDE 2001, pp. 421–430. IEEE Press, New York (2001)Google Scholar
  26. 26.
    Hue, C., Boullé, M.: A new probabilistic approach in rank regression with optimal bayesian partitioning. J. Mach. Learn. Res. 8, 2727–2754 (2007)MATHGoogle Scholar
  27. 27.
    Cheung, C.W.: Probabilistic rank aggregation for multiple SVM ranking. MPhil Thesis. Department of Computer Science and Engineering, The Hong Kong University of Science and Technology. Hong Kong (2009)Google Scholar
  28. 28.
    Sawaragi, Y., Nakayama, H., Tanino, T.: Theory of multiobjective optimization. Mathematics in Science and Engineering, vol. 176. Academic Press Inc., Orlando (1985)MATHGoogle Scholar
  29. 29.
    Steuer, R.E.: Multiple criteria optimization: theory, computations, and application. John Wiley & Sons, Inc., New York (1986)MATHGoogle Scholar
  30. 30.
    Deb, K.: Multi-objective optimization using evolutionary algorithms. John Wiley & Sons, Inc., New York (2002)MATHGoogle Scholar
  31. 31.
    Deb, K., Agrawal, S., Pratap, A., Meyarivan, T.: A fast elitist non-dominated sorting genetic algorithm for multi-objective optimization: NSGA-II. KanGAL Report no. 200001 (2000)Google Scholar
  32. 32.
    Zitzler, E., Thiele, L.: An evolutionary algorithm for multiobjective optimization: the strength Pareto approach. TIK-Report no. 43 (1998)Google Scholar
  33. 33.
    Handl, F., Kell, D.B., Knowles, J.D.: Multiobjective optimization in bioinformatics and computational biology. IEEE/ACM Trans. Comput. Biol. Bioinform. 4(2), 279–292 (2007)CrossRefGoogle Scholar
  34. 34.
    Perez-Iratxeta, C., Bork, P., Andrade, M.A.: Association of genes to genetically inherited diseases using data mining. Nat. Genet. 31(3), 316–319 (2002)Google Scholar
  35. 35.
    Jelier, R., Jenster, G., Dorssers, L.C., van der Eijk, C.C., van Mulligen, E.M., Mons, B., Kors, J.A.: Co-occurrence based meta-analysis of scientific texts: retrieving biological relationships between genes. Bioinformatics 21(9), 2049–2058 (2005)CrossRefGoogle Scholar
  36. 36.
    Kerr, G., Ruskin, H.J., Crane, M., Doolan, P.: Techniques for clustering gene expression data. Comput. Biol. Med. 38(3), 283–293 (2008)CrossRefGoogle Scholar
  37. 37.
    Kearsey, M.J.: The principles of QTL analysis (a minimal mathematics approach). J. Exp. Bot. 49(327), 1619–1623 (1998)CrossRefGoogle Scholar
  38. 38.
    Datta, R., de Schoolmeester, M.L., Hedeler, C., Paton, N.W., Brass, A.M., Else, K.J.: Identification of novel genes in intestinal tissue that are regulated after infection with an intestinal nematode parasite. Infect. Immun. 73(7), 4025–4033 (2005)CrossRefGoogle Scholar

Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • Marco Masseroli
    • 1
  • Norman W. Paton
    • 2
  • Irena Spasić
    • 2
  1. 1.Dipartimento di Elettronica e InformatzionePolitecnico di MilanoMilanoItaly
  2. 2.School of Computer ScienceUniversity of ManchesterManchesterUK

Personalised recommendations