An Evaluation of the Use of XML for Representation, Querying, and Analysis of Molecular Interactions

  • Lena Strömbäck
  • David Hall
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4254)


Biology researchers are rapidly generating new information on how genes, proteins and other molecules interact in living organisms. To completely understand the machinery underlying life, it is necessary to integrate and analyze these large quantities of data. As one step in this direction, new XML-based standards for describing molecular interactions have been defined. This work evaluates the use of the XML query language XQuery for molecular interactions, since it would greatly benefit users to work directly on data represented in the new standards. We use and compare four available XQuery implementations (eXist, X-Hive, Sedna and QizX/open) for querying and analyzing data exported from available databases. Our conclusion is that XQuery can easily be used for the most common queries in this domain but is not feasible for more complex analyses. In particular, for queries containing path analysis, the available XQuery implementations perform poorly, and an extension of the GTL package clearly outperforms XQuery. The paper ends with a discussion of the usability of XQuery in this domain. In particular, we point out the need for more efficient graph handling, and note that XQuery requires the user to understand the exact XML format of each dataset.
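The distinction the abstract draws between common lookup queries and path analysis can be illustrated with a minimal sketch. It is not the XQuery or GTL code evaluated in the paper. The XML layout, the element and attribute names, and the helper functions `interactions_of`, `build_graph` and `shortest_path` are all hypothetical, chosen only for illustration, and the format is far simpler than real PSI MI or SBML exports.

```python
import xml.etree.ElementTree as ET
from collections import deque

# Hypothetical, heavily simplified interaction format (assumption:
# this is NOT the real PSI MI or SBML schema).
XML = """
<interactions>
  <interaction id="i1"><participant ref="A"/><participant ref="B"/></interaction>
  <interaction id="i2"><participant ref="B"/><participant ref="C"/></interaction>
  <interaction id="i3"><participant ref="C"/><participant ref="D"/></interaction>
  <interaction id="i4"><participant ref="A"/><participant ref="E"/></interaction>
</interactions>
""".strip()


def interactions_of(root, molecule):
    """Simple lookup: ids of all interactions that mention `molecule`."""
    return [
        inter.get("id")
        for inter in root.findall("interaction")
        if inter.find(f"participant[@ref='{molecule}']") is not None
    ]


def build_graph(root):
    """Turn the interaction list into an undirected adjacency map."""
    graph = {}
    for inter in root.findall("interaction"):
        refs = [p.get("ref") for p in inter.findall("participant")]
        for a in refs:
            graph.setdefault(a, set()).update(r for r in refs if r != a)
    return graph


def shortest_path(graph, start, goal):
    """Path analysis: breadth-first search for one shortest path, or None."""
    if start not in graph or goal not in graph:
        return None
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for nxt in sorted(graph[node]):  # sorted for deterministic output
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None


root = ET.fromstring(XML)
graph = build_graph(root)
```

The lookup is a single path expression over the document, which is the kind of query the abstract reports as easy to express. The path search first has to rebuild a graph from the XML and then traverse it, and note that both functions depend on knowing the exact element and attribute names of the dataset.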


System Biology Markup Language; System Biology Markup Language Model; XQuery Query; Molecular Interaction Database; Human Proteome Organization
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Lena Strömbäck (1)
  • David Hall (1)
  1. Department of Computer and Information Science, Linköping University, Sweden
