Using Term Lists and Inverted Files to Improve Search Speed for Metabolic Pathway Databases

  • Greeshma Neglur
  • Robert L. Grossman
  • Natalia Maltsev
  • Clement Yu
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4075)


This paper describes a technique for efficiently searching a pathway database for metabolic pathways similar to a given query pathway. Metabolic pathways can be converted into labeled directed graphs in which the nodes represent chemical compounds. The similarity between two graphs can be computed using a metric based on the Maximal Common Subgraph (MCS). By maintaining an inverted file that indexes all pathways in the database on their edges, our algorithm finds and ranks all pathways similar to the user's query pathway in time linear in the total number of occurrences, across the entire database, of the edges shared with the query.
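
The edge-indexed lookup described above can be illustrated with a minimal Python sketch. It is not the authors' implementation. Several things here are assumptions made for illustration: each edge is represented as a (source compound, target compound) pair, the function names `build_inverted_file` and `rank_similar` are hypothetical, and the score is the number of shared edges divided by the larger of the two edge counts, a simple stand-in for an MCS-based metric rather than the exact formula used in the paper.

```python
from collections import defaultdict


def build_inverted_file(pathways):
    """Map each edge to the ids of the pathways that contain it."""
    index = defaultdict(list)
    for pid, edges in pathways.items():
        for edge in set(edges):  # count each edge once per pathway
            index[edge].append(pid)
    return index


def rank_similar(query_edges, index, sizes):
    """Rank pathways sharing at least one edge with the query.

    The accumulation loop visits only the postings of edges present in
    the query, so its cost is linear in the total number of occurrences
    of those edges in the database. The final sort adds a log factor in
    the number of matching pathways.
    """
    query = set(query_edges)
    common = defaultdict(int)
    for edge in query:
        for pid in index.get(edge, ()):
            common[pid] += 1
    # Assumed score: shared edges over the larger edge set.
    scored = [
        (count / max(len(query), sizes[pid]), pid)
        for pid, count in common.items()
    ]
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [(pid, score) for score, pid in scored]


# Toy database with made-up compound labels.
pathways = {
    "P1": [("A", "B"), ("B", "C"), ("C", "D")],
    "P2": [("A", "B"), ("B", "E")],
    "P3": [("X", "Y")],
}
sizes = {pid: len(set(edges)) for pid, edges in pathways.items()}
index = build_inverted_file(pathways)
results = rank_similar([("A", "B"), ("B", "C")], index, sizes)
```

In this toy example, pathways that share no edge with the query (here `P3`) are never touched, which is the point of indexing on edges.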


Keywords: Common Edge; Label Graph; Query Graph; Adjacency List; Quotient Graph
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Greeshma Neglur (1)
  • Robert L. Grossman (1)
  • Natalia Maltsev (2)
  • Clement Yu (3)
  1. Laboratory for Advanced Computing, University of Illinois at Chicago, Chicago, USA
  2. Argonne National Laboratory, Math and Computer Science Division, Argonne, USA
  3. Department of Computer Science, University of Illinois at Chicago, Chicago, USA
