Dense Subgraphs with Restrictions and Applications to Gene Annotation Graphs

  • Barna Saha
  • Allison Hoch
  • Samir Khuller
  • Louiqa Raschid
  • Xiao-Ning Zhang
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6044)


In this paper, we focus on finding complex annotation patterns that represent novel and interesting hypotheses in gene annotation data. We define a generalization of the densest subgraph problem by adding a distance restriction (defined by a separate metric) on the nodes of the subgraph. We show that while this generalization makes the problem NP-hard for arbitrary metrics, the problem can be solved optimally in polynomial time when the metric is the distance metric of a tree or an interval graph. We also show that the densest subgraph problem with a specified subset of vertices that must be included in the solution can be solved optimally in polynomial time. In addition, we consider extensions in which, rather than finding a single solution, we wish to list all subgraphs of almost maximum density. We apply this method to a dataset of genes and their annotations obtained from The Arabidopsis Information Resource (TAIR). A user evaluation confirms that the patterns found in the distance-restricted densest subgraph for a dataset of photomorphogenesis genes are indeed validated in the literature; a control dataset validates that these are not random patterns. Interestingly, the complex annotation patterns potentially lead to new and as yet unknown hypotheses. We perform experiments to determine the properties of the dense subgraphs as we vary parameters, including the number of genes and the distance.
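To make the problem statement concrete, the following is a minimal brute-force sketch of a distance-restricted densest subgraph search. It is not the paper's polynomial-time algorithm: it enumerates every vertex subset and is only usable on tiny graphs. It assumes density is measured as induced edges divided by vertices, and that the restriction requires every pair of chosen vertices to lie within a distance threshold; both readings, along with the function names and the `required` parameter (standing in for the "specified subset of vertices" variant), are assumptions made for illustration.

```python
from itertools import combinations

def density(nodes, edges):
    """Number of edges induced by `nodes`, divided by the number of nodes."""
    node_set = set(nodes)
    induced = sum(1 for u, v in edges if u in node_set and v in node_set)
    return induced / len(node_set)

def within_distance(nodes, dist, threshold):
    """True if every pair of nodes is at most `threshold` apart under `dist`."""
    return all(dist(u, v) <= threshold for u, v in combinations(nodes, 2))

def densest_restricted_subgraph(vertices, edges, dist, threshold,
                                required=frozenset()):
    """Exhaustive search (exponential time) over all non-empty vertex subsets.

    Hypothetical helper: keeps the densest subset that contains `required`
    and satisfies the pairwise distance threshold.
    """
    required = set(required)
    best, best_density = None, -1.0
    for k in range(1, len(vertices) + 1):
        for subset in combinations(vertices, k):
            if not required <= set(subset):
                continue
            if not within_distance(subset, dist, threshold):
                continue
            d = density(subset, edges)
            if d > best_density:
                best, best_density = set(subset), d
    return best, best_density

# Toy instance: vertices placed on a line, so the metric is |u - v|.
vertices = [0, 1, 2, 3, 4]
edges = [(0, 1), (1, 2), (0, 2), (2, 3), (3, 4), (0, 4), (1, 3)]
line_metric = lambda u, v: abs(u - v)

best, d = densest_restricted_subgraph(vertices, edges, line_metric, threshold=3)
print(best, d)  # {0, 1, 2, 3} 1.25
```

Without the restriction (threshold 4) the whole toy graph is the densest subgraph, with density 7/5; tightening the threshold to 3 excludes it and the best feasible subset becomes {0, 1, 2, 3}.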


Keywords: Gene Ontology, Polynomial Time, Bipartite Graph, Polynomial Time Algorithm, Distance Threshold
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • Barna Saha (1)
  • Allison Hoch (2)
  • Samir Khuller (3)
  • Louiqa Raschid (4)
  • Xiao-Ning Zhang (5, 6)

  1. Research supported by NSF Award CCF-0728839. Department of Computer Science, University of Maryland, College Park
  2. Research supported by NSF REU Supplement to Award CCF-0728839. Department of Computer Science, University of Maryland, College Park
  3. Research supported by NSF Award CCF-0728839 and a Google Research Award. Department of Computer Science and UMIACS, University of Maryland, College Park
  4. Research supported by NSF Award IIS-0430915 and IIS-0960963. UMIACS and Robert H. Smith School of Business, University of Maryland, College Park
  5. Research supported by Department of Biology, St. Bonaventure University, St. Bonaventure
  6. Department of Cell Biology and Molecular Genetics, University of Maryland, College Park
