Application of Graph Clustering and Visualisation Methods to Analysis of Biomolecular Data

  • Edgars Celms
  • Kārlis Čerāns
  • Kārlis Freivalds
  • Paulis Ķikusts
  • Lelde Lāce
  • Gatis Melkus
  • Mārtiņš Opmanis
  • Dārta Rituma
  • Pēteris Ručevskis
  • Juris Vīksna
Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 838)


In this paper we present an approach that integrates graph clustering and visualisation methods for semi-supervised discovery of biologically significant features in biomolecular data sets. We describe several clustering algorithms that have been custom-designed for the analysis of biomolecular data. They share an iterated two-step approach: thresholds and other clustering parameters are computed first; connected graph components are then identified and, if needed, the clustering parameters are adjusted for processing individual subgraphs.

We demonstrate the application of these algorithms in two concrete use cases: (1) analysis of protein coexpression in colorectal cancer cell lines; and (2) protein homology identification from both sequence and structural similarity data.
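The iterated two-step scheme outlined in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the fixed threshold increment `step`, and the size limit `max_size` used to decide when a subgraph is processed again are all assumptions made for illustration, and the abstract does not specify how thresholds are actually computed.

```python
from collections import defaultdict


def connected_components(nodes, edges):
    """Return the connected components (as sets) of an undirected graph."""
    adjacency = defaultdict(set)
    for u, v, _ in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)
    seen, components = set(), []
    for start in nodes:
        if start in seen:
            continue
        stack, component = [start], set()
        seen.add(start)
        while stack:  # iterative depth-first traversal
            node = stack.pop()
            component.add(node)
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        components.append(component)
    return components


def iterative_threshold_clustering(nodes, edges, threshold, max_size,
                                   step=0.1, max_threshold=1.0):
    """Hypothetical sketch of an iterated two-step clustering.

    Step 1: keep only edges whose weight reaches the current threshold.
    Step 2: take connected components; any component larger than
    max_size is processed again with a stricter threshold (assumed rule).
    """
    kept = [(u, v, w) for u, v, w in edges if w >= threshold]
    clusters = []
    for component in connected_components(nodes, kept):
        too_big = len(component) > max_size
        can_tighten = threshold + step <= max_threshold
        if too_big and can_tighten:
            sub_edges = [(u, v, w) for u, v, w in kept
                         if u in component and v in component]
            clusters.extend(iterative_threshold_clustering(
                sorted(component), sub_edges, threshold + step,
                max_size, step, max_threshold))
        else:
            clusters.append(component)
    return clusters


# Toy similarity graph; the weights are invented for illustration only.
toy_nodes = ["a", "b", "c", "d", "e"]
toy_edges = [("a", "b", 0.9), ("b", "c", 0.55),
             ("c", "d", 0.95), ("d", "e", 0.2)]
clusters = iterative_threshold_clustering(toy_nodes, toy_edges,
                                          threshold=0.5, max_size=2)
```

In this toy graph the first pass yields one component of four nodes, which exceeds `max_size`, so it is processed again at a stricter threshold and the weak edge between `b` and `c` is dropped. Real data would need a principled way to choose the thresholds, which this sketch leaves as input parameters.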


Clustering algorithms · Graph visualization · Biomolecular networks · Bioinformatics



The research was supported by ERDF project



Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  • Edgars Celms (1)
  • Kārlis Čerāns (1)
  • Kārlis Freivalds (1)
  • Paulis Ķikusts (1)
  • Lelde Lāce (1)
  • Gatis Melkus (1)
  • Mārtiņš Opmanis (1)
  • Dārta Rituma (1)
  • Pēteris Ručevskis (1)
  • Juris Vīksna (1), email author
  1. Institute of Mathematics and Computer Science, University of Latvia, Riga, Latvia
