Querying Co-regulated Genes on Diverse Gene Expression Datasets Via Biclustering

  • Mehmet Deveci
  • Onur Küçüktunç
  • Kemal Eren
  • Doruk Bozdağ
  • Kamer Kaya
  • Ümit V. Çatalyürek
Part of the Methods in Molecular Biology book series (MIMB, volume 1375)


The rapid development and increasing popularity of gene expression microarrays have led to numerous studies on the discovery of co-regulated genes. One important way to discover such co-regulation is query-based search, since gene co-expression may indicate a shared role in a biological process. Although promising query-driven search methods based on clustering exist, they fail to capture many genes that function in the same biological pathway, either because microarray datasets are fraught with spurious samples or samples of diverse origin, or because the pathways may be regulated under only a subset of samples. Biclustering algorithms, a class of clustering algorithms that simultaneously cluster both the items and their features, are useful for analyzing gene expression data, or any data in which items are related in only a subset of their samples. With biclustering, genes need not be related in all samples to be clustered together. Because many genes interact only under specific circumstances, biclustering may recover relationships that traditional clustering algorithms can easily miss. In this chapter, we briefly summarize the literature on using biclustering to query co-regulated genes. We then present a novel biclustering approach and evaluate its performance through a thorough experimental analysis.
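The idea that two genes can be related in only a subset of samples can be illustrated with a small sketch. The expression values, the names `query`, `gene_b`, and `subset`, and the use of Pearson correlation as the similarity measure are all assumptions made for illustration; this is not the approach presented in the chapter.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def restrict(profile, samples):
    """Keep only the expression values at the given sample indices."""
    return [profile[i] for i in samples]


# Hypothetical toy profiles over 8 samples (made-up values).
# gene_b follows the query gene on samples 0-3 and is unrelated elsewhere.
query = [1.0, 2.0, 3.0, 4.0, 0.5, 3.5, 1.5, 2.5]
gene_b = [1.1, 2.1, 3.1, 4.1, 3.0, 0.2, 2.8, 0.9]
subset = [0, 1, 2, 3]

# Similarity over all samples, as a traditional clustering method would see it.
full_corr = pearson(query, gene_b)

# Similarity over the sample subset only, as a bicluster would see it.
subset_corr = pearson(restrict(query, subset), restrict(gene_b, subset))

print(f"all samples:   {full_corr:.2f}")
print(f"subset {subset}: {subset_corr:.2f}")
```

In this toy case the correlation over all eight samples is weak, while on the four-sample subset it is essentially perfect. A method that scores genes over every sample would therefore rank `gene_b` low for this query, whereas one that also selects the samples could recover the relationship.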


Keywords: Biclustering; Microarray; Gene expression; Clustering



Copyright information

© Springer Science+Business Media New York 2015

Authors and Affiliations

  • Mehmet Deveci (1)
  • Onur Küçüktunç (1)
  • Kemal Eren (1)
  • Doruk Bozdağ (2)
  • Kamer Kaya (3)
  • Ümit V. Çatalyürek (4)

  1. Computer Science and Engineering, The Ohio State University, Columbus, USA
  2. Biomedical Informatics, The Ohio State University, Columbus, USA
  3. Computer Science and Engineering, Sabancı University, Istanbul, Turkey
  4. Biomedical Informatics, Department of Electrical and Computer Engineering, The Ohio State University, Columbus, USA
