Mean Square Residue Biclustering with Missing Data and Row Inversions

  • Stefan Gremalschi
  • Gulsah Altun
  • Irina Astrovskaya
  • Alexander Zelikovsky
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5542)


Cheng and Church proposed a greedy deletion-addition algorithm that finds a given number k of biclusters whose mean squared residues (MSRs) are below certain thresholds; missing values in the matrix are replaced with random numbers. In our previous paper, we introduced the dual biclustering method with quadratic optimization, applied to missing data and row inversions.
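For orientation, the MSR (also called the H value) of a bicluster, as defined by Cheng and Church [4], is the average over all cells of the squared residue a_ij − a_iJ − a_Ij + a_IJ, where a_iJ, a_Ij and a_IJ are the row mean, column mean and overall mean of the submatrix. The following is a minimal Python sketch of that definition for a complete matrix (no missing values); the function name and the sample data are illustrative assumptions, not taken from the paper.

```python
def mean_squared_residue(matrix):
    """H value of a bicluster given as a list of equal-length rows (no missing values)."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    row_means = [sum(row) / n_cols for row in matrix]
    col_means = [sum(row[j] for row in matrix) / n_rows for j in range(n_cols)]
    overall = sum(row_means) / n_rows  # valid because all rows have the same length

    total = 0.0
    for i, row in enumerate(matrix):
        for j, value in enumerate(row):
            residue = value - row_means[i] - col_means[j] + overall
            total += residue ** 2
    return total / (n_rows * n_cols)


# Hypothetical data: every row is a shifted copy of the first one,
# which is the additive pattern the residue is designed to score as ideal.
additive = [[1.0, 2.0, 3.0],
            [2.0, 3.0, 4.0],
            [5.0, 6.0, 7.0]]

# Same matrix with one perturbed cell, so the residues no longer vanish.
noisy = [row[:] for row in additive]
noisy[0][0] += 3.0

print(mean_squared_residue(additive))
print(mean_squared_residue(noisy))
```

The additive matrix scores (numerically) zero, while the perturbed one scores strictly above zero, which is why a lower H value is read as a more coherent bicluster.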

In this paper, we modify the dual biclustering method with quadratic optimization and add three new features. First, we introduce a “row status” for each row in a bicluster; rows are added to and deleted from biclusters based on their status in order to minimize the MSR. We compare our results with Cheng and Church’s approach, in which rows are inverted only while being added to the biclusters. We select the row or the negated row not only at addition but also at deletion, and show an improvement. Second, we give a proof of the theorem introduced by Cheng and Church in [4]. Third, missing data often occur in the data matrices given for biclustering and are usually filled with random numbers; we show that ignoring the missing data is a better approach that avoids the additional noise caused by randomness. Since an ideal bicluster has an H value of zero, our results show a significant decrease in the H value of the biclusters, with less noise, compared to the original dual biclustering and the Cheng and Church method.
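The two ideas of ignoring missing entries and choosing between a row and its negation can be illustrated with a short Python sketch. It is not the authors' algorithm (the abstract does not specify the quadratic optimization or the exact handling of means). It assumes that missing values are marked with `None`, that all means are taken over observed entries only, and that the row status is simply whichever of the row or its negation gives the lower H value. All names are hypothetical.

```python
def msr_ignoring_missing(matrix):
    """H value over observed entries only; None marks a missing value (assumption)."""
    def mean(values):
        observed = [v for v in values if v is not None]
        return sum(observed) / len(observed) if observed else 0.0

    row_means = [mean(row) for row in matrix]
    col_means = [mean(col) for col in zip(*matrix)]
    overall = mean([v for row in matrix for v in row])

    squared = [
        (v - row_means[i] - col_means[j] + overall) ** 2
        for i, row in enumerate(matrix)
        for j, v in enumerate(row)
        if v is not None  # missing cells contribute nothing, instead of random noise
    ]
    return sum(squared) / len(squared) if squared else 0.0


def best_row_status(bicluster, row):
    """Return (status, H): status +1 keeps the row as-is, -1 uses the negated row."""
    negated = [None if v is None else -v for v in row]
    h_plain = msr_ignoring_missing(bicluster + [row])
    h_negated = msr_ignoring_missing(bicluster + [negated])
    return (1, h_plain) if h_plain <= h_negated else (-1, h_negated)


# Hypothetical bicluster with one missing cell and a mirror-image candidate row.
bicluster = [[1.0, 2.0, 3.0],
             [2.0, None, 4.0]]
candidate = [-4.0, -5.0, -6.0]

status, h = best_row_status(bicluster, candidate)
h_as_is = msr_ignoring_missing(bicluster + [candidate])
print(status, h, h_as_is)
```

The same comparison could be repeated when a row is considered for deletion, which is the extension over inverting rows only at addition that the abstract describes.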


Keywords: Biclustering, Mean Square Residue




References

  1. Angiulli, F., Pizzuti, C.: Gene Expression Biclustering using Random Walk Strategies. In: Tjoa, A.M., Trujillo, J. (eds.) DaWaK 2005. LNCS, vol. 3589, pp. 509–519. Springer, Heidelberg (2005)
  2. Baldi, P., Hatfield, G.W.: DNA Microarrays and Gene Expression. In: From Experiments to Data Analysis and Modelling. Cambridge Univ. Press, Cambridge (2002)
  3. Bertsimas, D., Tsitsiklis, J.: Introduction to Linear Optimization. Athena Scientific
  4. Cheng, Y., Church, G.: Biclustering of Expression Data. In: Proceedings of the Eighth International Conference on Intelligent Systems for Molecular Biology (ISMB), pp. 93–103. AAAI Press, Menlo Park (2000)
  5. Madeira, S.C., Oliveira, A.L.: Biclustering Algorithms for Biological Data Analysis: A Survey. IEEE Transactions on Computational Biology and Bioinformatics 1(1), 24–45 (2004)
  6. Papadimitriou, C.H., Steiglitz, K.: Combinatorial optimization: algorithms and complexity, p. 2982. Prentice-Hall, Inc., Upper Saddle River
  7. Prelic, A., Bleuler, S., Zimmermann, P., Wille, A., Bühlmann, P., Gruissem, W., Hennig, L., Thiele, L., Zitzler, E.: A systematic comparison and evaluation of biclustering methods for gene expression data. Bioinformatics 22(9), 1122–1129 (2006)
  8.
  9. Tanay, A., Sharan, R., Shamir, R.: Discovering Statistically Significant Biclusters in Gene Expression Data. Bioinformatics 18, 136–144 (2002)
  10. Tavazoie, S., Hughes, J.D., Campbell, M.J., Cho, R.J., Church, G.M.: Systematic determination of genetic network architecture. Nature Genetics 22, 281–285 (1999)
  11. Yang, J., Wang, H., Wang, W., Yu, P.: Enhanced biclustering on gene expression data. In: Proceedings of the 3rd IEEE Conference on Bioinformatics and Bioengineering (BIBE), pp. 321–327 (2003)
  12. Zhang, Y., Zha, H., Chu, C.H.: A time-series biclustering algorithm for revealing co-regulated genes. In: Proc. Int. Symp. Information and Technology: Coding and Computing (ITCC 2005), Las Vegas, USA, pp. 32–37 (2005)
  13. Zhou, J., Khokhar, A.A.: ParRescue: Scalable Parallel Algorithm and Implementation for Biclustering over Large Distributed Datasets. In: 26th IEEE International Conference on Distributed Computing Systems, ICDCS 2006 (2006)
  14. Gremalschi, S., Altun, G.: Mean Squared Residue Based Biclustering Algorithms. In: Măndoiu, I., Sunderraman, R., Zelikovsky, A. (eds.) ISBRA 2008. LNCS (LNBI), vol. 4983, pp. 232–243. Springer, Heidelberg (2008)
  15. Divina, F., Aguilar-Ruiz, J.: Biclustering of Expression Data with Evolutionary Computation. IEEE Transactions on Knowledge and Data Engineering 18(5), 590–602 (2006)
  16. Yang, J., Wang, W., Wang, H., Yu, P.S.: Enhanced biclustering on expression data. In: Proceedings of the 3rd IEEE Conference on Bioinformatics and Bioengineering (BIBE 2003), pp. 321–327 (2003)
  17. Prelic, A., Bleuler, S., Zimmermann, P., Wille, A., Bühlmann, P., Gruissem, W., Hennig, L., Thiele, L., Zitzler, E.: A Systematic Comparison and Evaluation of Biclustering Methods for Gene Expression Data. Bioinformatics 22(9), 1122–1129 (2006)
  18. Xiao, J., Wang, L., Liu, X., Jiang, T.: An Efficient Voting Algorithm for Finding Additive Biclusters with Random Background. Journal of Computational Biology 15(10), 1275–1293 (2008)
  19. Liu, X., Wang, L.: Computing the maximum similarity bi-clusters of gene expression data. Bioinformatics 23(1), 50–56 (2007)

Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • Stefan Gremalschi (1)
  • Gulsah Altun (2)
  • Irina Astrovskaya (1)
  • Alexander Zelikovsky (1)

  1. Department of Computer Science, Georgia State University, Atlanta
  2. Department of Reproductive Medicine, University of California, San Diego
