Journal of Global Optimization, Volume 47, Issue 3, pp 343–354

A network flow model for biclustering via optimal re-ordering of data matrices

  • Peter A. DiMaggio Jr.
  • Scott R. McAllister
  • Christodoulos A. Floudas
  • Xiao-Jiang Feng
  • Joshua D. Rabinowitz
  • Herschel A. Rabitz
Article

Abstract

The analysis of large-scale data sets using clustering techniques arises in many different disciplines and has important applications. Most traditional clustering techniques require heuristic methods for finding good solutions and produce suboptimal clusters as a result. In this article, we present a rigorous biclustering approach, OREO, which is based on the Optimal RE-Ordering of the rows and columns of a data matrix. The physical permutations of the rows and columns are accomplished via a network flow model according to a given objective function. This optimal re-ordering model is used in an iterative framework where cluster boundaries in one dimension are used to partition and re-order the other dimensions of the corresponding submatrices. The performance of OREO is demonstrated on metabolite concentration data to validate the ability of the proposed method and compare it to existing clustering methods.
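To make the idea of re-ordering rows "according to a given objective function" concrete, here is a minimal sketch. It is not the network flow model of the article: it assumes a hypothetical objective (the sum of squared differences between vertically adjacent rows) and replaces the mixed-integer formulation with an exhaustive search over permutations, which is only feasible for very small matrices. The function names `ordering_cost` and `best_row_order` are illustrative and do not come from the paper.

```python
from itertools import permutations


def ordering_cost(matrix, order):
    """Assumed objective: sum of squared differences between adjacent rows."""
    return sum(
        (a - b) ** 2
        for i, j in zip(order, order[1:])      # consecutive rows in the ordering
        for a, b in zip(matrix[i], matrix[j])  # matching columns of those rows
    )


def best_row_order(matrix):
    """Exhaustive search over all row permutations (tiny inputs only).

    This stands in for the optimization model; it returns one permutation
    of row indices that minimizes ordering_cost.
    """
    return min(
        permutations(range(len(matrix))),
        key=lambda order: ordering_cost(matrix, order),
    )


if __name__ == "__main__":
    data = [[0, 0], [10, 10], [1, 1], [9, 9]]
    order = best_row_order(data)
    print(order, ordering_cost(data, order))
    print([data[i] for i in order])  # rows placed next to their nearest neighbours
```

Columns can be re-ordered the same way by applying `best_row_order` to the transposed matrix. The exhaustive search grows factorially with the number of rows, which is why an optimization model is needed for data sets of realistic size.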

Keywords

Biclustering; Mixed-integer linear optimization (MILP)



Copyright information

© Springer Science+Business Media, LLC. 2008

Authors and Affiliations

  • Peter A. DiMaggio Jr. (1)
  • Scott R. McAllister (1)
  • Christodoulos A. Floudas (1)
  • Xiao-Jiang Feng (2)
  • Joshua D. Rabinowitz (2)
  • Herschel A. Rabitz (2)
  1. Department of Chemical Engineering, Princeton University, Princeton, USA
  2. Department of Chemistry, Princeton University, Princeton, USA
