Journal of Global Optimization, 45:111

Mathematical modeling and efficient optimization methods for the distance-dependent rearrangement clustering problem

  • Scott R. McAllister
  • Peter A. DiMaggio Jr.
  • Christodoulos A. Floudas

Abstract

In this article we present a computational study for solving the distance-dependent rearrangement clustering problem using mixed-integer linear programming (MILP). To address sparse data sets, we present an objective function for evaluating the pair-wise interactions between two elements as a function of the distance between them in the final ordering. The physical permutations of the rows and columns of the data matrix can be modeled using mixed-integer linear programming and we present three models based on (1) the relative ordering of elements, (2) the assignment of elements to a final position, and (3) the assignment of a distance between a pair of elements. These models can be augmented with the use of cutting planes and heuristic methods to increase computational efficiency. The performance of the models is compared for three distinct re-ordering problems corresponding to glass transition temperature data for polymers and two drug inhibition data matrices. The results of the comparative study suggest that the assignment model is the most effective for identifying the optimal re-ordering of rows and columns of sparse data matrices.
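As a rough illustration of the distance-dependent objective described above, the following sketch scores a row ordering by summing pair-wise costs weighted by how far apart the two rows end up. Everything specific here is an assumption made for illustration and is not taken from the article: the pair cost (absolute differences over columns observed in both rows), the weight `1/d`, the choice to minimize, and the names `pair_cost`, `ordering_cost`, and `best_ordering`. It uses exhaustive search on a tiny matrix rather than any of the three MILP models.

```python
from itertools import permutations
from math import isnan

NAN = float("nan")  # marks a missing entry in the sparse data matrix


def pair_cost(row_a, row_b):
    """Assumed cost: absolute differences over columns observed in both rows."""
    return sum(abs(a - b) for a, b in zip(row_a, row_b)
               if not (isnan(a) or isnan(b)))


def ordering_cost(data, order, weight=lambda d: 1.0 / d):
    """Sum of pair costs, each scaled by an assumed weight of the
    distance between the two rows in the final ordering."""
    total = 0.0
    for p in range(len(order)):
        for q in range(p + 1, len(order)):
            total += weight(q - p) * pair_cost(data[order[p]], data[order[q]])
    return total


def best_ordering(data):
    """Exhaustive search over all row orderings; viable only for a few rows."""
    return min(permutations(range(len(data))),
               key=lambda order: ordering_cost(data, order))


# Hypothetical 4 x 3 matrix with missing values.
data = [
    [1.0, NAN, 2.0],
    [9.0, 8.0, NAN],
    [1.5, 1.0, 2.5],
    [NAN, 8.5, 9.0],
]
order = best_ordering(data)
print(order, ordering_cost(data, order))
```

On this toy input the search places the two low-valued rows (0 and 2) next to each other and the two high-valued rows (1 and 3) next to each other. Because the cost is computed only over entries observed in both rows, rows that share few columns still interact through pairs at larger distances, which is the motivation the abstract gives for a distance-dependent objective on sparse data.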

Keywords

Clustering; Mixed-integer linear programming; Sparse data sets


Copyright information

© Springer Science+Business Media, LLC. 2009

Authors and Affiliations

  • Scott R. McAllister (1)
  • Peter A. DiMaggio Jr. (1)
  • Christodoulos A. Floudas (1)
  1. Department of Chemical Engineering, Princeton University, Princeton, USA
