Annals of Operations Research, Volume 263, Issue 1–2, pp 385–404

Recovering all generalized order-preserving submatrices: new exact formulations and algorithms

Data Mining and Analytics


Cluster analysis of gene expression data is a popular and successful way of elucidating underlying biological processes. Typically, cluster analysis methods seek to group genes that are differentially expressed across experimental conditions. However, real biological processes often involve only a subset of genes and are activated in only a subset of environmental or temporal conditions. To address this limitation, Ben-Dor et al. (J Comput Biol 10(3–4):373–384, 2003) developed an approach to identify order-preserving submatrices (OPSMs), in which the expression levels of the included genes induce the same linear ordering of experiments. In addition to gene expression analysis, OPSMs have applications in recommender systems and target marketing. While the problem of finding the largest OPSM is \(\mathcal{NP}\)-hard, there have been significant advances in both exact and approximate algorithms in recent years. Building upon these developments, we provide two exact mathematical programming formulations that generalize the OPSM formulation by also allowing the reverse linear ordering; the resulting pattern is known as the generalized OPSM, or GOPSM. Our formulations incorporate a constraint that provides a margin of safety against detecting spurious GOPSMs. Finally, we provide two novel algorithms that recover, for any given level of significance, all GOPSMs from a given data matrix by iteratively solving mathematical programming formulations to global optimality. We demonstrate the computational performance and accuracy of our algorithms on real gene expression data sets.
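To make the OPSM and GOPSM patterns concrete, the following minimal Python sketch checks whether a chosen set of rows (genes) and columns (experiments) forms such a pattern. It only verifies a candidate submatrix; it is not the paper's mathematical programming formulation or its recovery algorithms. The function names, the strict-inequality convention (ties break the pattern), and the toy matrix are assumptions made for illustration.

```python
from typing import List, Sequence


def row_direction(values: Sequence[float]) -> int:
    """Return +1 if strictly increasing, -1 if strictly decreasing, else 0."""
    pairs = list(zip(values, values[1:]))
    if all(a < b for a, b in pairs):
        return 1
    if all(a > b for a, b in pairs):
        return -1
    return 0


def is_opsm(matrix: List[List[float]],
            rows: Sequence[int],
            cols: Sequence[int],
            generalized: bool = False) -> bool:
    """Check whether the selected rows share one linear ordering of the columns.

    With generalized=True, a row may also follow the reverse ordering
    (the GOPSM pattern described in the abstract).
    """
    if not rows or len(cols) < 2:
        return True  # trivially order-preserving
    # Candidate ordering: sort the selected columns by the first selected row.
    first = matrix[rows[0]]
    order = sorted(cols, key=lambda c: first[c])
    allowed = {1, -1} if generalized else {1}
    return all(
        row_direction([matrix[r][c] for c in order]) in allowed
        for r in rows
    )


# Hypothetical toy data: 4 genes (rows) by 3 experiments (columns).
data = [
    [1, 5, 3],
    [2, 9, 4],
    [8, 1, 6],
    [3, 2, 7],
]

print(is_opsm(data, rows=[0, 1], cols=[0, 1, 2]))                       # True
print(is_opsm(data, rows=[0, 1, 2], cols=[0, 1, 2]))                    # False
print(is_opsm(data, rows=[0, 1, 2], cols=[0, 1, 2], generalized=True))  # True
```

Rows 0 and 1 both rank the experiments as column 0 < column 2 < column 1, so they form an OPSM. Row 2 ranks them in exactly the reverse order, so adding it breaks the OPSM but still yields a GOPSM. Finding the largest such submatrix, rather than checking a given one, is the hard problem the paper addresses.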


Order-preserving submatrix; Integer programming; Data mining; Biclustering


  1. Ben-Dor, A., Chor, B., Karp, R., & Yakhini, Z. (2003). Discovering local structure in gene expression data: The order-preserving submatrix problem. Journal of Computational Biology, 10(3–4), 373–384.
  2. Busygin, S., Prokopyev, O., & Pardalos, P. M. (2008). Biclustering in data mining. Computers & Operations Research, 35(9), 2964–2987.
  3. Causton, H. C., Ren, B., Koh, S. S., Harbison, C. T., Kanin, E., Jennings, E. G., et al. (2001). Remodeling of yeast genome expression in response to environmental changes. Molecular Biology of the Cell, 12(2), 323–337.
  4. Chui, C. K., Kao, B., Yip, K. Y., & Lee, S. D. (2008). Mining order-preserving submatrices from data with repeated measurements. In The 8th IEEE international conference on data mining (ICDM) (pp. 133–142). IEEE.
  5. Cooper, S. J., Trinklein, N. D., Anton, E. D., Nguyen, L., & Myers, R. M. (2006). Comprehensive analysis of transcriptional promoter structure and function in 1% of the human genome. Genome Research, 16(1), 1–10.
  6. Fang, Q., Ng, W., Feng, J., & Li, Y. (2012). Mining bucket order-preserving submatrices in gene expression data. IEEE Transactions on Knowledge and Data Engineering, 24(12), 2218–2231.
  7. Fang, Q., Ng, W., Feng, J., & Li, Y. (2014). Mining order-preserving submatrices from probabilistic matrices. ACM Transactions on Database Systems, 39(1), 1–43.
  8. Gao, B. J., Griffith, O. L., Ester, M., & Jones, S. J. (2006). Discovering significant OPSM subspace clusters in massive gene expression data. In Proceedings of the 12th ACM SIGKDD international conference on knowledge discovery and data mining (pp. 922–928). New York, NY, Philadelphia, PA: ACM.
  9. Gao, B. J., Griffith, O. L., Ester, M., Xiong, H., Zhao, Q., & Jones, S. J. (2012). On the deep order-preserving submatrix problem: A best effort approach. IEEE Transactions on Knowledge and Data Engineering, 24(2), 309–325.
  10. Griffith, O. L., Gao, B. J., Bilenky, M., Prychyna, Y., Ester, M., & Jones, S. J. (2009). KiWi: A scalable subspace clustering algorithm for gene expression analysis. In Proceedings of the 3rd international conference on bioinformatics and biomedical engineering (iCBBE) (pp. 1–9). IEEE.
  11. Hochbaum, D. S., & Levin, A. (2013). Approximation algorithms for a minimization variant of the order-preserving submatrices and for biclustering problems. ACM Transactions on Algorithms, 9(2), 1–12.
  12. Humrich, J., Gartner, T., & Garriga, G. C. (2011). A fixed parameter tractable integer program for finding the maximum order preserving submatrix. In The 11th international conference on data mining (ICDM) (pp. 1098–1103). IEEE.
  13. IBM. (2015). IBM ILOG CPLEX 12.5.1 user’s manual. IBM ILOG CPLEX Division, Incline Village, NV.
  14. King, J. Y., Ferrara, R., Tabibiazar, R., Spin, J. M., Chen, M. M., Kuchinsky, A., et al. (2005). Pathway analysis of coronary atherosclerosis. Physiological Genomics, 23(1), 103–118.
  15. Madeira, S. C., & Oliveira, A. L. (2004). Biclustering algorithms for biological data analysis: A survey. IEEE Transactions on Computational Biology and Bioinformatics, 1(1), 24–45.
  16. Spellman, P. T., Sherlock, G., Zhang, M. Q., Iyer, V. R., Anders, K., Eisen, M. B., et al. (1998). Comprehensive identification of cell cycle-regulated genes of the yeast Saccharomyces cerevisiae by microarray hybridization. Molecular Biology of the Cell, 9(12), 3273–3297.
  17. Trapp, A. C., & Prokopyev, O. A. (2010). Solving the order-preserving submatrix problem via integer programming. INFORMS Journal on Computing, 22(3), 387–400.
  18. Yip, K. Y., Kao, B., Zhu, X., Chui, C. K., Lee, S. D., & Cheung, D. W. (2013). Mining order-preserving submatrices from data with repeated measurements. IEEE Transactions on Knowledge and Data Engineering, 25(7), 1587–1600.
  19. Zhang, M., Wang, W., & Liu, J. (2008). Mining approximate order preserving clusters in the presence of noise. In IEEE 24th international conference on data engineering (ICDE) (pp. 160–168). IEEE.

Copyright information

© Springer Science+Business Media New York 2016

Authors and Affiliations

  1. Foisie School of Business, Worcester Polytechnic Institute, Worcester, USA
  2. Department of Computer Science, Worcester Polytechnic Institute, Worcester, USA
  3. Department of Mathematics and Statistics, University of Massachusetts, Amherst, USA
