Combinatorial Optimization in Data Mining

Saedi, Samira; Kundakcioglu, O. Erhun

doi:10.1007/978-1-4419-7997-1_7

Abstract

This chapter presents data mining techniques that are formulated as combinatorial optimization problems, together with their applications. In a number of cases, a fundamental data mining tool is not combinatorial in nature, yet widely used special-purpose combinatorial extensions of it exist. For the sake of completeness, these fundamental tools are discussed in detail before their extensions and the underlying combinatorial optimization problems. A number of computationally challenging data mining algorithms that have non-convex formulations are also explored.

Notes

1. This work is supported by a University of Houston New Faculty Research Grant.
2. An exact equality for the balancing constraint is likely to make the problem infeasible, depending on the number of instances and the ratios. Therefore, a subtle adjustment is usually necessary to ensure feasibility.
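
The feasibility issue in note 2 can be illustrated with a small sketch. The note does not state the constraint itself, so the following assumes a balancing constraint of the kind common in semi-supervised classification, where the fraction of unlabeled instances assigned to the positive class must match a target ratio. The function name, the `tolerance` parameter, and the numbers are hypothetical and chosen only for illustration.

```python
def feasible_positive_counts(num_unlabeled, positive_ratio, tolerance=0.0):
    """List the integer numbers of positive labels k that satisfy
    |k / num_unlabeled - positive_ratio| <= tolerance.

    Assumption: the balancing constraint fixes the fraction of positive
    labels among the unlabeled instances; tolerance = 0 is exact equality.
    """
    eps = 1e-12  # guard against floating-point rounding at the boundary
    return [
        k
        for k in range(num_unlabeled + 1)
        if abs(k / num_unlabeled - positive_ratio) <= tolerance + eps
    ]


# Five unlabeled instances and a target ratio of 0.5 would need 2.5 positives,
# so exact equality admits no integer labeling.
exact = feasible_positive_counts(5, 0.5)

# Allowing a small slack around the target ratio restores feasibility.
relaxed = feasible_positive_counts(5, 0.5, tolerance=0.1)

print("exact equality:", exact)
print("with slack 0.1:", relaxed)
```

With an even number of instances and the same ratio, for example `feasible_positive_counts(4, 0.5)`, exact equality is satisfiable, which is why the note says infeasibility depends on the number of instances and the ratios.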

Author information

Authors and Affiliations

Department of Industrial Engineering, University of Houston, E209 Engineering Bldg. 2, 77204, Houston, TX, USA
Samira Saedi
Department of Industrial Engineering, University of Houston, E209 Engineering Bldg. 2, 77204, Houston, TX, USA
O. Erhun Kundakcioglu

Corresponding authors

Correspondence to Samira Saedi or O. Erhun Kundakcioglu.

Editor information

Editors and Affiliations

Department of Industrial and Systems Eng, University of Florida, Gainesville, Florida, USA
Panos M. Pardalos
Department of Computer Science, University of Texas, Dallas, Richardson, Texas, USA
Ding-Zhu Du
Dept. Comp. Sci. & Engineering, University of California, San Diego, La Jolla, California, USA
Ronald L. Graham

About this entry

Cite this entry

Saedi, S., Kundakcioglu, O.E. (2013). Combinatorial Optimization in Data Mining. In: Pardalos, P., Du, DZ., Graham, R. (eds) Handbook of Combinatorial Optimization. Springer, New York, NY. https://doi.org/10.1007/978-1-4419-7997-1_7

DOI: https://doi.org/10.1007/978-1-4419-7997-1_7
Published: 26 July 2013
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4419-7996-4
Online ISBN: 978-1-4419-7997-1
eBook Packages: Mathematics and Statistics; Reference Module Computer Science and Engineering