Incorporating monotonic domain knowledge in support vector learning for data mining regression problems


A common problem with data-driven data mining methods is that they may disregard domain knowledge even when they fit the data accurately. Prior knowledge therefore plays an important role in many data mining applications, yet incorporating it into data mining techniques is not trivial and remains a partially open issue that draws much attention. In this paper, we propose a new support vector regression (SVR) model that takes into account the prior knowledge of domain experts in the form of inequalities that reflect the monotonic relationship between the output and some of the input attributes. We derive the dual quadratic programming problem corresponding to the SVR model, along with algorithms for solving it and for creating the constraints. Experiments on two artificial and two practical datasets show that the proposed model, which considers the monotonicity defined by domain experts, performs better than the original SVR. Moreover, the proposed method is also suitable for prior domain knowledge of piecewise monotonicity.
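
To make the idea of monotonicity inequalities concrete, the following is a minimal sketch rather than the authors' constraint-creation algorithm, which is not given in this preview. It assumes a common definition: a function is monotonically increasing in a set of attributes if, for two points that agree on all other attributes, a larger value on the monotone attributes never yields a smaller output. The helper names `monotone_pairs` and `count_violations`, the toy data, and the two candidate functions are all hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

Point = Sequence[float]


def monotone_pairs(X: List[Point], increasing: Sequence[int]) -> List[Tuple[int, int]]:
    """Return index pairs (p, q) such that x_p precedes x_q.

    Assumed definition (not taken from the paper): x_p precedes x_q when
    they are equal on every attribute outside `increasing`, x_p <= x_q on
    every attribute in `increasing`, and strictly smaller on at least one.
    Each pair stands for one inequality f(x_p) <= f(x_q).
    """
    inc = set(increasing)
    pairs = []
    for p, xp in enumerate(X):
        for q, xq in enumerate(X):
            if p == q:
                continue
            others_equal = all(xp[k] == xq[k] for k in range(len(xp)) if k not in inc)
            dominated = all(xp[k] <= xq[k] for k in inc)
            strictly = any(xp[k] < xq[k] for k in inc)
            if others_equal and dominated and strictly:
                pairs.append((p, q))
    return pairs


def count_violations(f: Callable[[Point], float], X: List[Point],
                     pairs: List[Tuple[int, int]]) -> int:
    """Count the pairs (p, q) for which the predictor breaks f(x_p) <= f(x_q)."""
    return sum(1 for p, q in pairs if f(X[p]) > f(X[q]))


# Toy data: attribute 0 is assumed to be monotonically increasing.
X = [(1.0, 0.0), (2.0, 0.0), (3.0, 0.0), (2.0, 1.0)]
pairs = monotone_pairs(X, increasing=[0])


def f_monotone(x: Point) -> float:
    # Increasing in attribute 0, so no inequality is violated.
    return 2.0 * x[0] - x[1]


def f_peaked(x: Point) -> float:
    # Peaks at x[0] == 2, so it decreases between x[0] == 2 and x[0] == 3.
    return -(x[0] - 2.0) ** 2


print(pairs)
print(count_violations(f_monotone, X, pairs), count_violations(f_peaked, X, pairs))
```

In a constrained model, inequalities of this kind would be added to the SVR optimization problem instead of being checked after fitting. The checker above only illustrates what each inequality asserts about a pair of inputs.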


This study was supported in part by the Ministry of Science and Technology, Taiwan, under contracts NSC 102-2410-H-006-080-MY3, MOST 105-2410-H-006-038-MY3 and MOST 107-2410-H-143-005. The authors also thank Mr. Chi Chou and Mr. Yu-Di Chen for their help with the experimentation.

Author information



Corresponding author

Correspondence to Sheng-Tun Li.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

About this article

Cite this article

Chuang, HC., Chen, CC. & Li, ST. Incorporating monotonic domain knowledge in support vector learning for data mining regression problems. Neural Comput & Applic 32, 11791–11805 (2020).

Keywords


  • Data mining
  • Monotonicity constraint
  • Support vector regression
  • Prior domain knowledge