Naive Bayes for value difference metric

  • Research Article
  • Published in Frontiers of Computer Science

Abstract

The value difference metric (VDM) is one of the best-known and most widely used distance functions for nominal attributes. This work applies the instance-weighting technique to improve VDM. An instance-weighted value difference metric (IWVDM) is proposed here. Different from prior work, IWVDM uses naive Bayes (NB) to find weights for training instances. Because early work has shown that there is a close relationship between VDM and NB, some work on NB can be applied to VDM. The weight of a training instance x belonging to class c is assigned according to the difference between the conditional probability P̂(c|x) estimated by NB and the true conditional probability P(c|x), and the weight is adjusted iteratively. Compared with previous work, IWVDM has the advantage of reducing the time complexity of the weight-finding process while simultaneously improving the performance of VDM. Experimental results on 36 UCI datasets validate the effectiveness of IWVDM.
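
A minimal sketch may help unpack the abstract's three moving parts: an instance-weighted NB estimator, an iterative weight update driven by the gap between P̂(c|x) and P(c|x), and a VDM whose conditional probabilities are computed from weighted counts. The Python below is illustrative only — the function names, Laplace smoothing, learning rate eta, iteration count, and the surrogate choice P(c|x) ≈ 1 for an instance's labeled class are assumptions made for this sketch, not the paper's exact algorithm.

```python
import numpy as np

def nb_proba_weighted(X, y, w, x, n_classes, alpha=1.0):
    """Instance-weighted naive Bayes estimate of P(c|x) over nominal
    attributes (integer-coded), with Laplace smoothing alpha."""
    probs = np.empty(n_classes)
    for c in range(n_classes):
        wc = w[y == c].sum()
        p = (wc + alpha) / (w.sum() + alpha * n_classes)      # weighted prior
        for a in range(X.shape[1]):
            n_vals = len(np.unique(X[:, a]))
            wav = w[(X[:, a] == x[a]) & (y == c)].sum()
            p *= (wav + alpha) / (wc + alpha * n_vals)        # weighted likelihood
        probs[c] = p
    return probs / probs.sum()

def iwvdm_weights(X, y, n_classes, n_iters=10, eta=0.5):
    """Iteratively grow an instance's weight in proportion to the gap
    between the NB estimate P_hat(c|x) and the ideal P(c|x), taken here
    as 1 for the instance's own class (an illustrative surrogate)."""
    w = np.ones(len(X))
    for _ in range(n_iters):
        for i in range(len(X)):
            p_hat = nb_proba_weighted(X, y, w, X[i], n_classes)[y[i]]
            w[i] += eta * (1.0 - p_hat)    # larger gap -> larger weight
    return w

def weighted_vdm(X, y, w, u, v, n_classes, q=1):
    """VDM distance sum_a sum_c |P(c|a=u_a) - P(c|a=v_a)|^q, with the
    conditional probabilities estimated from weighted counts."""
    dist = 0.0
    for a in range(X.shape[1]):
        mu, mv = X[:, a] == u[a], X[:, a] == v[a]
        wu, wv = max(w[mu].sum(), 1e-12), max(w[mv].sum(), 1e-12)
        for c in range(n_classes):
            puc = w[mu & (y == c)].sum() / wu
            pvc = w[mv & (y == c)].sum() / wv
            dist += abs(puc - pvc) ** q
    return dist

# Toy usage: four instances, two nominal attributes, two classes.
X = np.array([[0, 1], [0, 0], [1, 1], [1, 0]])
y = np.array([0, 0, 1, 1])
w = iwvdm_weights(X, y, n_classes=2)
print(weighted_vdm(X, y, w, X[0], X[2], n_classes=2))
```

If this reading is right, routing the weights through NB is what buys the efficiency the abstract claims: each update needs only NB's closed-form probability estimates rather than a search over weight space.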

Author information

Corresponding author

Correspondence to Chaoqun Li.

Additional information

Chaoqun Li received her PhD from China University of Geosciences (Wuhan), China in June 2012. Currently, she is a lecturer in the Department of Mathematics, China University of Geosciences (Wuhan). Her research interests include data mining and machine learning. In these areas she has published more than 10 papers in international journals such as Knowledge and Information Systems, Knowledge-Based Systems, International Journal of Pattern Recognition and Artificial Intelligence, Pattern Recognition Letters, and the Journal of Experimental & Theoretical Artificial Intelligence.

Liangxiao Jiang received his BS, MS, and PhD from China University of Geosciences (Wuhan), China in June 2001, 2004, and 2009, respectively. He joined the Department of Computer Science, China University of Geosciences (Wuhan) as an assistant in July 2004, and he is currently a professor there. He has wide research interests, including data mining and machine learning.

Hongwei Li received his PhD in Applied Mathematics from Peking University, China, in 1996. From July 1996 to July 1998, he was a postdoctoral fellow with the Institute of Information Science, Beijing Jiaotong University, China. Since 1999, he has been a professor at the School of Mathematics and Physics, China University of Geosciences (Wuhan), China. His research interests include pattern recognition, statistical signal processing, blind signal processing, multidimensional signal processing, and time series analysis.

About this article

Cite this article

Li, C., Jiang, L. & Li, H. Naive Bayes for value difference metric. Front. Comput. Sci. 8, 255–264 (2014). https://doi.org/10.1007/s11704-014-3038-5
