Pattern Analysis and Applications, Volume 18, Issue 2, pp 225–246

Directional naive Bayes classifiers

  • Pedro L. López-Cruz
  • Concha Bielza
  • Pedro Larrañaga
Theoretical Advances

Abstract

Directional data are ubiquitous in science. These data have special properties that rule out the use of classical statistics. Therefore, different distributions and statistics, such as the univariate von Mises and the multivariate von Mises–Fisher distributions, should be used to deal with this kind of information. We extend the naive Bayes classifier to the case where the conditional probability distributions of the predictive variables follow either of these distributions. We consider the simple scenario, where only directional predictive variables are used, and the hybrid case, where discrete, Gaussian and directional distributions are mixed. The classifier decision functions and their decision surfaces are studied at length. Artificial examples are used to illustrate the behavior of the classifiers. The proposed classifiers are then evaluated on eight datasets, showing competitive performance against other naive Bayes classifiers that use Gaussian distributions or discretization to manage directional data.
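
As a minimal sketch of the purely directional case (assuming Python with NumPy and SciPy; this is not the authors' implementation, and the names VonMisesNaiveBayes and fit_von_mises are illustrative), the classifier fits one von Mises density per class and per angular predictor by maximum likelihood, using a standard closed-form approximation for the concentration parameter, and predicts the class maximizing the log prior plus the summed log-densities:

```python
# Minimal illustrative sketch of a von Mises naive Bayes classifier for angular
# predictors (in radians). Not the authors' code; names are hypothetical.
import numpy as np
from scipy.stats import vonmises


def fit_von_mises(theta):
    """Maximum-likelihood mean direction and an approximate concentration."""
    c, s = np.mean(np.cos(theta)), np.mean(np.sin(theta))
    mu = np.arctan2(s, c)                 # ML estimate of the mean direction
    r = np.hypot(c, s)                    # mean resultant length
    kappa = r * (2.0 - r ** 2) / (1.0 - r ** 2 + 1e-12)  # closed-form kappa approximation
    return mu, kappa


class VonMisesNaiveBayes:
    """Naive Bayes with one von Mises density per class and angular feature."""

    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.priors_ = {}
        self.params_ = {}
        for c in self.classes_:
            Xc = X[y == c]
            self.priors_[c] = len(Xc) / len(X)
            self.params_[c] = [fit_von_mises(Xc[:, j]) for j in range(X.shape[1])]
        return self

    def predict(self, X):
        # Log-posterior (up to a constant) per class, under conditional independence.
        scores = np.column_stack([
            np.log(self.priors_[c])
            + sum(vonmises.logpdf(X[:, j], kappa, loc=mu)
                  for j, (mu, kappa) in enumerate(self.params_[c]))
            for c in self.classes_
        ])
        return self.classes_[np.argmax(scores, axis=1)]
```

Given a matrix X of angles (one column per circular predictor) and labels y, fitting and prediction follow the usual naive Bayes pattern; the hybrid case described above would add the corresponding Gaussian or categorical log-likelihood terms to the same per-class score.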

Keywords

Supervised classification · Naive Bayes classifier · Directional statistics · von Mises distribution · von Mises–Fisher distribution

Copyright information

© Springer-Verlag London 2013

Authors and Affiliations

  • Pedro L. López-Cruz
  • Concha Bielza
  • Pedro Larrañaga

  Computational Intelligence Group, Departamento de Inteligencia Artificial, Facultad de Informática, Universidad Politécnica de Madrid, Campus de Montegancedo s/n, Boadilla del Monte, Madrid, Spain
