Possibilistic classifiers for numerical data

Abstract

Naive Bayesian Classifiers, which rely on independence hypotheses together with a normality assumption to estimate densities for numerical data, are known for their simplicity and their effectiveness. However, estimating densities, even under the normality assumption, may be problematic when data are scarce. In such a situation, possibility distributions may provide a more faithful representation of the data. Naive Possibilistic Classifiers (NPC), based on possibility theory, have recently been proposed as a counterpart of Bayesian classifiers for classification tasks. Only a few works deal with possibilistic classification, and most existing NPCs handle only categorical attributes. This work focuses on the estimation of possibility distributions for continuous data. In this paper we investigate two kinds of possibilistic classifiers. The first is derived from classical or flexible Bayesian classifiers by applying a probability–possibility transformation to Gaussian distributions, which introduces some further tolerance in the description of classes. The second is based on a direct interpretation of data in a possibilistic format that exploits an idea of proximity between data values in different ways, and provides a less constrained representation of them. We show that possibilistic classifiers have a better capability than Bayesian classifiers to detect new instances for which the classification is ambiguous, since probabilities may be poorly estimated and illusorily precise. Moreover, we propose, for this case, a hybrid possibilistic classification approach based on a nearest-neighbour heuristic to improve the accuracy of the proposed possibilistic classifiers when the available information is insufficient to choose between classes. Possibilistic classifiers are compared with classical or flexible Bayesian classifiers on a collection of benchmark databases. The reported experiments show the interest of possibilistic classifiers. In particular, flexible possibilistic classifiers perform well for data agreeing with the normality assumption, while proximity-based possibilistic classifiers outperform the others in the remaining cases. The hybrid possibilistic classification exhibits a good ability to improve accuracy.

References

  • Ben Amor N, Mellouli K, Benferhat S, Dubois D, Prade H (2002) A theoretical framework for possibilistic independence in a weakly ordered setting. Int J Uncertain Fuzziness Knowledge-Based Syst 10:117–155

  • Ben Amor N, Benferhat S, Elouedi Z (2004) Qualitative classification and evaluation in possibilistic decision trees. In: FUZZ-IEEE’04, vol 1, pp 653–657

  • Benferhat S, Tabia K (2008) An efficient algorithm for naive possibilistic classifiers with uncertain inputs. In: Proceedings of 2nd international conference on scalable uncertainty management (SUM’08). LNAI, vol 5291. Springer, Berlin, pp 63–77

  • Beringer J, Hüllermeier E (2008) Case-based learning in a bipolar possibilistic framework. Int J Intell Syst 23:1119–1134

  • Bishop CM (1996) Neural networks for pattern recognition. Oxford University Press, New York

  • Bishop CM (1999) Latent variable models. In: Learning in graphical models, pp 371–403

  • Borgelt C, Gebhardt J (1999) A naïve Bayes style possibilistic classifier. In: Proceedings of 7th European congress on intelligent techniques and soft computing, pp 556–565

  • Borgelt C, Kruse R (1998) Efficient maximum projection of database-induced multivariate possibility distributions. In: Proceedings of 7th IEEE international conference on fuzzy systems, pp 663–668

  • Bounhas M, Mellouli K (2010) A possibilistic classification approach to handle continuous data. In: Proceedings of the eighth ACS/IEEE international conference on computer systems and applications (AICCSA-10), pp 1–8

  • Bounhas M, Mellouli K, Prade H, Serrurier M (2010) From Bayesian classifiers to possibilistic classifiers for numerical data. In: Proceedings of the fourth international conference on scalable uncertainty management, pp 112–125

  • Bounhas M, Prade H, Serrurier M, Mellouli K (2011) Possibilistic classifiers for uncertain numerical data. In: Proceedings of 11th European conference on symbolic and quantitative approaches to reasoning with uncertainty (ECSQARU’11), Belfast, UK, June 29–July 1. LNCS, vol 6717. Springer, Berlin, pp 434–446

  • Cheng J, Greiner R (1999) Comparing Bayesian network classifiers. In: Proceedings of the 15th conference on uncertainty in artificial intelligence, pp 101–107

  • Cover TM, Hart PE (1967) Nearest neighbour pattern classification. IEEE Trans Inf Theory 13:21–27

  • De Cooman G (1997) Possibility theory. Part I: measure- and integral-theoretic groundwork; Part II: conditional possibility; Part III: possibilistic independence. Int J Gen Syst 25:291–371

  • Demsar J (2006) Statistical comparisons of classifiers over multiple data sets. J Mach Learn Res 7:1–30

  • Denton A, Perrizo W (2004) A kernel-based semi-naive Bayesian classifier using p-trees. In: Proceedings of the 4th SIAM international conference on data mining

  • Devroye L (1983) The equivalence of weak, strong, and complete convergence in L1 for kernel density estimates. Ann Stat 11:896–904

  • Domingos P, Pazzani M (2002) Beyond independence: conditions for the optimality of the simple Bayesian classifier. Mach Learn 29:102–130

  • Dubois D (2006) Possibility theory and statistical reasoning. Comput Stat Data Anal 51:47–69

  • Dubois D, Prade H (1988) Possibility theory: an approach to computerized processing of uncertainty. Plenum Press, New York

  • Dubois D, Prade H (1990) Aggregation of possibility measures. In: Multiperson decision making using fuzzy sets and possibility theory, pp 55–63

  • Dubois D, Prade H (1990) The logical view of conditioning and its application to possibility and evidence theories. Int J Approx Reason 4:23–46

  • Dubois D, Prade H (1992) When upper probabilities are possibility measures. Fuzzy Sets Syst 49:65–74

  • Dubois D, Prade H (1993) On data summarization with fuzzy sets. In: Proceedings of the 5th International Fuzzy Systems Association World Congress (IFSA'93)

  • Dubois D, Prade H (1998) Possibility theory: qualitative and quantitative aspects. In: Gabbay D, Smets P (eds) Handbook on defeasible reasoning and uncertainty management systems, vol 1, pp 169–226

  • Dubois D, Prade H (2000) An overview of ordinal and numerical approaches to causal diagnostic problem solving. In: Gabbay DM, Kruse R (eds) Abductive reasoning and learning. Handbooks of defeasible reasoning and uncertainty management systems (DRUMS handbooks), vol 4, pp 231–280

  • Dubois D, Prade H (2009) Formal representations of uncertainty. In: Bouyssou D, Dubois D, Pirlot M, Prade H (eds) Decision-making—concepts and methods, pp 85–156

  • Dubois D, Prade H, Sandri S (1993) On possibility/probability transformations. Fuzzy Logic, pp 103–112

  • Dubois D, Foulloy L, Mauris G, Prade H (2004) Probability–possibility transformations, triangular fuzzy sets, and probabilistic inequalities. Reliable Comput 10:273–297

  • Figueiredo M, Leitão JMN (1999) On fitting mixture models. In: Energy minimization methods in computer vision and pattern recognition, vol 1654, pp 732–749

  • Friedman N, Geiger D, Goldszmidt M (1997) Bayesian network classifiers. Mach Learn 29:131–161

  • Geiger D, Heckerman D (1994) Learning Gaussian networks. Technical report, Microsoft Research, Advanced Technology Division

  • Grossman D, Domingos P (2004) Learning Bayesian network classifiers by maximizing conditional likelihood. In: Proceedings of the 21st international conference on machine learning, pp 46–57

  • Haouari B, Ben Amor N, Elouedi Z, Mellouli K (2009) Naive possibilistic network classifiers. Fuzzy Sets Syst 160(22):3224–3238

  • Hüllermeier E (2003) Possibilistic instance-based learning. Artif Intell 148(1–2):335–383

  • Hüllermeier E (2005) Fuzzy methods in machine learning and data mining: status and prospects. Fuzzy Sets Syst 156(3):387–406

  • Jenhani I, Ben Amor N, Elouedi Z (2008) Decision trees as possibilistic classifiers. Int J Approx Reason 48(3):784–807

  • John GH, Langley P (1995) Estimating continuous distributions in Bayesian classifiers. In: Proceedings of the 11th conference on uncertainty in artificial intelligence

  • Kononenko I (1991) Semi-naive Bayesian classifier. In: Proceedings of the European working session on machine learning, pp 206–219

  • Kotsiantis SB (2007) Supervised machine learning: a review of classification techniques. Informatica 31:249–268

  • Langley P, Sage S (1994) Induction of selective Bayesian classifiers. In: Proceedings of 10th conference on uncertainty in artificial intelligence (UAI-94), pp 399–406

  • Langley P, Iba W, Thompson K (1992) An analysis of Bayesian classifiers. In: Proceedings of AAAI-92, vol 7, pp 223–228

  • McLachlan GJ, Peel D (2000) Finite mixture models. Probability and mathematical statistics. Wiley, New York

  • Mertz J, Murphy PM (2000) UCI repository of machine learning databases. ftp://ftp.ics.uci.edu/pub/machine-learning-databases

  • Pearl J (1988) Probabilistic reasoning in intelligent systems: networks of plausible inference. Morgan Kaufmann, San Francisco

  • Pérez A, Larrañaga P, Inza I (2009) Bayesian classifiers based on kernel density estimation: flexible classifiers. Int J Approx Reason 50:341–362

  • Quinlan JR (1986) Induction of decision trees. Mach Learn 1:81–106

  • Sahami M (1996) Learning limited dependence Bayesian classifiers. In: Proceedings of the 2nd international conference on knowledge discovery and data mining, pp 335–338

  • Shafer G (1976) A mathematical theory of evidence. Princeton University Press, Princeton

  • Solomonoff R (1964) A formal theory of inductive inference. Inf Control 7:224–254

  • Strauss O, Comby F, Aldon MJ (2000) Rough histograms for robust statistics. In: Proceedings of international conference on pattern recognition (ICPR’00), vol II, Barcelona. IEEE Computer Society, pp 2684–2687

  • Sudkamp T (2000) Similarity as a foundation for possibility. In: Proceedings of 9th IEEE international conference on fuzzy systems, San Antonio, pp 735–740

  • Yamada K (2001) Probability-possibility transformation based on evidence theory. In: Joint 9th IFSA World Congress and 20th NAFIPS international conference 2001, pp 70–75

  • Yang Y, Webb GI (2003) Discretization for naive-Bayes learning: managing discretization bias and variance. Technical Report 2003-131

  • Zadeh LA (1978) Fuzzy sets as a basis for a theory of possibility. Fuzzy Sets Syst 1:3–28

  • Zhang H (2004) The optimality of naive Bayes. In: Proceedings of 17th international FLAIRS conference (FLAIRS 2004)

Author information

Correspondence to Myriam Bounhas.

Additional information

Communicated by E. Hüllermeier.

Appendix: Naive Bayesian Classifiers

Naive Bayesian Classifiers (NBC) are based on Bayes' rule and assume the independence of the input variables. Despite their simplicity, NBCs can often outperform more sophisticated classification methods (Langley et al. 1992). An NBC can be seen as a Bayesian network in which the predictive attributes are assumed to be conditionally independent given the class attribute.

Given a vector X = {x_1, x_2, …, x_n} to be classified, an NBC computes the posterior probability P(c_j|X) for each class c_j in the set of possible classes C = {c_1, c_2, …, c_m} and labels the case X with the class c_j that achieves the highest posterior probability, that is:

$$ c^{\ast}=\hbox{arg} \max_{c_{j}} P(c_{j}|X). $$
(21)

Using Bayes' rule:

$$ P(c_j|x_1,x_2,\ldots,x_n)=\frac{P(x_1,x_2,\ldots,x_n|c_j)*P(c_j)}{P(x_1,x_2,\ldots,x_n)} $$
(22)

The denominator P(x_1, x_2, …, x_n) is a normalizing factor that can be ignored when determining the class with the maximum posterior probability, as it does not depend on the class. The key term in Eq. (22) is P(x_1, x_2, …, x_n|c_j), which is estimated from the training data. Since naive Bayes assumes that the attributes are conditionally independent given the class, the likelihood can be decomposed into a product of terms:

$$ P(x_1,x_2,\ldots,x_n|c_j)=\prod_{i=1}^{n}p(x_i|c_j) $$
(23)
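
To make the decision rule of Eqs. (21)–(23) concrete, here is a minimal Python sketch (illustrative names, not from the paper) that scores each class by summing log priors and log per-attribute likelihoods, which is numerically safer than multiplying the probabilities directly; the per-attribute densities are assumed to be supplied as callables.

```python
# Illustrative sketch of the naive Bayes decision rule of Eqs. (21)-(23).
# The per-attribute conditional densities p(x_i | c_j) are assumed to be
# given as callables (multinomial, Gaussian or kernel-based estimates).
import math
from typing import Callable, Dict, List, Sequence

Likelihood = Callable[[float], float]  # maps x_i to p(x_i | c_j)

def classify(x: Sequence[float],
             priors: Dict[str, float],
             likelihoods: Dict[str, List[Likelihood]]) -> str:
    """Return argmax_j of P(c_j) * prod_i p(x_i | c_j), computed in log space."""
    best_class, best_score = None, -math.inf
    for c, prior in priors.items():
        score = math.log(prior)
        for x_i, p_i in zip(x, likelihoods[c]):
            score += math.log(max(p_i(x_i), 1e-300))  # guard against log(0)
        if score > best_score:
            best_class, best_score = c, score
    return best_class
```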

Even though it relies on the independence assumption, the NBC has shown good performance on datasets containing dependent attributes. Domingos and Pazzani (2002) explain that attribute dependency does not strongly affect the classification accuracy. They also relate the good performance of the NBC to the zero-one loss function, which considers a classifier successful as soon as it assigns the maximum probability to the correct class (even if the estimated probability is inaccurate). The work in Zhang (2004) gives a deeper explanation of why the efficiency of the NBC is not affected by attribute dependency. The author shows that, even if attributes are strongly dependent when considered pair by pair, the global dependencies among all attributes may be insignificant, because dependencies can cancel each other out and thus do not affect classification.

The best-known Bayesian classification approach uses an estimation based on a multinomial distribution over the discretized variables and leads to so-called multinomial classifiers. Such a classifier, which handles only discrete attributes (continuous attributes must be discretized), assumes that all attributes follow a multinomial probability distribution. A variety of multinomial classifiers have been proposed for handling an arbitrary number of independent attributes. Let us mention especially (Langley et al. 1992; Langley and Sage 1994; Grossman and Domingos 2004), semi-naive Bayesian classifiers (Kononenko 1991; Denton and Perrizo 2004), tree-augmented naive Bayesian classifiers (Friedman et al. 1997), k-dependence Bayesian classifiers (Sahami 1996) and Bayesian network-augmented naive Bayesian classifiers (Cheng and Greiner 1999).
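
As a rough illustration of the multinomial estimate mentioned above (not the authors' implementation), the sketch below assumes the continuous attributes have already been discretized into integer bin indices and estimates p(x_i = v | c_j) by smoothed relative frequencies; the smoothing constant alpha and the function names are illustrative choices, and the returned likelihood functions can be fed to the classify() sketch given earlier.

```python
# Illustrative multinomial estimation over discretized attributes
# (bin indices in 0..n_bins-1); alpha is an optional smoothing constant.
from collections import Counter, defaultdict
from typing import Dict, List, Sequence, Tuple

def fit_multinomial_nbc(data: Sequence[Tuple[Sequence[int], str]],
                        n_bins: int, alpha: float = 1.0):
    """Estimate class priors and p(x_i = v | c_j) by relative frequencies."""
    by_class: Dict[str, List[Sequence[int]]] = defaultdict(list)
    for x, c in data:
        by_class[c].append(x)
    priors, likelihoods = {}, {}
    for c, rows in by_class.items():
        priors[c] = len(rows) / len(data)
        funcs = []
        for i in range(len(rows[0])):
            counts = Counter(r[i] for r in rows)
            total = len(rows)
            funcs.append(lambda v, cnt=counts, t=total:
                         (cnt[int(v)] + alpha) / (t + alpha * n_bins))
        likelihoods[c] = funcs
    return priors, likelihoods
```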

A second family of NBCs is suitable for continuous attribute values. These classifiers directly estimate the density of each attribute using a parametric form. A common additional assumption in that case is that, within each class, the values of a numeric attribute are normally distributed around the mean, so that each attribute is modelled by a single Gaussian. The NBC then represents such a distribution in terms of its mean and standard deviation and computes the probability of an observed value from these estimates, as follows:

$$ p(x_{i}|c_{j})=g(x_{i}, \mu_{j}, \sigma_{j})=\frac{1}{\sqrt{2\pi}\,\sigma_{j}}\,\mathrm{e}^{-\frac{(x_{i}- \mu_{j})^{2}}{2\sigma_{j}^2}} $$
(24)
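
A minimal sketch of the Gaussian estimation of Eq. (24) is given below (again with illustrative names, not the paper's code): for every class and attribute it computes the empirical mean and standard deviation of the training values and returns likelihood functions compatible with the classify() sketch above.

```python
# Illustrative Gaussian estimation of Eq. (24): one (mu, sigma) pair per
# class and per attribute, learned from (attribute-vector, class-label) pairs.
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

def gaussian(x: float, mu: float, sigma: float) -> float:
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)

def fit_gaussian_nbc(data: Sequence[Tuple[Sequence[float], str]]):
    """Estimate class priors and per-class, per-attribute Gaussian densities."""
    by_class: Dict[str, List[Sequence[float]]] = defaultdict(list)
    for x, c in data:
        by_class[c].append(x)
    priors, likelihoods = {}, {}
    for c, rows in by_class.items():
        priors[c] = len(rows) / len(data)
        funcs = []
        for i in range(len(rows[0])):
            vals = [r[i] for r in rows]
            mu = sum(vals) / len(vals)
            sigma = math.sqrt(sum((v - mu) ** 2 for v in vals) / len(vals)) or 1e-6
            funcs.append(lambda x, m=mu, s=sigma: gaussian(x, m, s))
        likelihoods[c] = funcs
    return priors, likelihoods
```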

Gaussian classifiers (Geiger and Heckerman 1994; John and Langley 1995) are known for their simplicity and have a lower complexity than non-parametric approximations. Although the normality assumption may be a reasonable approximation for many benchmarks, it is not always the best choice. Moreover, if the normality assumption is violated, the classification results of the NBC may deteriorate.

Other approaches break with the strong parametric assumption and rely on non-parametric estimation. The main ones are based on mixture models (Figueiredo and Leitão 1999; McLachlan and Peel 2000), in particular Gaussian mixture models (Bishop 1999; McLachlan and Peel 2000). Other approaches use kernel densities (John and Langley 1995; Pérez et al. 2009), leading to so-called flexible classifiers. This name is due to the ability of such classifiers to represent densities with more than one mode, in contrast with simple Gaussian classifiers. Flexible classifiers represent densities of different shapes with high accuracy; however, this comes at the price of a considerable increase in complexity.

John and Langley (1995) have proposed a Flexible Naive Bayesian Classifier (FNBC) that abandons the normality assumption and instead uses non-parametric kernel density estimation for each conditional distribution. The FNBC has the same properties as the NBC; the only difference is that, instead of estimating the density of each continuous attribute x by a single Gaussian g(x, μ_j, σ_j), this density is estimated by averaging a large set of Gaussian kernels. To compute the density of a continuous attribute for a specific class j, the FNBC places one Gaussian on each attribute value encountered during training for this class and then takes the average of these Gaussians to estimate p(x_i|c_j). More formally, the probability distribution is estimated as follows:

$$ p(x_{i}|c_{j})=\frac{1}{N_{j}}\sum _{k=1}^{N_{j}} g(x_{i}, \mu_{ik}, \sigma_{j}) $$
(25)

where k ranges over the training instances of class c_j and N_j is the number of instances belonging to class c_j. The mean μ_ik is equal to the value of attribute i for instance k of class j, i.e. μ_ik = x_ik. For each class j, the FNBC estimates the standard deviation by

$$ \sigma_j=\frac{1}{\sqrt{N_j}} $$
(26)

The authors also prove the consistency of this kernel estimation when (26) is used (see John and Langley 1995 for details). It has been shown on several datasets that the kernel density estimation used in the FNBC enables this classifier to perform well where the parametric assumption is violated, at little cost on datasets where it holds.
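
The flexible estimate of Eqs. (25) and (26) can be sketched in the same illustrative style (an assumption-laden sketch, not the authors' code): for each class, one Gaussian kernel is centred on every training value of the attribute, with a common bandwidth of 1/sqrt(N_j), and the kernels are averaged; the result again plugs into the classify() sketch above.

```python
# Illustrative kernel-based (flexible) estimation of Eqs. (25)-(26):
# p(x_i | c_j) is the average of N_j Gaussians centred on the training
# values x_ik of attribute i in class c_j, with sigma_j = 1 / sqrt(N_j).
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

def gaussian(x: float, mu: float, sigma: float) -> float:
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)

def fit_flexible_nbc(data: Sequence[Tuple[Sequence[float], str]]):
    """Estimate class priors and per-class kernel density estimates."""
    by_class: Dict[str, List[Sequence[float]]] = defaultdict(list)
    for x, c in data:
        by_class[c].append(x)
    priors, likelihoods = {}, {}
    for c, rows in by_class.items():
        n_j = len(rows)
        priors[c] = n_j / len(data)
        sigma_j = 1.0 / math.sqrt(n_j)                      # Eq. (26)
        funcs = []
        for i in range(len(rows[0])):
            centres = [r[i] for r in rows]                  # mu_ik = x_ik
            funcs.append(lambda x, cs=centres, s=sigma_j:
                         sum(gaussian(x, m, s) for m in cs) / len(cs))  # Eq. (25)
        likelihoods[c] = funcs
    return priors, likelihoods
```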

Pérez et al. (2009) have more recently proposed an approach to flexible Bayesian classifiers based on kernel density estimation that extends the FNBC of John and Langley (1995) to handle dependent attributes, thus abandoning the independence assumption. In this work, three classifiers, namely tree-augmented naive Bayes, a k-dependence Bayesian classifier and a complete-graph classifier, are adapted to the kernel-based Bayesian network paradigm.

About this article

Cite this article

Bounhas, M., Mellouli, K., Prade, H. et al. Possibilistic classifiers for numerical data. Soft Comput 17, 733–751 (2013). https://doi.org/10.1007/s00500-012-0947-9
