Possibilistic classifiers for numerical data

Abstract

Naive Bayesian Classifiers, which rely on independence hypotheses together with a normality assumption to estimate densities for numerical data, are known for their simplicity and their effectiveness. However, estimating densities, even under the normality assumption, may be problematic when data are scarce. In such a situation, possibility distributions may provide a more faithful representation of the data. Naive Possibilistic Classifiers (NPC), based on possibility theory, have recently been proposed as a counterpart of Bayesian classifiers for classification tasks. Only a few works deal with possibilistic classification, and most existing NPCs handle only categorical attributes. This work focuses on the estimation of possibility distributions for continuous data. In this paper we investigate two kinds of possibilistic classifiers. The first is derived from classical or flexible Bayesian classifiers by applying a probability–possibility transformation to Gaussian distributions, which introduces some further tolerance in the description of classes. The second is based on a direct interpretation of data in a possibilistic format that exploits an idea of proximity between data values in different ways, and provides a less constrained representation of them. We show that possibilistic classifiers have a better capability than Bayesian classifiers to detect new instances for which the classification is ambiguous, since probabilities may be poorly estimated and illusorily precise. Moreover, we propose, for this case, a hybrid possibilistic classification approach based on a nearest-neighbour heuristic to improve the accuracy of the proposed possibilistic classifiers when the available information is insufficient to choose between classes. Possibilistic classifiers are compared with classical or flexible Bayesian classifiers on a collection of benchmark databases. The reported experiments show the interest of possibilistic classifiers. In particular, flexible possibilistic classifiers perform well for data agreeing with the normality assumption, while proximity-based possibilistic classifiers outperform the others in the remaining cases. The hybrid possibilistic classification exhibits a good ability to improve accuracy.

References

  • Ben Amor N, Mellouli K, Benferhat S, Dubois D, Prade H (2002) A theoretical framework for possibilistic independence in a weakly ordered setting. Int J Uncertain Fuzziness Knowledge-Based Syst 10:117–155

  • Ben Amor N, Benferhat S, Elouedi Z (2004) Qualitative classification and evaluation in possibilistic decision trees. In: FUZZ-IEEE’04, vol 1, pp 653–657

  • Benferhat S, Tabia K (2008) An efficient algorithm for naive possibilistic classifiers with uncertain inputs. In: Proceedings of 2nd international conference on scalable uncertainty management (SUM’08). LNAI, vol 5291. Springer, Berlin, pp 63–77

  • Beringer J, Hüllermeier E (2008) Case-based learning in a bipolar possibilistic framework. Int J Intell Syst 23:1119–1134

  • Bishop CM (1996) Neural networks for pattern recognition. Oxford University Press, New York

  • Bishop CM (1999) Latent variable models. In: Learning in graphical models, pp 371–403

  • Borgelt C, Gebhardt J (1999) A naïve Bayes style possibilistic classifier. In: Proceedings of 7th European congress on intelligent techniques and soft computing, pp 556–565

  • Borgelt C, Kruse R (1998) Efficient maximum projection of database-induced multivariate possibility distributions. In: Proceedings of 7th IEEE international conference on fuzzy systems, pp 663–668

  • Bounhas M, Mellouli K (2010) A possibilistic classification approach to handle continuous data. In: Proceedings of the eighth ACS/IEEE international conference on computer systems and applications (AICCSA-10), pp 1–8

  • Bounhas M, Mellouli K, Prade H, Serrurier M (2010) From Bayesian classifiers to possibilistic classifiers for numerical data. In: Proceedings of the fourth international conference on scalable uncertainty management, pp 112–125

  • Bounhas M, Prade H, Serrurier M, Mellouli K (2011) Possibilistic classifiers for uncertain numerical data. In: Proceedings of 11th European conference on symbolic and quantitative approaches to reasoning with uncertainty (ECSQARU’11), Belfast, UK, June 29–July 1. LNCS, vol 6717. Springer, Berlin, pp 434–446

  • Cheng J, Greiner R (1999) Comparing Bayesian network classifiers. In: Proceedings of the 15th conference on uncertainty in artificial intelligence, pp 101–107

  • Cover TM, Hart PE (1967) Nearest neighbour pattern classification. IEEE Trans Inf Theory 13:21–27

  • De Cooman G (1997) Possibility theory. Part I: measure- and integral-theoretic groundwork; Part II: conditional possibility; Part III: possibilistic independence. Int J Gen Syst 25:291–371

  • Demsar J (2006) Statistical comparisons of classifiers over multiple data sets. J Mach Learn Res 7:1–30

  • Denton A, Perrizo W (2004) A kernel-based semi-naive Bayesian classifier using p-trees. In: Proceedings of the 4th SIAM international conference on data mining

  • Devroye L (1983) The equivalence of weak, strong, and complete convergence in L1 for kernel density estimates. Ann Stat 11:896–904

  • Domingos P, Pazzani M (2002) Beyond independence: conditions for the optimality of the simple Bayesian classifier. Mach Learn 29:102–130

  • Dubois D (2006) Possibility theory and statistical reasoning. Comput Stat Data Anal 51:47–69

  • Dubois D, Prade H (1988) Possibility theory: an approach to computerized processing of uncertainty. Plenum Press, New York

  • Dubois D, Prade H (1990) Aggregation of possibility measures. In: Multiperson decision making using fuzzy sets and possibility theory, pp 55–63

  • Dubois D, Prade H (1990) The logical view of conditioning and its application to possibility and evidence theories. Int J Approx Reason 4:23–46

  • Dubois D, Prade H (1992) When upper probabilities are possibility measures. Fuzzy Sets Syst 49:65–74

  • Dubois D, Prade H (1993) On data summarization with fuzzy sets. In: Proceedings of the 5th International Fuzzy Systems Association World Congress (IFSA'93)

  • Dubois D, Prade H (1998) Possibility theory: qualitative and quantitative aspects. In: Gabbay D, Smets P (eds) Handbook on defeasible reasoning and uncertainty management systems, vol 1, pp 169–226

  • Dubois D, Prade H (2000) An overview of ordinal and numerical approaches to causal diagnostic problem solving. In: Gabbay DM, Kruse R (eds) Abductive reasoning and learning. Handbooks of defeasible reasoning and uncertainty management systems (DRUMS handbooks), vol 4, pp 231–280

  • Dubois D, Prade H (2009) Formal representations of uncertainty. In: Bouyssou D, Dubois D, Pirlot M, Prade H (eds) Decision-making—concepts and methods, pp 85–156

  • Dubois D, Prade H, Sandri S (1993) On possibility/probability transformations. Fuzzy Logic, pp 103–112

  • Dubois D, Foulloy L, Mauris G, Prade H (2004) Probability–possibility transformations, triangular fuzzy sets, and probabilistic inequalities. Reliable Comput 10:273–297

  • Figueiredo M, Leitão JMN (1999) On fitting mixture models. In: Energy minimization methods in computer vision and pattern recognition, vol 1654, pp 732–749

  • Friedman N, Geiger D, Goldszmidt M (1997) Bayesian network classifiers. Mach Learn 29:131–161

  • Geiger D, Heckerman D (1994) Learning Gaussian networks. Technical report, Microsoft Research, Advanced Technology Division

  • Grossman D, Domingos P (2004) Learning Bayesian network classifiers by maximizing conditional likelihood. In: Proceedings of the 21st international conference on machine learning, pp 46–57

  • Haouari B, Ben Amor N, Elouedi Z, Mellouli K (2009) Naive possibilistic network classifiers. Fuzzy Sets Syst 160(22):3224–3238

  • Hüllermeier E (2003) Possibilistic instance-based learning. Artif Intell 148(1–2):335–383

  • Hüllermeier E (2005) Fuzzy methods in machine learning and data mining: status and prospects. Fuzzy Sets Syst 156(3):387–406

  • Jenhani I, Ben Amor N, Elouedi Z (2008) Decision trees as possibilistic classifiers. Int J Approx Reason 48(3):784–807

  • John GH, Langley P (1995) Estimating continuous distributions in Bayesian classifiers. In: Proceedings of the 11th conference on uncertainty in artificial intelligence

  • Kononenko I (1991) Semi-naive Bayesian classifier. In: Proceedings of the European working session on machine learning, pp 206–219

  • Kotsiantis SB (2007) Supervised machine learning: a review of classification techniques. Informatica 31:249–268

  • Langley P, Sage S (1994) Induction of selective Bayesian classifiers. In: Proceedings of 10th conference on uncertainty in artificial intelligence (UAI-94), pp 399–406

  • Langley P, Iba W, Thompson K (1992) An analysis of Bayesian classifiers. In: Proceedings of AAAI-92, vol 7, pp 223–228

  • McLachlan GJ, Peel D (2000) Finite mixture models. Probability and mathematical statistics. Wiley, New York

  • Mertz J, Murphy PM (2000) UCI repository of machine learning databases. ftp://ftp.ics.uci.edu/pub/machine-learning-databases

  • Pearl J (1988) Probabilistic reasoning in intelligent systems: networks of plausible inference. Morgan Kaufmann, San Francisco

  • Pérez A, Larrañaga P, Inza I (2009) Bayesian classifiers based on kernel density estimation: flexible classifiers. Int J Approx Reason 50:341–362

  • Quinlan JR (1986) Induction of decision trees. Mach Learn 1:81–106

  • Sahami M (1996) Learning limited dependence Bayesian classifiers. In: Proceedings of the 2nd international conference on knowledge discovery and data mining, pp 335–338

  • Shafer G (1976) A mathematical theory of evidence. Princeton University Press, Princeton

  • Solomonoff R (1964) A formal theory of inductive inference. Inf Control 7:224–254

  • Strauss O, Comby F, Aldon MJ (2000) Rough histograms for robust statistics. In: Proceedings of international conference on pattern recognition (ICPR’00), vol II, Barcelona. IEEE Computer Society, pp 2684–2687

  • Sudkamp T (2000) Similarity as a foundation for possibility. In: Proceedings of 9th IEEE international conference on fuzzy systems, San Antonio, pp 735–740

  • Yamada K (2001) Probability-possibility transformation based on evidence theory. In: Joint 9th IFSA World Congress and 20th NAFIPS international conference 2001, pp 70–75

  • Yang Y, Webb GI (2003) Discretization for naive-Bayes learning: managing discretization bias and variance. Technical Report 2003-131

  • Zadeh LA (1978) Fuzzy sets as a basis for a theory of possibility. Fuzzy Sets Syst 1:3–28

  • Zhang H (2004) The optimality of naive Bayes. In: Proceedings of 17th international FLAIRS conference (FLAIRS 2004)

Author information

Correspondence to Myriam Bounhas.

Additional information

Communicated by E. Hüllermeier.

Appendix: Naive Bayesian Classifiers

Naive Bayesian Classifiers (NBC) are based on Bayes' rule and assume the independence of the input variables. Despite their simplicity, NBCs can often outperform more sophisticated classification methods (Langley et al. 1992). An NBC can be seen as a Bayesian network in which the predictive attributes are assumed to be conditionally independent given the class attribute.

Given a vector X = {x_1, x_2, …, x_n} to be classified, an NBC computes the posterior probability P(c_j|X) for each class c_j in the set of possible classes C = {c_1, c_2, …, c_m} and labels the case X with the class c_j that achieves the highest posterior probability, that is:

$$ c^{\ast}=\hbox{arg} \max_{c_{j}} P(c_{j}|X). $$
(21)

Using Bayes' rule:

$$ P(c_j|x_1,x_2,\ldots,x_n)=\frac{P(x_1,x_2,\ldots,x_n|c_j)*P(c_j)}{P(x_1,x_2,\ldots,x_n)} $$
(22)

The denominator P(x_1, x_2, …, x_n) is a normalizing factor that can be ignored when determining the class with the maximum posterior probability, as it does not depend on the class. The key term in Eq. (22) is P(x_1, x_2, …, x_n|c_j), which is estimated from the training data. Since naive Bayes assumes that the attributes are conditionally independent given the class, the likelihood can be decomposed into a product of terms:

$$ P(x_1,x_2,\ldots,x_n|c_j)=\prod_{i=1}^{n}p(x_i|c_j) $$
(23)
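
To make the decision rule of Eqs. (21)–(23) concrete, here is a minimal Python sketch (illustrative names, not from the paper) that scores each class by summing log priors and log per-attribute likelihoods, which is numerically safer than multiplying the probabilities directly; the per-attribute densities are assumed to be supplied as callables.

```python
# Illustrative sketch of the naive Bayes decision rule of Eqs. (21)-(23).
# The per-attribute conditional densities p(x_i | c_j) are assumed to be
# given as callables (multinomial, Gaussian or kernel-based estimates).
import math
from typing import Callable, Dict, List, Sequence

Likelihood = Callable[[float], float]  # maps x_i to p(x_i | c_j)

def classify(x: Sequence[float],
             priors: Dict[str, float],
             likelihoods: Dict[str, List[Likelihood]]) -> str:
    """Return argmax_j of P(c_j) * prod_i p(x_i | c_j), computed in log space."""
    best_class, best_score = None, -math.inf
    for c, prior in priors.items():
        score = math.log(prior)
        for x_i, p_i in zip(x, likelihoods[c]):
            score += math.log(max(p_i(x_i), 1e-300))  # guard against log(0)
        if score > best_score:
            best_class, best_score = c, score
    return best_class
```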

Even though it relies on the independence assumption, the NBC has shown good performance on datasets containing dependent attributes. Domingos and Pazzani (2002) explain that attribute dependency does not strongly affect the classification accuracy. They also relate the good performance of the NBC to the zero-one loss function, which considers a classifier successful as soon as it assigns the maximum probability to the correct class (even if the estimated probability is inaccurate). The work in Zhang (2004) gives a deeper explanation of why the efficiency of the NBC is not affected by attribute dependency. The author shows that, even if attributes are strongly dependent when considered pair by pair, the global dependencies among all attributes may be insignificant, because dependencies can cancel each other out and thus do not affect classification.

The best-known Bayesian classification approach uses an estimation based on a multinomial distribution over the discretized variables and leads to so-called multinomial classifiers. Such a classifier, which handles only discrete attributes (continuous attributes must be discretized), assumes that all attributes follow a multinomial probability distribution. A variety of multinomial classifiers have been proposed for handling an arbitrary number of independent attributes. Let us mention especially (Langley et al. 1992; Langley and Sage 1994; Grossman and Domingos 2004), semi-naive Bayesian classifiers (Kononenko 1991; Denton and Perrizo 2004), tree-augmented naive Bayesian classifiers (Friedman et al. 1997), k-dependence Bayesian classifiers (Sahami 1996) and Bayesian network-augmented naive Bayesian classifiers (Cheng and Greiner 1999).
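
As a rough illustration of the multinomial estimate mentioned above (not the authors' implementation), the sketch below assumes the continuous attributes have already been discretized into integer bin indices and estimates p(x_i = v | c_j) by smoothed relative frequencies; the smoothing constant alpha and the function names are illustrative choices, and the returned likelihood functions can be fed to the classify() sketch given earlier.

```python
# Illustrative multinomial estimation over discretized attributes
# (bin indices in 0..n_bins-1); alpha is an optional smoothing constant.
from collections import Counter, defaultdict
from typing import Dict, List, Sequence, Tuple

def fit_multinomial_nbc(data: Sequence[Tuple[Sequence[int], str]],
                        n_bins: int, alpha: float = 1.0):
    """Estimate class priors and p(x_i = v | c_j) by relative frequencies."""
    by_class: Dict[str, List[Sequence[int]]] = defaultdict(list)
    for x, c in data:
        by_class[c].append(x)
    priors, likelihoods = {}, {}
    for c, rows in by_class.items():
        priors[c] = len(rows) / len(data)
        funcs = []
        for i in range(len(rows[0])):
            counts = Counter(r[i] for r in rows)
            total = len(rows)
            funcs.append(lambda v, cnt=counts, t=total:
                         (cnt[int(v)] + alpha) / (t + alpha * n_bins))
        likelihoods[c] = funcs
    return priors, likelihoods
```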

A second family of NBCs is suitable for continuous attribute values. These classifiers directly estimate the density of each attribute using a parametric form. A common additional assumption in that case is that, within each class, the values of a numeric attribute are normally distributed around the mean, so that each attribute is modelled by a single Gaussian. The NBC then represents such a distribution in terms of its mean and standard deviation and computes the probability of an observed value from these estimates, as follows:

$$ p(x_{i}|c_{j})=g(x_{i}, \mu_{j}, \sigma_{j})=\frac{1}{\sqrt{2\pi}\,\sigma_{j}}\,\mathrm{e}^{-\frac{(x_{i}- \mu_{j})^{2}}{2\sigma_{j}^2}} $$
(24)
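
A minimal sketch of the Gaussian estimation of Eq. (24) is given below (again with illustrative names, not the paper's code): for every class and attribute it computes the empirical mean and standard deviation of the training values and returns likelihood functions compatible with the classify() sketch above.

```python
# Illustrative Gaussian estimation of Eq. (24): one (mu, sigma) pair per
# class and per attribute, learned from (attribute-vector, class-label) pairs.
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

def gaussian(x: float, mu: float, sigma: float) -> float:
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)

def fit_gaussian_nbc(data: Sequence[Tuple[Sequence[float], str]]):
    """Estimate class priors and per-class, per-attribute Gaussian densities."""
    by_class: Dict[str, List[Sequence[float]]] = defaultdict(list)
    for x, c in data:
        by_class[c].append(x)
    priors, likelihoods = {}, {}
    for c, rows in by_class.items():
        priors[c] = len(rows) / len(data)
        funcs = []
        for i in range(len(rows[0])):
            vals = [r[i] for r in rows]
            mu = sum(vals) / len(vals)
            sigma = math.sqrt(sum((v - mu) ** 2 for v in vals) / len(vals)) or 1e-6
            funcs.append(lambda x, m=mu, s=sigma: gaussian(x, m, s))
        likelihoods[c] = funcs
    return priors, likelihoods
```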

Gaussian classifiers (Geiger and Heckerman 1994; John and Langley 1995) are known for their simplicity and have a lower complexity than non-parametric approximations. Although the normality assumption may be a reasonable approximation for many benchmarks, it is not always the best choice. Moreover, if the normality assumption is violated, the classification results of the NBC may deteriorate.

Other approaches break with the strong parametric assumption and rely on non-parametric estimation. The main ones are based on mixture models (Figueiredo and Leitão 1999; McLachlan and Peel 2000), in particular Gaussian mixture models (Bishop 1999; McLachlan and Peel 2000). Other approaches use kernel densities (John and Langley 1995; Pérez et al. 2009), leading to so-called flexible classifiers. This name is due to the ability of such classifiers to represent densities with more than one mode, in contrast with simple Gaussian classifiers. Flexible classifiers represent densities of different shapes with high accuracy; however, this comes at the price of a considerable increase in complexity.

John and Langley (1995) have proposed a Flexible Naive Bayesian Classifier (FNBC) that abandons the normality assumption and instead uses non-parametric kernel density estimation for each conditional distribution. The FNBC has the same properties as the NBC; the only difference is that, instead of estimating the density of each continuous attribute x by a single Gaussian g(x, μ_j, σ_j), this density is estimated by averaging a large set of Gaussian kernels. To compute the density of a continuous attribute for a specific class j, the FNBC places one Gaussian on each attribute value encountered during training for this class and then takes the average of these Gaussians to estimate p(x_i|c_j). More formally, the probability distribution is estimated as follows:

$$ p(x_{i}|c_{j})=\frac{1}{N_{j}}\sum _{k=1}^{N_{j}} g(x_{i}, \mu_{ik}, \sigma_{j}) $$
(25)

where k ranges over the training instances of class c_j and N_j is the number of instances belonging to class c_j. The mean μ_ik is equal to the value of attribute i for instance k of class j, i.e. μ_ik = x_ik. For each class j, the FNBC estimates the standard deviation by

$$ \sigma_j=\frac{1}{\sqrt{N_j}} $$
(26)

The authors also prove the consistency of this kernel estimation when (26) is used (see John and Langley 1995 for details). It has been shown on several datasets that the kernel density estimation used in the FNBC enables this classifier to perform well where the parametric assumption is violated, at little cost on datasets where it holds.
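
The flexible estimate of Eqs. (25) and (26) can be sketched in the same illustrative style (an assumption-laden sketch, not the authors' code): for each class, one Gaussian kernel is centred on every training value of the attribute, with a common bandwidth of 1/sqrt(N_j), and the kernels are averaged; the result again plugs into the classify() sketch above.

```python
# Illustrative kernel-based (flexible) estimation of Eqs. (25)-(26):
# p(x_i | c_j) is the average of N_j Gaussians centred on the training
# values x_ik of attribute i in class c_j, with sigma_j = 1 / sqrt(N_j).
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

def gaussian(x: float, mu: float, sigma: float) -> float:
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)

def fit_flexible_nbc(data: Sequence[Tuple[Sequence[float], str]]):
    """Estimate class priors and per-class kernel density estimates."""
    by_class: Dict[str, List[Sequence[float]]] = defaultdict(list)
    for x, c in data:
        by_class[c].append(x)
    priors, likelihoods = {}, {}
    for c, rows in by_class.items():
        n_j = len(rows)
        priors[c] = n_j / len(data)
        sigma_j = 1.0 / math.sqrt(n_j)                      # Eq. (26)
        funcs = []
        for i in range(len(rows[0])):
            centres = [r[i] for r in rows]                  # mu_ik = x_ik
            funcs.append(lambda x, cs=centres, s=sigma_j:
                         sum(gaussian(x, m, s) for m in cs) / len(cs))  # Eq. (25)
        likelihoods[c] = funcs
    return priors, likelihoods
```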

Pérez et al. (2009) have more recently proposed an approach to flexible Bayesian classifiers based on kernel density estimation that extends the FNBC of John and Langley (1995) to handle dependent attributes, thus abandoning the independence assumption. In this work, three classifiers, namely tree-augmented naive Bayes, a k-dependence Bayesian classifier and a complete-graph classifier, are adapted to the kernel-based Bayesian network paradigm.

About this article

Cite this article

Bounhas, M., Mellouli, K., Prade, H. et al. Possibilistic classifiers for numerical data. Soft Comput 17, 733–751 (2013). https://doi.org/10.1007/s00500-012-0947-9
