Abstract
Imperfect information inevitably appears in real situations for a variety of reasons. Although efforts have been made to incorporate imperfect data into classification techniques, many limitations remain as to the types of data, uncertainty, and imprecision that can be handled. In this paper, we present a Fuzzy Random Forest ensemble for classification and show its ability to handle imperfect data in both the learning and the classification phases. We then describe the types of imperfect data it supports and devise an augmented ensemble that can operate with other types of imperfect data: crisp, missing, probabilistic uncertainty, and imprecise (fuzzy and crisp) values. Additionally, we perform experiments with imperfect datasets created for this purpose and with datasets used in other papers to show the advantage of being able to express the true nature of imperfect information.
References
Ahn H, Moon H, Fazzari J, Lim N, Chen J, Kodell R (2007) Classification by ensembles from random partitions of high dimensional data. Comput Stat Data Anal 51:6166–6179
Asuncion A, Newman DJ (2007) UCI Machine Learning Repository. University of California, School of Information and Computer Science, Irvine, CA. http://www.ics.uci.edu/mlearn/MLRepository.html
Bonissone PP (1997) Approximate reasoning systems: handling uncertainty and imprecision in information systems. In: Motro, A, Smets, Ph (eds) Uncertainty management in information systems: from needs to solutions. Kluwer Academic Publishers, Dordrecht, pp 369–395
Bonissone PP, Cadenas JM, Garrido MC, Díaz-Valladares RA (2010) A fuzzy random forest. Int J Approx Reason 51(7):729–747
Casillas J, Sánchez L (2006) Knowledge extraction from fuzzy data for estimating consumer behavior models. In: Proceedings of IEEE conference on Fuzzy Systems, Vancouver, BC, Canada, pp 164–170
Coppi R, Gil MA, Kiers HAL (2006) The fuzzy approach to statistical analysis. Comput Stat Data Anal 51:1–14
Dubois D, Prade H (1988) Possibility theory: an approach to computerized processing of uncertainty. Plenum Press, New York
Dubois D, Guyonnet D (2011) Risk-informed decision-making in the presence of epistemic uncertainty. Int J Gen Syst 40(2):145–167
Duda RO, Hart PE, Stork DG (2001) Pattern classification. John Wiley and Sons, Inc, New York
Fernández A, del Jesus MJ, Herrera F (2009) Hierarchical fuzzy rule based classification systems with genetic rule selection for imbalanced data-sets. Int J Approx Reason 50(3):561–577
García S, Fernández A, Luengo J, Herrera F (2009) A study of statistical techniques and performance measures for genetics-based machine learning: accuracy and interpretability. Soft Comput 13(10):959–977
Garrido MC, Cadenas JM, Bonissone PP (2010) A classification and regression technique to handle heterogeneous and imperfect information. Soft Comput 14(11):1165–1185
Hernández J, Ramírez MJ, Ferri C (2004) Introducción a la Minería de Datos. Pearson-Prentice Hall, Englewood Cliffs
Janikow CZ (1996) Exemplar learning in fuzzy decision trees. In: Proceedings of the FUZZ-IEEE, New Orleans, USA, pp 1500–1505
Janikow CZ (1998) Fuzzy decision trees: issues and methods. IEEE Trans Man Syst Cybern 28:1–14
Langseth H, Nielsen TD, Rumí R, Salmerón A (2009) Inference in hybrid Bayesian networks. Reliab Eng Syst Saf 94:1499–1509
Mackay DJC (2003) Information theory, inference and learning algorithms. Cambridge University Press, Cambridge
McLachlan GJ, Krishnan T (1997) The EM algorithm and extensions. Wiley Series in Probability and Statistics, New York
Mitra S, Pal SK (1995) Fuzzy multi-layer perceptron, inferencing and rule generation. IEEE Trans Neural Netw 6(1):51–63
Opitz D, Maclin R (1999) Popular ensemble methods: an empirical study. J Artif Intell Res 11:169–198
Otero A, Otero J, Sánchez L, Villar JR (2006) Longest path estimation from inherently fuzzy data acquired with GPS using genetic algorithms. In: Proceedings on International Symposium on Evolving Fuzzy Systems, Lancaster, UK, pp 300–305
Palacios AM, Sánchez L, Couso I (2009) Extending a simple genetic cooperative–competitive learning fuzzy classifier to low quality datasets. Evol Intell 2:73–84
Palacios AM, Sánchez L, Couso I (2010) Diagnosis of dyslexia with low quality data with genetic fuzzy systems. Int J Approx Reason 51:993–1009
Palacios AM, Sánchez L, Couso I (2011) Linguistic cost-sensitive learning of genetic fuzzy classifiers for imprecise data. Int J Approx Reason 52:841–862
Quaeghebeur E, Cooman G (2005) Imprecise probability models for inference in exponential families. In: 4th international symposium on imprecise probabilities and their applications, Pittsburgh, Pennsylvania, pp 287–296
Quinlan JR (1993) C4.5: programs for machine learning. The Morgan Kaufmann Series in Machine Learning. Morgan Kaufmann Publishers, San Mateo
Ruiz A, López de Teruel PE, Garrido MC (1998) Probabilistic inference from arbitrary uncertainty using mixtures of factorized generalized Gaussians. J Artif Intell Res 9:167–217
Sánchez L, Suárez MR, Villar JR, Couso I (2008) Mutual information-based feature selection and partition design in fuzzy rule-based classifiers from vague data. Int J Approx Reason 49(3):607–622
Sánchez L, Couso I, Casillas J (2009) Genetic learning of fuzzy rules based on low quality data. Fuzzy Sets Syst 160:2524–2552
Witten IH, Frank E (2000) Data mining. Morgan Kaufmann Publishers, San Francisco
Acknowledgments
Supported by the project TIN2008-06872-C04-03 of the MICINN of Spain and the European Fund for Regional Development. Thanks also to the Funding Program for Research Groups of Excellence (code 04552/GERM/06) granted by the "Fundación Séneca", Murcia, Spain. R. Martínez is supported by the FPI scholarship program of the "Fundación Séneca". Thanks to Luciano Sánchez and Ana Palacios for their help in creating the extended boxplots.
Appendix: Combination methods
In this appendix, we use the notation defined in Sect. 3.2. For each strategy of the fuzzy classifier module in the FRF ensemble described in Sect. 3.2, several functions Faggre1_1, Faggre1_2 and Faggre2 can be defined. The complete set of functions is described in Bonissone et al. (2010); here, the methods used in this paper are described in detail.
1.1 Non-trainable methods
In these combination methods, a transformation is applied to the matrix L_FRF in Step 2 of Algorithms 3 and 4 so that each reached leaf assigns a simple vote to its majority class.
Within this group we define the following method, with two versions depending on the strategy used:

- Simple Majority vote:

- Strategy 1 → method SM1
The function Faggre11 in Algorithm 3 is defined as
$$ Faggre1_1(t,i,L\_FRF)=\left\{ \begin{array}{ll} 1 & {\rm if}\ i=\arg\displaystyle\max_{j, j=1,\ldots,I} \left\{ \sum_{n=1}^{N_t} L\_FRF{-}mod_{t,n,j} \right\}\\ 0 & {\rm otherwise} \\ \end{array}\right. $$

In this method, each tree t assigns a simple vote to the most voted class among the N_t leaves reached by example e in the tree.
The function Faggre12 in Algorithm 3 is defined as
$$ Faggre1_2(i,T\_FRF)=\sum_{t=1}^{T} T\_FRF_{t,i} $$

- Strategy 2 → method SM2
For Strategy 2, it is necessary to define the function Faggre2 combining information from all leaves reached in the ensemble by example e. Thus, the function Faggre2 in Algorithm 4 is defined as
$$ Faggre2(i,L\_FRF)= \sum_{t=1}^{T} \sum_{n=1}^{N_t} L\_FRF{-}mod_{t,n,i} $$
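As an illustrative sketch (not the paper's implementation), the two simple-majority versions can be written as follows, assuming each tree's reached leaves are given as an (N_t × I) array of class supports:

```python
import numpy as np

def leaf_majority_votes(L_FRF):
    """Step-2 transformation: each reached leaf casts one simple vote
    for its majority class (the argmax of its support vector)."""
    L_mod = []
    for tree in L_FRF:                        # tree: (N_t, I) array of leaf supports
        votes = np.zeros_like(tree)
        for n, leaf in enumerate(tree):
            votes[n, int(np.argmax(leaf))] = 1.0
        L_mod.append(votes)
    return L_mod

def sm1(L_FRF):
    """SM1: each tree votes for its most voted class (Faggre1_1),
    then tree votes are summed over the ensemble (Faggre1_2)."""
    L_mod = leaf_majority_votes(L_FRF)
    T_FRF = np.zeros(L_mod[0].shape[1])
    for votes in L_mod:
        T_FRF[int(np.argmax(votes.sum(axis=0)))] += 1.0
    return int(np.argmax(T_FRF))

def sm2(L_FRF):
    """SM2: all leaf votes in the ensemble are summed directly (Faggre2)."""
    L_mod = leaf_majority_votes(L_FRF)
    support = sum(votes.sum(axis=0) for votes in L_mod)
    return int(np.argmax(support))
```

The data layout (one array per tree, holding only the leaves reached by example e) is our assumption; the paper stores the same information in the matrix L_FRF.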
1.2 Trainable explicitly dependent methods
- Majority vote Weighted by Leaf:
In this combination method, a transformation is applied to the matrix L_FRF in Step 2 of Algorithms 3 and 4 so that each leaf reached assigns a weighted vote to the majority class. The vote is weighted by the degree of satisfaction with which example e reaches the leaf.
$$ \begin{aligned} {\rm For}\,t &= 1,\ldots,T \\ &{\rm For}\, n = 1,\ldots, N\\ & \quad {\rm For}\,i = 1,\ldots, I\\ &\qquad L\_FRF {-}mod_{t,n,i} = \left\{ \begin{array}{ll} \chi_{t,n}(e) & {\rm if}\ i=\arg\displaystyle\max_{j, j=1,\ldots,I} \{L\_FRF_{t,n,j}\}\\ 0 & {\rm otherwise} \\ \end{array}\right. \end{aligned} $$

Again, we have two versions according to the strategy used.
- Strategy 1 → method MWL1
The functions Faggre11 and Faggre12 are defined as
$$ Faggre1_1(t,i,L\_FRF)=\left\{ \begin{array}{ll} 1 & {\rm if}\ i=\arg\displaystyle\max_{j, j=1,\ldots,I} \left\{ \sum_{n=1}^{N_t} L\_FRF{-}mod_{t,n,j} \right\}\\ 0 & {\rm otherwise} \\ \end{array}\right. $$

$$ Faggre1_2(i,T\_FRF)=\sum_{t=1}^{T} T\_FRF_{t,i} $$

- Strategy 2 → method MWL2
The function Faggre2 is defined as
$$ Faggre2(i,L\_FRF)= \sum_{t=1}^{T} \sum_{n=1}^{N_t} L\_FRF{-}mod_{t,n,i} $$

- Majority vote Weighted by Leaf and by Tree:
Again, in this combination method, a transformation is applied to the matrix L_FRF in Step 2 of Algorithms 3 and 4 so that each leaf reached assigns a weighted vote to the majority class. The vote is weighted by the degree of satisfaction with which example e reaches the leaf.
$$ \begin{aligned} {\rm For}\,t &= 1,\ldots,T \\ &{\rm For}\, n = 1,\ldots, N\\ & \quad {\rm For}\,i = 1,\ldots, I\\ &\qquad L\_FRF {-}mod_{t,n,i} = \left\{ \begin{array}{ll} \chi_{t,n}(e) & {\rm if}\ i=\arg\displaystyle\max_{j, j=1,\ldots,I} \{L\_FRF_{t,n,j}\}\\ 0 & {\rm otherwise} \\ \end{array}\right. \end{aligned} $$

In addition, this method introduces a weight for each tree, obtained by testing each individual tree with the OOB dataset. Let \(\overline{p}=(p_1,p_2,\ldots,p_T)\) be the vector with the weights assigned to each tree. Each p_t is obtained as \(\frac{N\_success\_OOB_t}{size\_OOB_t}\), where N_success_OOB_t is the number of examples from the OOB dataset used to test the tth tree that are classified correctly, and size_OOB_t is the total number of examples in this dataset.
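The OOB-based tree weights p_t can be sketched as follows (a minimal sketch; the callable-per-tree representation and the (x, class) pair layout of each OOB_t set are our assumptions, not the paper's implementation):

```python
def oob_tree_weights(trees, oob_sets):
    """p_t = N_success_OOB_t / size_OOB_t: the fraction of the tree's own
    out-of-bag examples that it classifies correctly."""
    weights = []
    for predict, oob in zip(trees, oob_sets):
        n_success = sum(1 for x, y in oob if predict(x) == y)
        weights.append(n_success / len(oob))
    return weights
```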
- Strategy 1 → method MWLT1
The function Faggre11 is defined as:
$$ Faggre1_1(t,i,L\_FRF)=\left\{ \begin{array}{ll} 1 & {\hbox{if}}\ i=\arg\displaystyle\max_{j, j=1,\ldots,I} \left\{ \sum_{n=1}^{N_t} L\_FRF{-}mod_{t,n,j} \right\}\\ 0 & {\hbox{otherwise}} \\ \end{array}\right. $$

Vector \(\overline{p}\) is used in the definition of function Faggre1_2:
$$ Faggre1_2(i,T\_FRF)=\sum_{t=1}^{T} p_t \cdot T\_FRF_{t,i} $$

- Strategy 2 → method MWLT2
The vector of weights \(\overline{p}\) is applied to Strategy 2.
$$ Faggre2(i,L\_FRF)=\sum_{t=1}^{T} p_t \sum_{n=1}^{N_t} L\_FRF{-}mod_{t,n,i} $$

- Minimum Weighted by Leaf and by membership Function:
In this combination method, a transformation is applied to the matrix L_FRF in Step 2 of Algorithm 3.
$$ \begin{aligned} {\rm For}\,t &= 1,\ldots,T \\ &{\rm For}\, n = 1,\ldots, N\\ & \quad {\rm For}\,i = 1,\ldots, I\\ &\qquad L\_FRF {-}mod_{t,n,i} = \chi_{t,n}(e) \times \frac{E_i}{E_n} \end{aligned} $$

- Strategy 1 → method MIWLF1
The function Faggre11 is defined as:
$$ Faggre1_1(t,i,L\_FRF)=\left\{ \begin{array}{ll} 1 & {\hbox{if}} \ i=\displaystyle\arg\max_{j, j=1,\ldots,I}\left\{ \min(L\_FRF{-}mod_{t,1,j},L\_FRF{-}mod_{t,2,j},\right.\\ & \phantom{{\hbox{if}} \ i=\displaystyle\arg\max_{j, j=1,\ldots,I}}\left.\ldots ,L\_FRF{-}mod_{t,N_t,j})\right\} \\ 0 & {\hbox{otherwise}} \\ \end{array}\right. $$

The function Faggre1_2 incorporates the weighting, defined by a fuzzy membership function, for each tree:
$$ Faggre1_2(i,T\_FRF)=\sum_{t=1}^{T} \mu_{pond}\left(\frac{errors_{(OOB_t)}}{size_{(OOB_t)}}\right) \cdot T\_FRF_{t,i} $$

The membership function \(\mu_{pond}(x)\) is defined as:
$$ \mu_{pond}(x)=\left\{\begin{array}{ll} 1 & 0 \leq x \leq (pmin+marg)\\ \frac{(pmax+marg)-x}{(pmax-pmin)} & (pmin+marg)\leq x \leq (pmax+marg)\\ 0 & (pmax+marg) \leq x \\ \end{array}\right. $$where
- pmax is the maximum error rate over the trees of the FRF ensemble (\(pmax=\max_{t=1,\ldots,T}\left\{\frac{errors_{(OOB_t)}}{size_{(OOB_t)}}\right\}\)). The error rate of a tree t is obtained as \(\frac{errors_{(OOB_t)}}{size_{(OOB_t)}}\), where errors_(OOB_t) is the number of classification errors of tree t (using the OOB_t dataset as test set) and size_(OOB_t) is the cardinality of the OOB_t dataset. As indicated above, the OOB_t examples are not used to build tree t, so they constitute an independent sample with which to test it; we can therefore measure the goodness of a tree t by the number of errors it makes when classifying the set of examples OOB_t;
- pmin is the minimum error rate over the trees of the FRF ensemble; and
- \(marg=\frac{pmax-pmin}{4}\)
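The membership function μ_pond defined above transcribes directly to code (a sketch; pmin and pmax are passed in rather than recomputed from the ensemble):

```python
def mu_pond(x, pmin, pmax):
    """Fuzzy weight of a tree as a function of its OOB error rate x:
    full weight up to pmin+marg, linear decrease, zero from pmax+marg on."""
    marg = (pmax - pmin) / 4.0
    if x <= pmin + marg:
        return 1.0
    if x <= pmax + marg:
        return ((pmax + marg) - x) / (pmax - pmin)
    return 0.0
```

Note that the two linear pieces meet continuously: at x = pmin+marg the ramp evaluates to ((pmax+marg)-(pmin+marg))/(pmax-pmin) = 1, and at x = pmax+marg it reaches 0.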
1.3 Trainable implicitly dependent methods
Within this group we define the following method:

- Minimum Weighted by membership Function: In this combination method, no transformation is applied to the matrix L_FRF in Step 2 of Algorithm 3.
- Strategy 1 → method MIWF1
The function Faggre11 is defined as
$$ Faggre1_1(t,i,L\_FRF)=\left\{ \begin{array}{ll} 1 & {\rm if}\ i=\displaystyle\arg\max_{j, j=1,\ldots,I}\left\{ \min(L\_FRF_{t,1,j},L\_FRF_{t,2,j}, \right. \\ & \phantom{{\rm if}\ i=\displaystyle\arg\max_{j, j=1,\ldots,I}}\left.\ldots ,L\_FRF_{t,N_t,j})\right\} \\ 0 & {\rm otherwise} \\ \end{array}\right. $$

The function Faggre1_2 incorporates the weighting defined by the previous fuzzy membership function for each tree:
$$ Faggre1_2(i,T\_FRF)=\sum_{t=1}^{T} \mu_{pond}\left(\frac{errors_{(OOB_t)}}{size_{(OOB_t)}}\right) \cdot T\_FRF_{t,i} $$
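The minimum-based combination can be sketched as follows (tree weights are passed in precomputed, e.g. as μ_pond values of the OOB error rates; the per-tree array layout is our assumption):

```python
import numpy as np

def minimum_weighted(L_FRF, tree_weights):
    """Sketch of the minimum-based Strategy 1 methods: tree t votes for the
    class with the largest minimum support over its reached leaves
    (Faggre1_1); tree votes are then combined with the membership-function
    weights (Faggre1_2)."""
    T_FRF = np.zeros(L_FRF[0].shape[1])
    for tree, w in zip(L_FRF, tree_weights):
        min_support = tree.min(axis=0)   # per-class minimum over reached leaves
        T_FRF[int(np.argmax(min_support))] += w
    return int(np.argmax(T_FRF))
```

For MIWF1 the arrays hold the raw supports L_FRF; for MIWLF1 they would hold the transformed values L_FRF-mod.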
Cadenas, J.M., Garrido, M.C., Martínez, R. et al. Extending information processing in a Fuzzy Random Forest ensemble. Soft Comput 16, 845–861 (2012). https://doi.org/10.1007/s00500-011-0777-1