
Extending information processing in a Fuzzy Random Forest ensemble


Abstract

Imperfect information inevitably appears in real situations for a variety of reasons. Although efforts have been made to incorporate imperfect data into classification techniques, there are still many limitations as to the types of data, uncertainty, and imprecision that can be handled. In this paper, we will present a Fuzzy Random Forest ensemble for classification and show its ability to handle imperfect data in both the learning and the classification phases. We will then describe the types of imperfect data it supports, and devise an augmented ensemble that can operate with other types of imperfect data: crisp values, missing values, probabilistic uncertainty, and imprecise (fuzzy and crisp) values. Additionally, we will perform experiments with imperfect datasets created for this purpose and with datasets used in other papers, to show the advantage of being able to express the true nature of imperfect information.


References

  • Ahn H, Moon H, Fazzari J, Lim N, Chen J, Kodell R (2007) Classification by ensembles from random partitions of high dimensional data. Comput Stat Data Anal 51:6166–6179


  • Asuncion A, Newman DJ (2007) UCI Machine Learning Repository. University of California, School of Information and Computer Science, Irvine, CA. http://www.ics.uci.edu/mlearn/MLRepository.html

  • Bonissone PP (1997) Approximate reasoning systems: handling uncertainty and imprecision in information systems. In: Motro, A, Smets, Ph (eds) Uncertainty management in information systems: from needs to solutions. Kluwer Academic Publishers, Dordrecht, pp 369–395

  • Bonissone PP, Cadenas JM, Garrido MC, Díaz-Valladares RA (2010) A fuzzy random forest. Int J Approx Reason 51(7):729–747


  • Casillas J, Sánchez L (2006) Knowledge extraction from fuzzy data for estimating consumer behavior models. In: Proceedings of IEEE conference on Fuzzy Systems, Vancouver, BC, Canada, pp 164–170

  • Coppi R, Gil MA, Kiers HAL (2006) The fuzzy approach to statistical analysis. Comput Stat Data Anal 51:1–14


  • Dubois D, Prade H (1988) Possibility theory: an approach to computerized processing of uncertainty. Plenum Press, New York

  • Dubois D, Guyonnet D (2011) Risk-informed decision-making in the presence of epistemic uncertainty. Int J Gen Syst 40(2):145–167


  • Duda RO, Hart PE, Stork DG (2001) Pattern classification. John Wiley and Sons, Inc, New York

  • Fernández A, del Jesus MJ, Herrera F (2009) Hierarchical fuzzy rule based classification systems with genetic rule selection for imbalanced data-sets. Int J Approx Reason 50 (3):561–577


  • García S, Fernández A, Luengo J, Herrera F (2009) A study of statistical techniques and performance measures for genetics-based machine learning: accuracy and interpretability. Soft Comput 13(10):959–977


  • Garrido MC, Cadenas JM, Bonissone PP (2010) A classification and regression technique to handle heterogeneous and imperfect information. Soft Comput 14(11):1165–1185


  • Hernández J, Ramírez MJ, Ferri C (2004) Introducción a la Minería de Datos. Pearson-Prentice Hall, Englewood Cliffs

  • Janikow CZ (1996) Exemplar learning in fuzzy decision trees. In: Proceedings of the FUZZ-IEEE, New Orleans, USA, pp 1500–1505

  • Janikow CZ (1998) Fuzzy decision trees: issues and methods. IEEE Trans Syst Man Cybern 28:1–14


  • Langseth H, Nielsen TD, Rumí R, Salmerón A (2009) Inference in hybrid Bayesian networks. Reliab Eng Syst Saf 94:1499–1509


  • Mackay DJC (2003) Information theory, inference and learning algorithms. Cambridge University Press, Cambridge

  • McLachlan GJ, Krishnan T (1997) The EM algorithm and extensions. Wiley Series in Probability and Statistics, New York

  • Mitra S, Pal SK (1995) Fuzzy multi-layer perceptron, inferencing and rule generation. IEEE Trans Neural Netw 6(1):51–63


  • Opitz D, Maclin R (1999) Popular ensemble methods: an empirical study. J Artif Intell Res 11:169–198


  • Otero A, Otero J, Sánchez L, Villar JR (2006) Longest path estimation from inherently fuzzy data acquired with GPS using genetic algorithms. In: Proceedings on International Symposium on Evolving Fuzzy Systems, Lancaster, UK, pp 300–305

  • Palacios AM, Sánchez L, Couso I (2009) Extending a simple genetic cooperative–competitive learning fuzzy classifier to low quality datasets. Evol Intell 2:73–84


  • Palacios AM, Sánchez L, Couso I (2010) Diagnosis of dyslexia with low quality data with genetic fuzzy systems. Int J Approx Reason 51:993–1009


  • Palacios AM, Sánchez L, Couso I (2011) Linguistic cost-sensitive learning of genetic fuzzy classifiers for imprecise data. Int J Approx Reason 52:841–862


  • Quaeghebeur E, Cooman G (2005) Imprecise probability models for inference in exponential families. In: 4th international symposium on imprecise probabilities and their applications, Pittsburgh, Pennsylvania, pp 287–296

  • Quinlan JR (1993) C4.5: programs for machine learning. The Morgan Kaufmann Series in Machine Learning. Morgan Kaufmann Publishers, San Mateo

  • Ruiz A, López de Teruel PE, Garrido MC (1998) Probabilistic inference from arbitrary uncertainty using mixtures of factorized generalized Gaussians. J Artif Intell Res 9:167–217


  • Sánchez L, Suárez MR, Villar JR, Couso I (2008) Mutual information-based feature selection and partition design in fuzzy rule-based classifiers from vague data. Int J Approx Reason 49(3):607–622


  • Sánchez L, Couso I, Casillas J (2009) Genetic learning of fuzzy rules based on low quality data. Fuzzy Sets Syst 160:2524–2552


  • Witten IH, Frank E (2000) Data mining. Morgan Kaufmann Publishers, San Francisco


Acknowledgments

Supported by the project TIN2008-06872-C04-03 of the MICINN of Spain and the European Fund for Regional Development. Thanks also to the Funding Program for Research Groups of Excellence with code 04552/GERM/06 granted by the “Fundación Séneca”, Murcia, Spain. R. Martínez is supported by the FPI scholarship program of the “Fundación Séneca” of Spain. Thanks to Luciano Sánchez and Ana Palacios for their help in creating the extended boxplots.

Corresponding author

Correspondence to Jose M. Cadenas.

Appendix: Combination methods

In this appendix, we use the notation defined in Sect. 3.2. For each strategy of the fuzzy classifier module in the FRF ensemble described in Sect. 3.2, several functions Faggre1_1, Faggre1_2 and Faggre2 can be defined. The overall set of functions is described in Bonissone et al. (2010). The methods used in this paper are described in detail below.

1.1 Non-trainable methods

In these combination methods, a transformation is applied to the matrix L_FRF in Step 2 of Algorithms 3 and 4 so that each leaf reached assigns a simple vote to the majority class.

$$ \begin{aligned} {\rm For}\,t &= 1,\ldots,T \\ &{\rm For}\, n = 1,\ldots, N\\ & \quad {\rm For}\,i = 1,\ldots, I\\ &\qquad L\_FRF {-}mod_{t,n,i} = \left\{ \begin{array}{ll} 1 & {\rm if}\ i=\arg\displaystyle\max_{j, j=1,\ldots,I} \{L\_FRF_{t,n,j}\}\\ 0 &{\rm otherwise} \\ \end{array}\right. \end{aligned} $$

Within this group we define the following methods, obtaining two versions depending on the strategy used (a short code sketch of both is given after the list below):

  • Simple Majority vote:

  • Strategy 1→ method SM1

    The function Faggre1_1 in Algorithm 3 is defined as

    $$ Faggre1_1(t,i,L\_FRF)=\left\{ \begin{array}{ll} 1 & {\rm if}\ i=\arg\displaystyle\max_{j, j=1,\ldots,I} \left\{ \sum_{n=1}^{N_t} L\_FRF-mod_{t,n,j} \right\}\\ 0 & {\rm otherwise} \\ \end{array}\right. $$

    In this method, each tree t assigns a simple vote to the most voted class among the N_t leaves reached by example e in that tree.

    The function Faggre1_2 in Algorithm 3 is defined as

    $$ Faggre1_2(i,T\_FRF)=\sum_{t=1}^{T} T\_FRF_{t,i} $$
  • Strategy 2 → method SM2

    For Strategy 2, it is necessary to define the function Faggre2, which combines the information from all the leaves reached in the ensemble by example e. Thus, the function Faggre2 in Algorithm 4 is defined as

    $$ Faggre2(i,L\_FRF)= \sum_{t=1}^{T} \sum_{n=1}^{N_t} L\_FRF{-}mod_{t,n,i} $$
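To make the two versions concrete, the following is a minimal sketch in Python/NumPy (a language not used in the paper). The representation of L_FRF as a list of per-tree arrays, one row per reached leaf and one column per class, is an assumption made only for illustration; the functions return the index of the winning class.

```python
import numpy as np

# Hypothetical representation: for one example e, L_FRF is a list with one
# array per tree, of shape (N_t, I), holding the class supports of the leaves
# reached by e in that tree.

def to_votes(leaf_supports):
    """Step-2 transformation: each reached leaf casts one vote for its majority class."""
    votes = np.zeros_like(leaf_supports)
    votes[np.arange(len(leaf_supports)), leaf_supports.argmax(axis=1)] = 1.0
    return votes

def sm1(L_FRF):
    """SM1: each tree votes for its most voted class (Faggre1_1); tree votes are then summed (Faggre1_2)."""
    T_FRF = np.zeros((len(L_FRF), L_FRF[0].shape[1]))
    for t, leaf_supports in enumerate(L_FRF):
        T_FRF[t, to_votes(leaf_supports).sum(axis=0).argmax()] = 1.0
    return int(T_FRF.sum(axis=0).argmax())

def sm2(L_FRF):
    """SM2: every reached leaf of the ensemble votes directly (Faggre2)."""
    return int(sum(to_votes(ls).sum(axis=0) for ls in L_FRF).argmax())

# Toy usage: 2 trees, 3 classes, a few reached leaves per tree.
L_FRF = [np.array([[0.2, 0.7, 0.1], [0.6, 0.3, 0.1]]),
         np.array([[0.1, 0.8, 0.1]])]
print(sm1(L_FRF), sm2(L_FRF))
```

Note that ties between equally voted classes are broken here by the first index; the paper does not specify a tie-breaking rule, so this is an illustrative choice.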

1.2 Trainable explicitly dependent methods

  • Majority vote Weighted by Leaf:

    In this combination method, a transformation is applied to the matrix L_FRF in Step 2 of Algorithms 3 and 4 so that each leaf reached assigns a weighted vote to the majority class. The vote is weighted by the degree of satisfaction with which example e reaches the leaf.

    $$ \begin{aligned} {\rm For}\,t &= 1,\ldots,T \\ &{\rm For}\, n = 1,\ldots, N\\ & \quad {\rm For}\,i = 1,\ldots, I\\ &\qquad L\_FRF {-}mod_{t,n,i} = \left\{ \begin{array}{ll} \chi_{t,n}(e) & {\rm if}\ i=\arg\displaystyle\max_{j, j=1,\ldots,I} \{L\_FRF_{t,n,j}\}\\ 0 & {\rm otherwise} \\ \end{array}\right. \end{aligned} $$

    Again, we have two versions according to the strategy used (a short sketch of both follows this item).

  • Strategy 1 → method MWL1

    The functions Faggre1_1 and Faggre1_2 are defined as

    $$ Faggre1_1(t,i,L\_FRF)=\left\{ \begin{array}{ll} 1 & {\rm if}\ i=\arg\displaystyle\max_{j, j=1,\ldots,I} \left\{ \sum_{n=1}^{N_t} L\_FRF-mod_{t,n,j} \right\}\\ 0 & {\rm otherwise} \\ \end{array}\right. $$
    $$ Faggre1_2(i,T\_FRF)=\sum_{t=1}^{T} T\_FRF_{t,i} $$
  • Strategy 2 → method MWL2

    The function Faggre2 is defined as

    $$ Faggre2(i,L\_FRF)= \sum_{t=1}^{T} \sum_{n=1}^{N_t} L\_FRF{-}mod_{t,n,i} $$
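Continuing the sketch above, with the same assumed representation of L_FRF plus chi, a list of per-tree arrays holding the satisfaction degrees χ_{t,n}(e), the leaf-weighted versions MWL1 and MWL2 could look as follows. This is illustrative only, not the authors' implementation.

```python
import numpy as np

def to_weighted_votes(leaf_supports, chi_t):
    """Step-2 transformation: each reached leaf votes for its majority class with weight chi_{t,n}(e)."""
    votes = np.zeros_like(leaf_supports)
    votes[np.arange(len(leaf_supports)), leaf_supports.argmax(axis=1)] = chi_t
    return votes

def mwl1(L_FRF, chi):
    """MWL1: per-tree argmax of the weighted leaf votes (Faggre1_1), then a simple sum over trees (Faggre1_2)."""
    T_FRF = np.zeros((len(L_FRF), L_FRF[0].shape[1]))
    for t, (leaf_supports, chi_t) in enumerate(zip(L_FRF, chi)):
        T_FRF[t, to_weighted_votes(leaf_supports, chi_t).sum(axis=0).argmax()] = 1.0
    return int(T_FRF.sum(axis=0).argmax())

def mwl2(L_FRF, chi):
    """MWL2: the weighted votes of all reached leaves in the ensemble are summed (Faggre2)."""
    return int(sum(to_weighted_votes(ls, c).sum(axis=0)
                   for ls, c in zip(L_FRF, chi)).argmax())
```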
    • Majority vote Weighted by Leaf and by Tree:

    Again, in this combination method, a transformation is applied to the matrix L_FRF in Step 2 of Algorithms 3 and 4 so that each leaf reached assigns a weighted vote to the majority class. The vote is weighted by the degree of satisfaction with which example e reaches the leaf.

    $$ \begin{aligned} {\rm For}\,t &= 1,\ldots,T \\ &{\rm For}\, n = 1,\ldots, N\\ & \quad {\rm For}\,i = 1,\ldots, I\\ &\qquad L\_FRF {-}mod_{t,n,i} = \left\{ \begin{array}{ll} \chi_{t,n}(e) & {\rm if}\ i=\arg\displaystyle\max_{j, j=1,\ldots,I} \{L\_FRF_{t,n,j}\}\\ 0 & {\rm otherwise} \\ \end{array}\right. \end{aligned} $$

    In addition, in this method a weight for each tree is introduced, obtained by testing each individual tree with its OOB dataset. Let \(\overline{p}=(p_1,p_2,\ldots,p_T)\) be the vector with the weights assigned to each tree. Each p_t is obtained as \(\frac{N\_success\_OOB_t}{size\_OOB_t}\), where N_success_OOB_t is the number of examples classified correctly from the OOB dataset used for testing the tth tree, and size_OOB_t is the total number of examples in this dataset. A short sketch of this weighting and of both strategies follows this item.

    Strategy 1 → method MWLT1

    The function Faggre1_1 is defined as:

    $$ Faggre1_1(t,i,L\_FRF)=\left\{ \begin{array}{ll} 1 & {\hbox{if}}\ i=\arg\displaystyle\max_{j, j=1,\ldots,I} \left\{ \sum_{n=1}^{N_t} L\_FRF-mod_{t,n,j} \right\}\\ 0 & {\hbox{otherwise}} \\ \end{array}\right. $$

    Vector \(\overline{p}\) is used in the definition of function Faggre1_2:

    $$ Faggre1_2(i,T\_FRF)=\sum_{t=1}^{T} p_t \cdot T\_FRF_{t,i} $$

    Strategy 2 → method MWLT2

    The vector of weights \(\overline{p}\) is applied to Strategy 2.

    $$ Faggre2(i,L\_FRF)=\sum_{t=1}^{T} p_t \sum_{n=1}^{N_t} L\_FRF{-}mod_{t,n,i} $$
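A sketch of the tree weights p_t and the MWLT1/MWLT2 aggregations, under the same illustrative assumptions as before; n_success_oob and size_oob are hypothetical per-tree OOB counts as described above.

```python
import numpy as np

def to_weighted_votes(leaf_supports, chi_t):
    """Each reached leaf votes for its majority class with weight chi_{t,n}(e) (as in the MWL sketch)."""
    votes = np.zeros_like(leaf_supports)
    votes[np.arange(len(leaf_supports)), leaf_supports.argmax(axis=1)] = chi_t
    return votes

def tree_weights(n_success_oob, size_oob):
    """p_t = number of OOB examples classified correctly by tree t / size of its OOB sample."""
    return np.asarray(n_success_oob, dtype=float) / np.asarray(size_oob, dtype=float)

def mwlt1(L_FRF, chi, p):
    """MWLT1: Faggre1_1 as in MWL1; Faggre1_2 weights each tree's vote by p_t."""
    T_FRF = np.zeros((len(L_FRF), L_FRF[0].shape[1]))
    for t, (ls, c) in enumerate(zip(L_FRF, chi)):
        T_FRF[t, to_weighted_votes(ls, c).sum(axis=0).argmax()] = 1.0
    return int((p[:, None] * T_FRF).sum(axis=0).argmax())

def mwlt2(L_FRF, chi, p):
    """MWLT2: the weighted leaf votes of each tree are scaled by p_t and summed (Faggre2)."""
    return int(sum(p_t * to_weighted_votes(ls, c).sum(axis=0)
                   for p_t, ls, c in zip(p, L_FRF, chi)).argmax())
```

Here p is expected to be the array returned by tree_weights, e.g. p = tree_weights(n_success_oob, size_oob).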
    • Minimum Weighted by Leaf and by membership Function:

      In this combination method, a transformation is applied to the matrix L_FRF in Step 2 of Algorithm 3.

      $$ \begin{aligned} {\rm For}\,t &= 1,\ldots,T \\ &{\rm For}\, n = 1,\ldots, N\\ & \quad {\rm For}\,i = 1,\ldots, I\\ &\qquad L\_FRF {-}mod_{t,n,i} = \chi_{t,n}(e) \times \frac{E_i}{E_n} \end{aligned} $$

      Strategy 1 → method MIWLF1

      The function Faggre1_1 is defined as:

      $$ Faggre1_1(t,i,L\_FRF)=\left\{ \begin{array}{ll} 1 & {\hbox{if}} \ i=\displaystyle\arg\max_{j, j=1,\ldots,I}\left\{ min(L\_FRF-mod_{t,1,j},L\_FRF-mod_{t,2,j},\right.\\ & \phantom{si\ i=\displaystyle\arg\max_{j, j=1,\ldots,I}}\left.\ldots ,L\_FRF-mod_{t,N_t,j})\right\} \\ 0 & {\hbox{otherwise}} \\ \end{array}\right. $$

      The function Faggre1_2 incorporates a weighting for each tree, defined by a fuzzy membership function:

      $$ Faggre1_2(i,T\_FRF)=\sum_{t=1}^{T} \mu_{pond}\left(\frac{errors_{(OOB_t)}}{size_{(OOB_t)}}\right) \cdot T\_FRF_{t,i} $$

      The membership function μ_pond(x) is defined as follows (a short sketch of this weighting is given after the parameter list below):

      $$ \mu_{pond}(x)=\left\{\begin{array}{ll} 1 & 0 \leq x \leq (pmin+marg)\\ \frac{(pmax+marg)-x}{(pmax-pmin)} & (pmin+marg)\leq x \leq (pmax+marg)\\ 0 & (pmax+marg) \leq x \\ \end{array}\right. $$

      where

      • pmax is the maximum error rate among the trees of the FRF ensemble (\(pmax=\max_{t=1,\ldots,T}\left\{\frac{errors_{(OOB_t)}}{size_{(OOB_t)}}\right\}\)). The error rate of a tree t is obtained as \(\frac{errors_{(OOB_t)}}{size_{(OOB_t)}}\), where errors_(OOB_t) is the number of classification errors of tree t (using the OOB_t dataset as test set), and size_(OOB_t) is the cardinality of the OOB_t dataset. As indicated above, the OOB_t examples are not used to build tree t and therefore constitute an independent sample with which to test it, so the goodness of a tree t can be measured by the number of errors it makes when classifying the examples of OOB_t;

      • pmin is the minimum error rate among the trees of the FRF ensemble; and

      • \(marg=\frac{pmax-pmin}{4}\)
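As an illustration of how μ_pond weights the trees, the following sketch (same illustrative Python/NumPy conventions as above) implements μ_pond and the MIWLF1 aggregation. L_FRF_mod is assumed to already contain the transformed values χ_{t,n}(e)·E_i/E_n for each reached leaf, and error_rates the per-tree OOB error rates.

```python
import numpy as np

def mu_pond(x, error_rates):
    """Fuzzy weight of a tree from its OOB error rate x, relative to the whole ensemble."""
    pmin, pmax = float(np.min(error_rates)), float(np.max(error_rates))
    marg = (pmax - pmin) / 4.0
    if x <= pmin + marg:
        return 1.0
    if x <= pmax + marg:
        return (pmax + marg - x) / (pmax - pmin)
    return 0.0

def miwlf1(L_FRF_mod, error_rates):
    """MIWLF1: per tree, the class whose minimum transformed support over the reached
    leaves is largest (Faggre1_1); trees are then weighted by mu_pond (Faggre1_2)."""
    T_FRF = np.zeros((len(L_FRF_mod), L_FRF_mod[0].shape[1]))
    for t, mod in enumerate(L_FRF_mod):
        T_FRF[t, mod.min(axis=0).argmax()] = 1.0
    weights = np.array([mu_pond(e, error_rates) for e in error_rates])
    return int((weights[:, None] * T_FRF).sum(axis=0).argmax())
```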

1.3 Trainable implicitly dependent methods

Within this group we define the following methods:

  • Minimum Weighted by membership Function: In this combination method, no transformation is applied to matrix L_FRF in Step 2 of Algorithm 3.

    Strategy 1 → method MIWF1

    The function Faggre1_1 is defined as

    $$ Faggre1_1(t,i,L\_FRF)=\left\{ \begin{array}{ll} 1 & {\rm if}\ i=\displaystyle\arg\max_{j, j=1,\ldots,I}\left\{ min(L\_FRF_{t,1,j},L\_FRF_{t,2,j}, \right. \\ & \phantom{si\ i=\displaystyle\arg\max_{j, j=1,\ldots,I}}\left.\ldots ,L\_FRF_{t,N_t,j})\right\} \\ 0 & {\rm otherwise} \\ \end{array}\right. $$

    The function Faggre1_2 incorporates, for each tree, the weighting defined by the fuzzy membership function above (a short sketch follows below):

    $$ Faggre1_2(i,T\_FRF)=\sum_{t=1}^{T} \mu_{pond}\left(\frac{errors_{(OOB_t)}}{size_{(OOB_t)}}\right) \cdot T\_FRF_{t,i} $$
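For completeness, MIWF1 differs from MIWLF1 only in that the minimum is taken over the raw leaf supports, with no Step-2 transformation. A sketch under the same assumptions, reusing the mu_pond function from the previous sketch:

```python
import numpy as np

def miwf1(L_FRF, error_rates):
    """MIWF1: as MIWLF1, but the minimum is taken over the raw leaf supports.
    Assumes mu_pond from the previous sketch is in scope."""
    T_FRF = np.zeros((len(L_FRF), L_FRF[0].shape[1]))
    for t, leaf_supports in enumerate(L_FRF):
        T_FRF[t, leaf_supports.min(axis=0).argmax()] = 1.0    # Faggre1_1
    weights = np.array([mu_pond(e, error_rates) for e in error_rates])
    return int((weights[:, None] * T_FRF).sum(axis=0).argmax())  # Faggre1_2
```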

Cite this article

Cadenas, J.M., Garrido, M.C., Martínez, R. et al. Extending information processing in a Fuzzy Random Forest ensemble. Soft Comput 16, 845–861 (2012). https://doi.org/10.1007/s00500-011-0777-1
