Evidential data mining: precise support and confidence

Journal of Intelligent Information Systems

Abstract

Associative classification has been shown to provide interesting results when used to classify data. With the increasing complexity of new databases, retrieving valuable information and classifying incoming data is becoming a thriving and compelling issue. The evidential database is a new type of database that represents imprecision and uncertainty. In this respect, extracting pertinent information such as frequent patterns and association rules is a task of paramount importance. In this work, we tackle the problem of pertinent information extraction from an evidential database. A new data mining approach, denoted EDMA, is introduced; it extracts frequent patterns while overcoming the limitations of pioneering works in the literature. A new classifier based on evidential association rules is then introduced. The obtained association rules, as well as their respective confidence values, are studied and weighted with respect to their relevance. The proposed methods are thoroughly evaluated on several synthetic evidential databases and show improved performance.

Notes

  1. An association rule is considered valid if its confidence is greater than or equal to a threshold minconf.

  2. A BBA with only one focal element H, where H ∈ Θ, is said to be certain and is denoted m(H) = 1.

  3. A BBA is said to be consonant if its focal elements are nested.
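
The definitions in notes 2 and 3 can be checked mechanically. The following is a minimal sketch, assuming a BBA is stored as a Python dictionary that maps each subset of the frame of discernment (a `frozenset`) to its mass. The function names and the example masses are hypothetical and are not taken from the paper.

```python
def focal_elements(bba):
    """Return the subsets that carry a strictly positive mass."""
    return [subset for subset, mass in bba.items() if mass > 0]


def is_certain(bba):
    """Note 2: exactly one focal element, and it is a single element of the frame."""
    focal = focal_elements(bba)
    return len(focal) == 1 and len(focal[0]) == 1


def is_consonant(bba):
    """Note 3: the focal elements form a nested chain A1 ⊆ A2 ⊆ ... ⊆ An."""
    focal = sorted(focal_elements(bba), key=len)
    # After sorting by size, nesting holds iff each set is included in the next one.
    return all(small <= large for small, large in zip(focal, focal[1:]))


# Hypothetical BBAs on the frame {A1, A2, A3}.
certain_bba = {frozenset({"A1"}): 1.0}
consonant_bba = {
    frozenset({"A1"}): 0.5,
    frozenset({"A1", "A2"}): 0.3,
    frozenset({"A1", "A2", "A3"}): 0.2,
}
bayesian_bba = {frozenset({"A1"}): 0.6, frozenset({"A2"}): 0.4}
```

Under these definitions a certain BBA is trivially consonant, since a single focal element is a chain of length one, whereas a BBA spread over two distinct singletons is neither certain nor consonant.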

Acknowledgments

The authors would like to express their sincere gratitude to the anonymous reviewers for their constructive and helpful comments and suggestions, which helped improve the quality of this paper.

Author information

Corresponding author

Correspondence to Ahmed Samet.

Appendix: Evidential database creation through evidential C-means

From a set of numerical data such as those in Table 11, it is possible to construct an evidential database with ECM. For example, the database presented in Table 11 is composed of 30 instances and 2 features, and its instances belong to 2 classes {C_1, C_2}. Figure 3 shows these data in the feature space. From this database, the case of instance #28 is studied (in bold in Table 11). In Fig. 3, this point is represented by a pentagram.

Table 11 Numerical dataset
Fig. 3 Representation of the data given in Table 11

ECM starts by creating the user-requested number of clusters for each feature. In this example, we choose 3 clusters for Feature 1 and 2 clusters for Feature 2.

For a given feature, ECM estimates the distance between each instance and each cluster center, and a BBA is created from the computed distances. ECM then minimizes the objective function defined in (36): it iteratively recomputes the cluster centers until the objective function can no longer be reduced. From an evidential data mining point of view, ECM allows us to construct, for each instance and each feature, a BBA that represents its membership of each cluster. The clusters are the different categories that may be extracted for a dataset feature (column). In the proposed example, the clustering results are illustrated in Fig. 4, where the studied instance is again represented by a pentagram. For this instance, ECM yields a BBA m_1 on the frame of discernment Θ_A = {A_1, A_2, A_3} according to Feature 1, and a second BBA m_2 on the frame of discernment Θ_B = {B_1, B_2} according to Feature 2. These BBAs are the mass functions stored in the evidential database for each attribute (column). Table 12 shows the BBAs obtained for instance #28 according to these 2 features.
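
To make the distance-to-BBA step concrete, here is a minimal sketch of an ECM-style mass assignment for one instance on one feature, with the cluster centers held fixed. It follows the mass update usually given for ECM (Masson and Denœux 2008) as far as we recall it, and it is not the full alternating optimization of the objective function in (36). The function name `ecm_masses`, the parameters `alpha`, `beta` and `delta`, the cluster centers, and the feature value are all assumptions chosen for illustration; they are not the values behind Table 11 or Table 12.

```python
from itertools import combinations


def ecm_masses(x, centers, alpha=1.0, beta=2.0, delta=10.0):
    """Assign masses to every subset of clusters for one 1-D feature value x.

    centers: dict mapping a cluster name to its (fixed) center.
    alpha penalizes large subsets, beta controls fuzziness, and delta is the
    distance associated with the empty set (outliers). All three are assumed.
    """
    names = sorted(centers)
    weights = {}
    for size in range(1, len(names) + 1):
        for subset in combinations(names, size):
            # A subset of clusters is represented by the barycenter of its centers.
            barycenter = sum(centers[name] for name in subset) / size
            # Squared distance, floored to avoid dividing by zero.
            dist2 = max((x - barycenter) ** 2, 1e-12)
            weights[frozenset(subset)] = (
                size ** (-alpha / (beta - 1)) * dist2 ** (-1 / (beta - 1))
            )
    noise = delta ** (-2 / (beta - 1))
    total = sum(weights.values()) + noise
    bba = {subset: weight / total for subset, weight in weights.items()}
    bba[frozenset()] = noise / total  # mass left on the empty set
    return bba


# Hypothetical centers for the two clusters of the second feature.
centers_b = {"B1": 1.0, "B2": 5.0}
m2 = ecm_masses(1.2, centers_b)
```

In this sketch, a value lying close to the center of B1 receives most of its mass on the singleton {B1}, a small mass on {B1, B2} (imprecision), and a residual mass on the empty set. A full ECM run would alternate this step with an update of the centers.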

Fig. 4 ECM clustering of the dataset given in Table 11

Table 12 BBAs obtained with ECM for instance #28

Cite this article

Samet, A., Lefèvre, E. & Ben Yahia, S. Evidential data mining: precise support and confidence. J Intell Inf Syst 47, 135–163 (2016). https://doi.org/10.1007/s10844-016-0396-5

  • DOI: https://doi.org/10.1007/s10844-016-0396-5
