Abstract
Associative classification has been shown to provide interesting results when used to classify data. With the increasing complexity of new databases, retrieving valuable information and classifying incoming data is becoming an increasingly compelling issue. The evidential database is a new type of database that represents imprecision and uncertainty. In this respect, extracting pertinent information such as frequent patterns and association rules is of paramount importance. In this work, we tackle the problem of pertinent information extraction from an evidential database. A new data mining approach, denoted EDMA, is introduced that extracts frequent patterns while overcoming the limits of pioneering works in the literature. A new classifier based on evidential association rules is also introduced. The obtained association rules, as well as their respective confidence values, are studied and weighted with respect to their relevance. The proposed methods are thoroughly tested on several synthetic evidential databases and show improved performance.
Notes
An association rule is considered valid if its confidence is greater than or equal to a threshold minconf.
A BBA with a single focal element H, where H ∈ Θ, is said to be certain; in that case m(H) = 1.
A BBA is said to be consonant if its focal elements are nested.
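The three definitions above can be checked mechanically. The following is a minimal sketch, assuming a BBA is stored as a Python dictionary that maps each subset of Θ (a `frozenset` of hypothesis labels) to its mass. The function names `focal_elements`, `is_certain`, `is_consonant` and `is_valid_rule` are illustrative choices and do not come from the paper.

```python
def focal_elements(bba):
    """Return the subsets that carry a strictly positive mass."""
    return [subset for subset, mass in bba.items() if mass > 0]


def is_certain(bba):
    """A BBA is certain if its only focal element is a single hypothesis H."""
    focal = focal_elements(bba)
    return len(focal) == 1 and len(focal[0]) == 1


def is_consonant(bba):
    """A BBA is consonant if its focal elements form a nested chain."""
    # Sort by cardinality, then require each set to be included in the next.
    focal = sorted(focal_elements(bba), key=len)
    return all(a <= b for a, b in zip(focal, focal[1:]))


def is_valid_rule(confidence, minconf):
    """A rule is valid if its confidence reaches the minconf threshold."""
    return confidence >= minconf


# Hypothetical BBAs on a frame with three hypotheses H1, H2, H3.
certain_bba = {frozenset({"H1"}): 1.0}
nested_bba = {
    frozenset({"H1"}): 0.5,
    frozenset({"H1", "H2"}): 0.3,
    frozenset({"H1", "H2", "H3"}): 0.2,
}
```

Under this representation a certain BBA is also consonant, since a single focal element is trivially nested.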
References
Aggarwal, C.C. (2009). Managing and mining uncertain data Vol. 35. Berlin Heidelberg New York: Springer.
Aggarwal, C.C., Li, Y., Wang, J., & Wang, J. (2009). Frequent pattern mining with uncertain data. In Proceedings of the 15th ACM SIGKDD international conference on knowledge discovery and data mining, Paris, France (pp. 29–38).
Agrawal, R., & Srikant, R. (1994). Fast algorithms for mining association rules. In Proceedings of international conference on very large databases, VLDB, Santiago de Chile, Chile (pp. 487–499).
Bach Tobji, M.A., Ben Yaghlane, B., & Mellouli, K. (2009). Incremental maintenance of frequent itemsets in evidential databases. In Proceedings of the 10th European conference on symbolic and quantitative approaches to reasoning with uncertainty, Verona, Italy (pp. 457–468).
Bell, D.A., Guan, J., & Lee, S.K. (1996). Generalized union and project operations for pooling uncertain and imprecise information. Data & Knowledge Engineering, 18(2), 89–117.
Ben Yahia, S., Hamrouni, T., & Mephu Nguifo, E. (2006). Frequent closed itemset based algorithms: a thorough structural and analytical survey. SIGKDD Explorations, 8(1), 93–104.
Chui, C.K., Kao, B., & Hung, E. (2007). Mining frequent itemsets from uncertain data. In Proceedings of the 11th Pacific-Asia conference on advances in knowledge discovery and data mining, Nanjing, China (pp. 47–58).
Dempster, A. (1967). Upper and lower probabilities induced by a multivalued mapping. The Annals of Mathematical Statistics, 38(2), 325–339.
Dubois, D., & Prade, H. (1988). Possibility theory: an approach to computerized processing of uncertainty. New York: Plenum Press.
Fagin, R., & Halpern, J.Y. (1990). A new approach to updating beliefs. In Proceedings of the 6th annual conference on uncertainty in artificial intelligence, UAI’90 (pp. 347–374). Amsterdam: Elsevier.
Frank, A., & Asuncion, A. (2010). UCI machine learning repository. http://archive.ics.uci.edu/ml.
Gärdenfors, P. (1983). Probabilistic reasoning and evidentiary value. In Evidentiary value: philosophical, judicial, and psychological aspects of a theory: essays dedicated to Sören Halldén on his 60th Birthday. C.W.K. Gleerups.
Hewawasam, K.K.R., Premaratne, K., & Shyu, M.L. (2007). Rule mining and classification in a situation assessment application: a belief-theoretic approach for handling data imperfections. IEEE Transactions on Systems, Man, and Cybernetics, Part B, 37 (6), 1446–1459.
Hewawasam, K.K.R., Premaratne, K., Shyu, M.L., & Subasingha, S.P. (2005). Rule mining and classification in the presence of feature level and class label ambiguities. In SPIE 5803, intelligent computing: theory and applications III, Vol. 98.
Hong, T.P., Kuo, C.S., & Chi, S.C. (1999). Mining association rules from quantitative data. Intelligent Data Analysis, 3(5), 363–376.
Hong, T.P., Kuo, C.S., & Wang, S.L. (2004). A fuzzy AprioriTid mining algorithm with reduced computational time. Applied Soft Computing, 5(1), 1–10.
Jousselme, A.L., & Maupin, P. (2012). Distance in evidence theory: comprehensive survey and generalizations. International Journal of Approximate Reasoning, 53(2), 118–145.
Lee, S.K. (1992). An extended relational database model for uncertain and imprecise information. In Proceedings of the 18th international conference on very large data bases, VLDB92, Vancouver, British Columbia, Canada (pp. 211–220).
Lee, S.K. (1992). Imprecise and uncertain information in databases: an evidential approach. In Proceedings of 8th international conference on data engineering, Tempe, AZ (pp. 614–621).
Leung, C.K.S., Mateo, M.A.F., & Brajczuk, D.A. (2008). A tree-based approach for frequent pattern mining from uncertain data. In Proceedings of 12th Pacific-Asia conference on knowledge discovery and data mining, Osaka, Japan (vol. 5012 pp. 653–661).
Li, W., Han, J., & Pei, J. (2001). CMAR: accurate and efficient classification based on multiple class-association rules. In Proceedings of IEEE international conference on data mining (ICDM01), San Jose, CA (pp. 369–376). IEEE Computer Society.
Manjusha, R., & Ramachandran, R. (2011). Web mining framework for security in e-commerce. In Proceedings of international conference on recent trends in information technology (ICRTIT), Chennai, India (pp. 1043–1048).
Masson, M.H., & Denœux, T. (2008). ECM: an evidential version of the fuzzy c-means algorithm. Pattern Recognition, 41(4), 1384–1397.
Ordonez, C., Ezquerra, N., & Santana, C.A. (2006). Constraining and summarizing association rules in medical data. Knowledge and Information Systems, 9(3), 259–283.
Ordonez, C., & Omiecinski, E. (1999). Discovering association rules based on image content. In Proceedings of the IEEE advances in digital libraries conference (ADL’99), Baltimore, MD (pp. 38–49).
Pasquier, N., Bastide, Y., Taouil, R., & Lakhal, L. (1999). Efficient mining of association rules using closed itemset lattices. Journal of Information Systems, 24, 25–46.
Samet, A., Lefèvre, E., & Ben Yahia, S. (2013). Mining frequent itemsets in evidential database. In Proceedings of the 5th international conference on knowledge and systems engineering, Hanoi, Vietnam (pp. 377–388).
Samet, A., Lefèvre, E., & Ben Yahia, S. (2014). Classification with evidential associative rules. In Proceedings of 15th international conference on information processing and management of uncertainty in knowledge-based systems, Montpellier, France (pp. 25–35).
Samet, A., Lefèvre, E., & Ben Yahia, S. (2014). Evidential database: a new generalization of databases? In Proceedings of 3rd international conference on belief functions, belief 2014, Oxford, UK (pp. 105–114).
Smets, P. (1988). Belief functions. In P. Smets, A. Mamdani, D. Dubois, & H. Prade (Eds.), Non standard logics for automated reasoning (pp. 253–286). London: Academic.
Smets, P. (1990). The transferable belief model and other interpretations of Dempster-Shafer’s model. In Proceedings of the 6th annual conference on uncertainty in artificial intelligence, UAI’90 (pp. 375–383). Cambridge: MIT.
Smets, P., & Kennes, R. (1994). The transferable belief model. Artificial Intelligence, 66(2), 191–234.
Stumme, G., Taouil, R., Bastide, Y., Pasquier, N., & Lakhal, L. (2002). Computing iceberg concept lattices with titanic. Data & Knowledge Engineering, 42, 189–222.
Tong, Y., Chen, L., Cheng, Y., & Yu, P.S. (2012). Mining frequent itemsets over uncertain databases. In Proceedings of the 38th International Conference on Very Large Databases, VLDB12, Istanbul, Turkey, 5(11), 1650–1661.
Wu, X., Zhang, C., & Zhang, S. (2005). Database classification for multi-database mining. Information Systems, 30, 71–88.
Yin, J., Zhou, X., & Yang, M. (2006). Data mining in incomplete database. Computer Engineering, 12, 013.
Acknowledgments
The authors would like to express their sincere gratitude to the anonymous reviewers for their constructive and helpful comments and suggestions, which helped improve the quality of this paper.
Appendix: Evidential database creation through evidential C-means
From a set of numerical data such as those in Table 11, it is possible to construct an evidential database with ECM. For example, the database presented in Table 11 is composed of 30 instances and 2 features, and contains 2 classes {C1, C2}. Figure 3 illustrates the representation of these data in the feature space. From this database, the case of instance #28 will be studied (in bold in Table 11). In Fig. 3, this point is represented by a pentagram.
ECM starts by creating the user-requested number of clusters for each feature. In this example, we choose 3 and 2 clusters for Feature n°1 and Feature n°2, respectively.
For a given feature, ECM estimates the distance between each instance and each cluster center, and a BBA is created from the computed distances. ECM then minimizes the objective function defined in (36), iteratively recomputing the cluster centers until the objective function can no longer be reduced. From an evidential data mining point of view, ECM allows us to construct, for each instance and each feature, a BBA that represents its membership in each cluster. The clusters are the different categories that may be extracted for a dataset feature (column). In the proposed example, the clustering results are illustrated in Fig. 4, where the studied instance is again represented by a pentagram. For this instance, ECM yields a BBA m1 on the frame of discernment Θ_A = {A1, A2, A3} for Feature n°1, and a second BBA m2 on the frame of discernment Θ_B = {B1, B2} for Feature n°2. These BBAs correspond to the mass functions of the evidential database for each attribute (column). Table 12 shows the BBAs obtained for instance #28 according to these 2 features.
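The distance-to-mass step described above can be illustrated for a single feature. The following is a minimal sketch, assuming the cluster centers are already known and using the mass-assignment rule as we understand it from the ECM formulation of Masson and Denœux (2008): each non-empty subset of clusters is represented by the barycenter of its centers, and its mass decreases with the distance to that barycenter and with the size of the subset. The function name `ecm_masses`, the parameters `alpha`, `beta` and `delta` with their default values, and the numerical inputs are assumptions made for illustration. They are not taken from Table 11 or Table 12, and the objective function (36) and the center update are not reproduced here.

```python
from itertools import combinations


def ecm_masses(x, centers, alpha=1.0, beta=2.0, delta=10.0):
    """Assign masses to every subset of clusters for one 1-D value x.

    Assumed parameters: alpha penalizes large subsets, beta controls
    fuzziness, delta is the distance attached to the empty set.
    """
    n = len(centers)
    exponent = -2.0 / (beta - 1.0)
    weights = {}
    for size in range(1, n + 1):
        for subset in combinations(range(n), size):
            # A subset of clusters is represented by the mean of its centers.
            barycenter = sum(centers[k] for k in subset) / size
            # Guard against a zero distance when x sits exactly on a center.
            dist = max(abs(x - barycenter), 1e-12)
            weights[frozenset(subset)] = size ** (-alpha) * dist ** exponent
    total = sum(weights.values()) + delta ** exponent
    bba = {subset: w / total for subset, w in weights.items()}
    # The remaining mass goes to the empty set (outlier hypothesis).
    bba[frozenset()] = delta ** exponent / total
    return bba


# Hypothetical centers for the 3 clusters A1, A2, A3 of Feature n°1
# (indices 0, 1, 2) and a hypothetical feature value close to A1.
m1 = ecm_masses(1.2, [1.0, 5.0, 9.0])
```

Repeating the computation with the 2 centers of Feature n°2 would give the second BBA of the instance, and the pair of BBAs forms one row of the evidential database.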
Cite this article
Samet, A., Lefèvre, E. & Ben Yahia, S. Evidential data mining: precise support and confidence. J Intell Inf Syst 47, 135–163 (2016). https://doi.org/10.1007/s10844-016-0396-5
DOI: https://doi.org/10.1007/s10844-016-0396-5