Abstract
As defined in the literature, univariate uncertain data represents each attribute of each transaction as a quantitative interval, accompanied by a probability density function that indicates how likely each value in the interval is to occur. To the best of our knowledge, the classification of univariate uncertain data has rarely been addressed. Here, we propose the AssoU2Classifier algorithm to fill this research gap. AssoU2Classifier retrieves association rules from univariate uncertain data and uses them as a classification model. In addition, we develop the U2Pruning procedure to prune these association rules. U2Pruning not only reduces the number of rules, which considerably accelerates classification, but also yields high classification accuracy. In the experiments, AssoU2Classifier was compared with 14 existing algorithms on 12 modified UCI datasets and achieved better classification accuracy than the compared algorithms on most of the datasets. Statistical tests (the Friedman test and pairwise Wilcoxon tests) also supported this advantage. Its learning time was about average among the compared algorithms.
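The data model and the idea of using class association rules as a classifier can be illustrated with a minimal Python sketch. It rests on several assumptions that are not taken from the paper: a uniform density over each interval, independence between attributes, certain class labels, and a rule's expected support defined as the sum over transactions of the probability that every attribute value falls inside the rule's intervals. The names `prob_in`, `match_prob`, `expected_support`, `confidence` and `classify` are hypothetical, and `classify` is a generic best-rule scheme, not AssoU2Classifier or U2Pruning.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

Interval = Tuple[float, float]          # (low, high)
Rule = Tuple[Dict[str, Interval], str, float]  # (antecedent, class label, confidence)


@dataclass
class Transaction:
    values: Dict[str, Interval]   # attribute name -> uncertain interval
    label: Optional[str] = None   # class label, ASSUMED certain (None for unseen data)


def prob_in(value: Interval, query: Interval) -> float:
    """P(true value lies in `query`), ASSUMING a uniform pdf over `value`."""
    low, high = value
    a, b = query
    if high == low:  # degenerate interval: the value is known exactly
        return 1.0 if a <= low <= b else 0.0
    overlap = min(high, b) - max(low, a)
    return max(0.0, overlap) / (high - low)


def match_prob(t: Transaction, antecedent: Dict[str, Interval]) -> float:
    """Probability that `t` satisfies every interval of the antecedent,
    ASSUMING attributes are independent (product of per-attribute probabilities)."""
    p = 1.0
    for attr, query in antecedent.items():
        p *= prob_in(t.values[attr], query)
    return p


def expected_support(data: List[Transaction], antecedent: Dict[str, Interval],
                     label: Optional[str] = None) -> float:
    """Sum of match probabilities, optionally restricted to one class."""
    return sum(match_prob(t, antecedent) for t in data
               if label is None or t.label == label)


def confidence(data: List[Transaction], antecedent: Dict[str, Interval],
               label: str) -> float:
    """Expected support of (antecedent, label) divided by that of the antecedent."""
    denom = expected_support(data, antecedent)
    return expected_support(data, antecedent, label) / denom if denom > 0 else 0.0


def classify(rules: List[Rule], t: Transaction, default: str) -> str:
    """Hypothetical best-rule scoring: score = match probability * confidence."""
    best_label, best_score = default, 0.0
    for antecedent, label, conf in rules:
        score = match_prob(t, antecedent) * conf
        if score > best_score:
            best_label, best_score = label, score
    return best_label


if __name__ == "__main__":
    # Toy data, invented purely for illustration.
    data = [
        Transaction({"x": (0.0, 2.0), "y": (1.0, 3.0)}, "A"),
        Transaction({"x": (0.0, 4.0), "y": (2.0, 2.0)}, "B"),
        Transaction({"x": (0.5, 1.5), "y": (0.0, 4.0)}, "A"),
    ]
    rule = {"x": (0.0, 1.0), "y": (1.0, 3.0)}
    print(expected_support(data, rule))      # 1.0
    print(confidence(data, rule, "A"))       # 0.75
    rules = [(rule, "A", confidence(data, rule, "A")), ({"x": (2.0, 4.0)}, "B", 0.9)]
    print(classify(rules, Transaction({"x": (0.0, 1.0), "y": (1.0, 2.0)}), "A"))  # A
```

On the toy data the rule matches the three transactions with probabilities 0.5, 0.25 and 0.25, so its expected support is 1.0 and its confidence for class A is 0.75. A real associative classifier would additionally mine the candidate intervals and prune redundant rules, which this sketch leaves out.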
Acknowledgements
This research was supported in part by the Ministry of Science and Technology of the Republic of China under Grant No. MOST 103-2221-E-259-019-MY2.
About this article
Cite this article
Liu, YH., Fan, HY. Classifying univariate uncertain data. Appl Intell 51, 2622–2650 (2021). https://doi.org/10.1007/s10489-020-01911-0
DOI: https://doi.org/10.1007/s10489-020-01911-0