Classifying univariate uncertain data

Applied Intelligence

Abstract

In the literature, univariate uncertain data has a quantitative interval for each attribute in each transaction, accompanied by a probability density function that gives the probability of each value in the interval occurring. To the best of our knowledge, the classification of univariate uncertain data has seldom been addressed. Here, we propose the AssoU2Classifier algorithm to fill this research gap. The algorithm mines association rules from univariate uncertain data and uses them as the classification model. In addition, the U2Pruning procedure is developed to prune the association rules; it reduces the number of rules, which considerably accelerates classification, while still achieving high classification accuracy. In the experiments, the AssoU2Classifier algorithm was compared with 14 existing algorithms on 12 modified UCI datasets and obtained better classification accuracy than the compared algorithms on most of the datasets. Statistical tests (the Friedman test and pairwise Wilcoxon tests) supported this advantage. The AssoU2Classifier algorithm also had an average learning time.
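The data model described in the abstract (an interval per attribute plus a probability density function over it) can be sketched as follows. This is an illustration only, not the AssoU2Classifier algorithm: it assumes a uniform density over the interval, and the class name `UniformUncertainValue`, the method `prob_in`, and the attribute name `temperature` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UniformUncertainValue:
    """One uncertain attribute value: an interval [low, high].

    Assumption for this sketch: the density over the interval is uniform.
    The abstract only says that some probability density function exists.
    """
    low: float
    high: float

    def __post_init__(self):
        if not self.low < self.high:
            raise ValueError("interval must satisfy low < high")

    def pdf(self, x: float) -> float:
        # Uniform density: constant inside the interval, zero outside.
        if self.low <= x <= self.high:
            return 1.0 / (self.high - self.low)
        return 0.0

    def prob_in(self, a: float, b: float) -> float:
        # Probability that the true value lies in [a, b]: the length of the
        # overlap with [low, high] divided by the interval length.
        lo = max(a, self.low)
        hi = min(b, self.high)
        if hi <= lo:
            return 0.0
        return (hi - lo) / (self.high - self.low)


# Hypothetical transaction: one uncertain attribute and a class label.
transaction = {"temperature": UniformUncertainValue(20.0, 30.0), "label": "A"}
p = transaction["temperature"].prob_in(25.0, 35.0)
print(p)
```

A non-uniform density would replace `pdf` and compute `prob_in` by integrating it over the overlap; the uniform case is used here only because the overlap ratio can be checked by hand.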

Acknowledgements

This research was supported in part by the Ministry of Science and Technology of the Republic of China under Grant No. MOST 103-2221-E-259-019-MY2.

Author information

Corresponding author

Correspondence to Ying-Ho Liu.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Liu, YH., Fan, HY. Classifying univariate uncertain data. Appl Intell 51, 2622–2650 (2021). https://doi.org/10.1007/s10489-020-01911-0

  • DOI: https://doi.org/10.1007/s10489-020-01911-0
