Efficient Mining of Uncertain Data for High-Utility Itemsets

  • Jerry Chun-Wei Lin
  • Wensheng Gan
  • Philippe Fournier-Viger
  • Tzung-Pei Hong
  • Vincent S. Tseng
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9658)


High-utility itemset mining (HUIM) is emerging as an important research topic in data mining. Most HUIM algorithms can only handle precise data; however, uncertainty is embedded in big data collected from experimental measurements or noisy sensors in real-life applications. In this paper, an efficient algorithm, Mining Uncertain data for High-Utility Itemsets (MUHUI), is proposed to discover potential high-utility itemsets (PHUIs) from uncertain data. Based on the probability-utility-list (PU-list) structure, the MUHUI algorithm directly mines PHUIs without candidate generation and uses several efficient pruning strategies to avoid constructing PU-lists for numerous unpromising itemsets, thus greatly improving mining performance. Extensive experiments on both real-life and synthetic datasets show that the proposed algorithm significantly outperforms the state-of-the-art PHUI-List algorithm in terms of efficiency and scalability; in particular, MUHUI scales well on large-scale uncertain datasets.
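To make the mining task concrete, the following is a minimal brute-force sketch of what it means for an itemset to be a PHUI. It is not the MUHUI algorithm and it does not use PU-lists. The abstract does not give formal definitions, so the sketch assumes the setting commonly used in the PHUI literature: each transaction carries an existence probability, an itemset's utility is the sum of quantity times unit profit over the transactions containing it, and its potential probability is the sum of those transactions' probabilities. The toy data, the profit table, and the names `mine_phuis_naive`, `min_utility_ratio`, and `min_probability_ratio` are all hypothetical.

```python
from itertools import combinations

# Hypothetical toy database: (item -> purchase quantity, existence probability).
transactions = [
    ({"a": 2, "b": 1}, 0.9),
    ({"a": 1, "c": 3}, 0.6),
    ({"b": 2, "c": 1}, 0.8),
    ({"a": 1, "b": 1, "c": 1}, 0.5),
]
unit_profit = {"a": 5, "b": 3, "c": 1}  # hypothetical unit profits


def transaction_utility(quantities):
    """Utility of a whole transaction: sum of quantity * unit profit."""
    return sum(q * unit_profit[item] for item, q in quantities.items())


def utility_and_probability(itemset):
    """Sum utility and probability over the transactions containing itemset."""
    utility, probability = 0, 0.0
    for quantities, prob in transactions:
        if all(item in quantities for item in itemset):
            utility += sum(quantities[i] * unit_profit[i] for i in itemset)
            probability += prob
    return utility, probability


def mine_phuis_naive(min_utility_ratio, min_probability_ratio):
    """Enumerate every itemset and keep those passing both thresholds.

    Assumed thresholds: utility >= ratio * total database utility, and
    potential probability >= ratio * number of transactions.
    """
    total_utility = sum(transaction_utility(q) for q, _ in transactions)
    min_utility = min_utility_ratio * total_utility
    min_probability = min_probability_ratio * len(transactions)
    items = sorted(unit_profit)
    result = {}
    for size in range(1, len(items) + 1):
        for itemset in combinations(items, size):
            utility, probability = utility_and_probability(itemset)
            if utility >= min_utility and probability >= min_probability:
                result[itemset] = (utility, probability)
    return result


phuis = mine_phuis_naive(min_utility_ratio=0.3, min_probability_ratio=0.3)
```

On this toy data the itemsets {a}, {b}, and {a, b} pass both thresholds. Note that {a, b} has a higher utility (21) than {a} alone (20): utility is not downward-closed, so exhaustive enumeration like this grows exponentially with the number of items. That cost is what list-based structures and pruning strategies of the kind described in the abstract aim to avoid.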


Keywords: Data mining, Uncertainty, High-utility itemset, PU-list, Pruning strategies



This research was partially supported by the National Natural Science Foundation of China (NSFC) under grant No. 61503092 and by the Tencent Project under grant CCF-TencentRAGR20140114.



Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  • Jerry Chun-Wei Lin (1)
  • Wensheng Gan (1)
  • Philippe Fournier-Viger (2)
  • Tzung-Pei Hong (3, 4)
  • Vincent S. Tseng (5)

  1. School of Computer Science and Technology, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, China
  2. School of Natural Sciences and Humanities, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, China
  3. Department of Computer Science and Information Engineering, National University of Kaohsiung, Kaohsiung, Taiwan
  4. Department of Computer Science and Engineering, National Sun Yat-sen University, Kaohsiung, Taiwan
  5. Department of Computer Science, National Chiao Tung University, Hsinchu, Taiwan
