Knowledge and Information Systems, Volume 51, Issue 3, pp 821–850

Sequential pattern mining in databases with temporal uncertainty

  • Jiaqi Ge
  • Yuni Xia
  • Jian Wang
  • Chandima Hewa Nadungodage
  • Sunil Prabhakar
Regular Paper

Abstract

Temporally uncertain data exist widely in real-world applications. Temporal uncertainty can arise from many sources, such as conflicting or missing event timestamps, network latency, granularity mismatch, synchronization problems, device precision limitations, and data aggregation. In this paper, we propose an efficient algorithm to mine sequential patterns from data with temporal uncertainty. We propose an uncertainty model in which timestamps are modeled as random variables and then design a new approach to manage temporal uncertainty. We integrate this approach into the pattern-growth sequential pattern mining algorithm to discover probabilistic frequent sequential patterns. Extensive experiments on both synthetic and real datasets show that the proposed algorithm is both efficient and scalable.
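
To make the terms in the abstract concrete, the following is a minimal sketch and not the algorithm proposed in the paper. It assumes that each event timestamp is an independent discrete random variable given as a `{time: probability}` mapping, that a sequence supports the pattern ⟨a, b⟩ when a occurs strictly before b, and that a pattern is "probabilistic frequent" when the probability of its support reaching a minimum support `minsup` is at least a threshold `tau`. That definition is a common one in the uncertain data mining literature; the abstract does not state the paper's exact definitions, and the names `prob_before`, `prob_support_at_least`, `minsup` and `tau` are hypothetical.

```python
def prob_before(ta, tb):
    """P(Ta < Tb) for two independent discrete timestamp distributions.

    ta and tb map candidate timestamps to probabilities (assumed to sum to 1).
    """
    return sum(pa * pb
               for a, pa in ta.items()
               for b, pb in tb.items()
               if a < b)


def prob_support_at_least(probs, minsup):
    """P(at least `minsup` sequences support the pattern).

    probs[i] is the probability that sequence i supports the pattern;
    sequences are assumed independent (Poisson-binomial distribution).
    """
    # dist[k] = probability that exactly k of the sequences seen so far
    # support the pattern.
    dist = [1.0]
    for p in probs:
        nxt = [0.0] * (len(dist) + 1)
        for k, q in enumerate(dist):
            nxt[k] += q * (1.0 - p)   # this sequence does not support it
            nxt[k + 1] += q * p       # this sequence supports it
        dist = nxt
    return sum(dist[minsup:])


# Toy database (illustrative data only): three sequences, each holding one
# event 'a' and one event 'b' with uncertain timestamps.
database = [
    {"a": {1: 0.5, 3: 0.5}, "b": {2: 1.0}},
    {"a": {1: 1.0},         "b": {2: 1.0}},
    {"a": {5: 1.0},         "b": {4: 0.5, 6: 0.5}},
]

support_probs = [prob_before(seq["a"], seq["b"]) for seq in database]
frequentness = prob_support_at_least(support_probs, minsup=2)

tau = 0.7  # hypothetical probability threshold
is_probabilistic_frequent = frequentness >= tau
```

In this toy database the per-sequence support probabilities are 0.5, 1.0 and 0.5, so the pattern ⟨a, b⟩ reaches a support of at least 2 with probability 0.75 and passes the assumed threshold. A pattern-growth miner would repeat this kind of test for each candidate extension of a prefix; how the paper manages the temporal uncertainty efficiently is not described in the abstract.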

Keywords

Uncertain databases, Sequential pattern mining, Temporal uncertainty

Copyright information

© Springer-Verlag London 2016

Authors and Affiliations

  • Jiaqi Ge (1, 4)
  • Yuni Xia (1)
  • Jian Wang (2)
  • Chandima Hewa Nadungodage (1)
  • Sunil Prabhakar (3)

  1. Department of Computer and Information Science, Indiana University Purdue University Indianapolis, Indianapolis, USA
  2. School of Electronic Science and Engineering, Nanjing University, Nanjing, China
  3. Department of Computer Science, Purdue University, West Lafayette, USA
  4. Expedia Inc., Chicago, USA
