Efficiently Mining Constrained Subsequence Patterns

  • Abdullah Albarrak
  • Sanad Al-Maskari
  • Ibrahim A. Ibrahim
  • Abdulqader M. Almars
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11323)

Abstract

Big time series data are generated daily by various application domains such as environmental monitoring, the Internet of Things, health care, industry, and science. Mining these massive data is very challenging because conventional data mining algorithms do not scale effectively to massive time series. Moreover, applying a global classification approach to highly similar and noisy data hinders classification performance. Utilizing constrained subsequence patterns in data mining applications can therefore increase efficiency and accuracy, and could provide useful insight into the data.

To address the above-mentioned limitations, we propose an efficient subsequence processing technique with preference constraints. We then introduce a sub-pattern analysis for time series data, whose objective is to maximize interclass separability using a localization approach. Furthermore, we treat the deviation from a correlation constraint as an objective to minimize, and we include users' preferences as an objective to maximize in proportion to the users' preferred time intervals. We experimentally validate the efficiency and effectiveness of our proposed algorithm on real data, demonstrating its superiority over recently proposed correlation-based subsequence search algorithms.
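The two objectives described above (minimizing the deviation from a correlation constraint while maximizing agreement with a preferred time interval) can be illustrated with a small sketch. This is not the algorithm proposed in the paper: the weighted-sum score, the exhaustive window enumeration, and every name and parameter here (`pearson`, `preference`, `best_window`, `target_corr`, `preferred`, `alpha`, `min_len`) are assumptions made purely for illustration.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)

def preference(start, end, preferred):
    """Fraction of the half-open window [start, end) covered by the preferred interval."""
    p_start, p_end = preferred
    overlap = max(0, min(end, p_end) - max(start, p_start))
    return overlap / (end - start)

def best_window(x, y, target_corr, preferred, min_len=3, alpha=0.5):
    """Brute-force search over aligned windows of two series (illustrative only).

    Assumed score (hypothetical, higher is better):
        alpha * preference - (1 - alpha) * |corr - target_corr|
    Returns (score, start, end) for the best window.
    """
    best = None
    n = len(x)
    for start in range(n - min_len + 1):
        for end in range(start + min_len, n + 1):
            dev = abs(pearson(x[start:end], y[start:end]) - target_corr)
            score = alpha * preference(start, end, preferred) - (1 - alpha) * dev
            if best is None or score > best[0]:
                best = (score, start, end)
    return best

# Toy data: the two series agree perfectly only on indices 3..5.
x = [1, 2, 3, 4, 5, 6]
y = [3, 1, 2, 4, 5, 6]
result = best_window(x, y, target_corr=1.0, preferred=(3, 6))
```

The brute-force enumeration evaluates a quadratic number of windows, which is exactly the kind of cost an efficient subsequence technique would need to avoid; the sketch is only meant to make the trade-off between the two objectives concrete.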

Notes

Acknowledgments

We would like to thank Lemma solutions (www.lemma.com.au) for their help during the production of this paper.

Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  • Abdullah Albarrak (1)
  • Sanad Al-Maskari (2)
  • Ibrahim A. Ibrahim (3, 4)
  • Abdulqader M. Almars (3)

  1. Al Imam Mohammad Ibn Saud Islamic University, Riyadh, Saudi Arabia
  2. Sohar University, Sohar, Oman
  3. University of Queensland, Brisbane, Australia
  4. Minia University, Minya, Egypt
