Time Series Discord Discovery on Intel Many-Core Systems

  • Mikhail Zymbler
  • Andrey Polyakov
  • Mikhail Kipnis
Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 1063)

Abstract

A discord is a refinement of the concept of an anomalous subsequence of a time series. Discord discovery is applied in a wide range of subject areas involving time series: medicine, economics, climate modeling, and others. In this paper, we propose a novel parallel algorithm for discord discovery on Intel MIC (Many Integrated Core) accelerators for the case when the time series fits in main memory. We achieve parallelization through thread-level parallelism and OpenMP technology. The algorithm employs a set of matrix data structures to store and index the subsequences of a time series and to provide efficient vectorization of computations on the Intel MIC platform. Moreover, the algorithm exploits the fact that Euclidean distances between subsequences of a time series can be computed independently of one another. The algorithm iterates over subsequences in two nested loops; it parallelizes the outer and the inner loops separately and differently, depending on both the number of running threads and the cardinality of the sets of subsequences scanned in the loop. The experimental evaluation shows the high scalability of the proposed algorithm.
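
To make the two nested loops concrete, the following is a minimal serial sketch of brute-force discord discovery with early abandoning of hopeless candidates. It is not the parallel algorithm proposed in the paper: it uses no OpenMP, no matrix data layout, and no vectorization, and the names `euclidean` and `find_discord` are hypothetical, introduced only for illustration. It assumes the usual definition of a discord as the subsequence whose nearest non-overlapping neighbour is farthest away.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def find_discord(series, m):
    """Return (start, distance) of the top-1 discord of length m.

    Assumed definition: the discord is the subsequence whose nearest
    non-overlapping neighbour is farthest away. Returns (-1, -1.0) if
    no subsequence has a non-overlapping neighbour.
    """
    n = len(series) - m + 1                 # number of subsequences
    best_pos, best_dist = -1, -1.0
    for i in range(n):                      # outer loop: discord candidates
        nearest = math.inf
        for j in range(n):                  # inner loop: possible neighbours
            if abs(i - j) < m:              # skip overlapping subsequences
                continue
            # Each distance depends only on the two subsequences involved,
            # so these computations are independent of one another.
            d = euclidean(series[i:i + m], series[j:j + m])
            if d < nearest:
                nearest = d
            if nearest < best_dist:         # candidate cannot win: abandon it
                break
        if nearest != math.inf and nearest > best_dist:
            best_pos, best_dist = i, nearest
    return best_pos, best_dist
```

Calling `find_discord(series, m)` on a list of numbers returns the start index of the most anomalous length-`m` subsequence together with the distance to its nearest neighbour. In this sketch, the outer loop over candidates and the inner loop over neighbours correspond to the two loops that the abstract says are parallelized separately.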

Keywords

Time series · Discord discovery · OpenMP · Intel Xeon Phi · Data layout · Vectorization

Notes

Acknowledgments

This work was financially supported by the Russian Foundation for Basic Research (grant No. 17-07-00463) and by the Ministry of Science and Higher Education of the Russian Federation (government orders 2.7905.2017/8.9 and 14.578.21.0265).

The authors thank the Siberian Supercomputer Center (Novosibirsk, Russia) and the South Ural State University (Chelyabinsk, Russia) for the computational resources provided.

Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. South Ural State University, Chelyabinsk, Russia
  2. South Ural State Humanitarian and Pedagogical University, Chelyabinsk, Russia
