
Dynamic Pattern Detection for Big Data Stream Analytics

  • Konstantinos F. Xylogiannopoulos
  • Panagiotis Karampelas
  • Reda Alhajj
Chapter
Part of the Lecture Notes in Social Networks book series (LNSN)

Abstract

The last two decades have witnessed tremendous developments in technology. These drove a visible revolution in communication and electronics design, leading to the production of computing devices of various sizes and capabilities, ranging from tiny sensors with limited specifications to mobile devices with considerable power and rich functionality, among others. These developments have stimulated researchers and practitioners to work hard at deriving the greatest possible benefit from such novel devices to serve humanity. Gathering huge amounts of data is far easier and more affordable than ever before. Indeed, there is a clear shift from paper-based manual data collection to fully automated data collection, even under severe conditions that were previously infeasible to consider. Data is captured as a stream, which may encapsulate trends that reveal aspects essential to our daily life. Identifying such trends in data streams is the main theme of the study described in this chapter. We concentrate mainly on real-time stream data analysis to better serve time-critical applications where instant decision making is crucial. This study builds on our methodology described in (Xylogiannopoulos et al. Frequent and non-frequent pattern detection in big data streams: an experimental simulation in 1 trillion data points. In: Advances in social networks analysis and mining (ASONAM), pp. 931–938, 2016), which detects all repeated patterns in a big data stream. In the new dynamic approach, a sliding window is employed with the LERP Reduced Suffix Array and the ARPaD algorithm to analyze one trillion digits composed of one million subsequences of one million digits each. The analysis rate achieved is comparable to a stream generating one data point every 300 ns.
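To make the sliding-window idea concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes a naive sorted list of truncated suffixes as a stand-in for the LERP Reduced Suffix Array and a simple run scan as a stand-in for ARPaD. The function names `repeated_patterns` and `sliding_windows`, the parameters `max_len`, `size` and `step`, and the truncation rule are all hypothetical choices made for illustration.

```python
from collections import deque
from itertools import groupby


def repeated_patterns(window, max_len):
    """Map every substring of length 1..max_len that occurs at least
    twice in `window` to the sorted list of its start positions.

    Naive stand-in only: this is NOT LERP-RSA or ARPaD.
    """
    # Sort all suffixes, each truncated to at most max_len characters.
    suffixes = sorted((window[i:i + max_len], i) for i in range(len(window)))
    found = {}
    for length in range(1, max_len + 1):
        # Suffixes shorter than `length` cannot contain a pattern of that length.
        eligible = [(s, p) for s, p in suffixes if len(s) >= length]
        # In sorted order, suffixes sharing a prefix are adjacent.
        for prefix, run in groupby(eligible, key=lambda sp: sp[0][:length]):
            positions = sorted(p for _, p in run)
            if len(positions) >= 2:
                found[prefix] = positions
    return found


def sliding_windows(stream, size, step):
    """Yield (offset, window_text) for each full window of `size` symbols,
    advancing by `step` symbols between consecutive windows."""
    buf = deque(maxlen=size)  # the oldest symbol drops out automatically
    for n, symbol in enumerate(stream, start=1):
        buf.append(symbol)
        if n >= size and (n - size) % step == 0:
            yield n - size, "".join(buf)


if __name__ == "__main__":
    digits = "31415926535897932384626433832795"
    for offset, text in sliding_windows(digits, size=16, step=8):
        patterns = repeated_patterns(text, max_len=3)
        longest = max(patterns, key=len) if patterns else None
        print(offset, text, longest)
```

Each window is analyzed independently here, which keeps the sketch short. A production system would presumably update its index incrementally instead of re-sorting every window.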

Keywords

Big data, Data stream, Pattern detection, LERP-RSA, ARPaD, Data analytics

References

  1. Xylogiannopoulos, K.F., Karampelas, P., Alhajj, R.: Repeated patterns detection in big data using classification and parallelism on LERP reduced suffix arrays. Appl. Intell. 45(3), 567–597 (2016). https://doi.org/10.1007/s10489-016-0766-2
  2. Xylogiannopoulos, K.F.: Data structures, algorithms and applications for big data analytics: single, multiple and all repeated patterns detection in discrete sequences. Unpublished PhD thesis, University of Calgary (2017)
  3. Xylogiannopoulos, K.F., Karampelas, P., Alhajj, R.: Analyzing very large time series using suffix arrays. Appl. Intell. 41(3), 941–955 (2014). https://doi.org/10.1007/s10489-014-0553-x
  4. Apostolico, A., Preparata, F.P.: Optimal off-line detection of repetitions in a string. Theor. Comput. Sci. 22, 297–315 (1983)
  5. Weiner, P.: Linear pattern matching algorithms. In: SWAT '73 Proceedings of the 14th Annual Symposium on Switching and Automata Theory (SWAT 1973), pp. 1–11 (1973)
  6. Guo, D., Hu, X., Xie, F., Wu, X.: Pattern matching with wildcards and gap-length constraints based on a centrality-degree graph. Appl. Intell. 39, 57–74 (2013)
  7. Wu, Y., Wang, L., Ren, J., Ding, W., Wu, X.: Mining sequential patterns with periodic wildcards. Appl. Intell. 41, 99–116 (2014)
  8. Manber, U., Myers, G.: Suffix arrays: a new method for on-line string searches. In: Proceedings of the First Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 319–327 (1990)
  9. Franek, F., Smyth, W.F., Tang, Y.: Computing all repeats using suffix arrays. JALC 8(4), 579–591 (2003)
  10. Puglisi, S.J., Smyth, W.F., Yusufu, M.: Fast optimal algorithms for computing all the repeats in a string. In: Proceedings of PSC, pp. 161–169 (2008)
  11. Cormode, G., Hadjieleftheriou, M.: Methods for finding frequent items in data streams. VLDB J. 19(1), 3–20 (2009). https://doi.org/10.1007/s00778-009-0172-z
  12. Boyer, R.S., Moore, J.: A fast majority vote algorithm. Technical Report ICSCA-CMP-32, Institute for Computer Science, University of Texas (1981)
  13. Demaine, E., López-Ortiz, A., Munro, J.I.: Frequency estimation of internet packet streams with limited space. In: European Symposium on Algorithms (ESA) (2002)
  14. Karp, R., Papadimitriou, C., Shenker, S.: A simple algorithm for finding frequent elements in sets and bags. ACM Trans. Database Syst. 28, 51–55 (2003)
  15. Manku, G., Motwani, R.: Approximate frequency counts over data streams. In: International Conference on Very Large Data Bases, pp. 346–357 (2002)
  16. Metwally, A., Agrawal, D., Abbadi, A.E.: Efficient computation of frequent and top-k elements in data streams. In: International Conference on Database Theory (2005)
  17. Greenwald, M., Khanna, S.: Space-efficient online computation of quantile summaries. In: ACM SIGMOD International Conference on Management of Data (2001)
  18. Shrivastava, N., Buragohain, C., Agrawal, D., Suri, S.: Medians and beyond: new aggregation techniques for sensor networks. In: Proceedings of the 2nd International Conference on Embedded Networked Sensor Systems, pp. 239–249. ACM (2004)
  19. Alon, N., Matias, Y., Szegedy, M.: The space complexity of approximating the frequency moments. J. Comput. Syst. Sci. 58, 137–147 (1999)
  20. Cormode, G., Muthukrishnan, S.: An improved data stream summary: the count-min sketch and its applications. J. Algorithm. 55(1), 58–75 (2005)
  21. Xylogiannopoulos, K.F., Karampelas, P., Alhajj, R.: Sequential all frequent itemsets detection – a method to detect all frequent sequential itemsets using LERP–reduced suffix array data structure and ARPaD algorithm. In: Proceedings of International Conference on Advances in Social Networks Analysis and Mining, pp. 1141–1148 (2015). https://doi.org/10.1145/2808797.2809301
  22. Xylogiannopoulos, K.F., Karampelas, P., Alhajj, R.: Real time early warning DDoS attack detection. In: Proceedings of the 11th International Conference on Cyber Warfare and Security, pp. 344–351 (2016)
  23. Xylogiannopoulos, K.F., Karampelas, P., Alhajj, R.: Pattern detection and analysis in financial time series using suffix arrays. In: Doumpos, M., Zopounidis, C., Pardalos, P.M. (eds.) Financial Decision Making Using Computational Intelligence, pp. 129–157 (2012). https://doi.org/10.1007/978-1-4614-3773-4_5
  24. Xylogiannopoulos, K.F., Karampelas, P., Alhajj, R.: Frequent and non-frequent pattern detection in big data streams: an experimental simulation in 1 trillion data points. In: Advances in Social Networks Analysis and Mining (ASONAM), pp. 931–938 (2016). https://doi.org/10.1109/ASONAM.2016.7752351

Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  • Konstantinos F. Xylogiannopoulos (1)
  • Panagiotis Karampelas (2)
  • Reda Alhajj (1)
  1. Department of Computer Science, University of Calgary, Calgary, Canada
  2. Department of Informatics and Computers, Hellenic Air Force Academy, Dekelia Air Base, Acharnes, Greece
