SSRDVis: Interactive visualization for event sequences summarization and rare detection

  • Regular Paper
Journal of Visualization

Abstract

This paper presents SSRDVis, a visual approach for summarizing event sequences and interactively detecting rare behaviors. SSRDVis consists of three components: (1) a sequence embedding module that learns feature vectors for sequences, (2) a sequence grouping and summarization module that finds representative clusters and patterns in the dataset, and (3) a rare detection module that discovers and explains rare cases. Sequences are embedded into a vector space via “mixed-ngram2vec,” which is adapted from “word2vec.” Unsupervised learning models can then be applied in that vector space to group similar sequences and detect anomalies. In addition, sequential pattern graphs are built to provide a compact, semantic summary of the sequences. Together, these components present both overall sequential patterns and abnormal behaviors in one visual interface. We demonstrate the feasibility of the approach by applying it to Web clickstreams. Experimental results show that it helps identify noticeable patterns, especially rare behaviors, in a large number of event sequences.
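
To make the embed-then-detect idea concrete, the following is a minimal sketch and not the authors' implementation. It assumes that "mixed" n-grams means n-grams of several lengths taken together, and it swaps the learned mixed-ngram2vec embedding for a plain bag-of-n-grams count vector so that it runs with the Python standard library alone. The rarity score (one minus the mean cosine similarity to all other sequences) is likewise a simple stand-in for an unsupervised anomaly detector. The function names, the `max_n` parameter, and the sample sessions are all hypothetical.

```python
from collections import Counter
from math import sqrt


def mixed_ngrams(seq, max_n=2):
    """Return every n-gram of length 1..max_n in seq, as tuples."""
    return [
        tuple(seq[i:i + n])
        for n in range(1, max_n + 1)
        for i in range(len(seq) - n + 1)
    ]


def embed(seq, max_n=2):
    """Stand-in embedding: a sparse count vector over mixed n-grams.

    Assumption: the paper learns dense vectors with mixed-ngram2vec;
    this sketch only counts n-grams.
    """
    return Counter(mixed_ngrams(seq, max_n))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rarity_scores(sequences, max_n=2):
    """Score each sequence by 1 - mean similarity to all the others."""
    if len(sequences) < 2:
        raise ValueError("need at least two sequences to compare")
    vecs = [embed(s, max_n) for s in sequences]
    scores = []
    for i, v in enumerate(vecs):
        sims = [cosine(v, w) for j, w in enumerate(vecs) if j != i]
        scores.append(1.0 - sum(sims) / len(sims))
    return scores


# Illustrative clickstream sessions (page categories are made up).
sessions = [
    ["frontpage", "news", "sports"],
    ["frontpage", "news", "weather"],
    ["frontpage", "news", "sports"],
    ["travel", "bbs", "travel", "bbs"],
]
scores = rarity_scores(sessions)
rarest = max(range(len(sessions)), key=scores.__getitem__)
print("rarest session:", sessions[rarest])
```

In this toy data the last session shares no n-grams with the others, so it receives the highest rarity score. A closer approximation of the described pipeline would replace `embed` with learned vectors and `rarity_scores` with a dedicated anomaly detection model.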

Acknowledgements

This work is supported by the National Key Research and Development Program of China (Grant No. 2017YFB0701900), the National Natural Science Foundation of China (Grant No. 61100053), and the Key Laboratory of Machine Perception at Peking University (K-2019-09).

Author information

Corresponding author

Correspondence to Xiaoju Dong.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Li, C., Dong, X., Liu, W. et al. SSRDVis: Interactive visualization for event sequences summarization and rare detection. J Vis 23, 171–184 (2020). https://doi.org/10.1007/s12650-019-00609-x

  • DOI: https://doi.org/10.1007/s12650-019-00609-x
