Efficient Longest Streak Discovery in Multidimensional Sequence Data

Wang, Wentao; Tang, Bo; Zhu, Min

doi:10.1007/978-3-319-96893-3_13


Part of the book series: Lecture Notes in Computer Science (LNISA, volume 10988)

Included in the following conference series:

Asia-Pacific Web (APWeb) and Web-Age Information Management (WAIM) Joint International Conference on Web and Big Data

Abstract

This paper studies the problem of discovering the longest streak in a multidimensional sequence dataset. Given such a dataset, the contextual longest streak is the longest run of consecutive tuples in a context subspace that satisfy a specific measure constraint. The problem has various applications in social network analysis, computational journalism, and related areas. Its challenges are (i) a huge search space and (ii) the non-monotonicity of streak lengths. We propose a novel computation framework with a suite of optimization techniques for this problem. Our solutions outperform the baseline solution by two orders of magnitude on both real and synthetic datasets. In addition, we validate the effectiveness of our proposal with a real-world case study.
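To make the problem statement concrete, here is a minimal brute-force sketch. It is not the framework proposed in the paper, and it rests on several assumptions that the abstract does not confirm: each tuple is a pair of dimension values and one numeric measure, a context is a set of dimension=value assignments, tuples outside the context are skipped rather than treated as breaking a run, and the measure constraint is an arbitrary predicate. The names `longest_streak`, `enumerate_contexts`, and the sample rows are hypothetical.

```python
from itertools import combinations

def longest_streak(rows, context, predicate):
    """Longest run of consecutive context-matching rows whose measure passes predicate."""
    best = current = 0
    for dims, measure in rows:
        # Assumption: rows outside the context subspace are skipped, not run-breaking.
        if any(dims.get(k) != v for k, v in context.items()):
            continue
        current = current + 1 if predicate(measure) else 0
        best = max(best, current)
    return best

def enumerate_contexts(rows):
    """Yield every context (dimension -> value mapping) matched by at least one row."""
    seen = set()
    for dims, _ in rows:
        items = sorted(dims.items())
        for r in range(len(items) + 1):
            for combo in combinations(items, r):
                if combo not in seen:
                    seen.add(combo)
                    yield dict(combo)

# Hypothetical data: (dimension values, measure), in sequence order.
rows = [
    ({"player": "A", "venue": "home"}, 31),
    ({"player": "B", "venue": "home"}, 12),
    ({"player": "A", "venue": "away"}, 25),
    ({"player": "A", "venue": "home"}, 28),
    ({"player": "B", "venue": "away"}, 22),
    ({"player": "A", "venue": "away"}, 9),
]

def at_least_20(measure):
    return measure >= 20

for context in enumerate_contexts(rows):
    print(context, longest_streak(rows, context, at_least_20))
```

On this toy data the empty context has a streak of 3, narrowing to `{"venue": "home"}` gives 1, and narrowing further to `{"player": "A", "venue": "home"}` gives 2. The streak length can therefore both shrink and grow as the context is refined, which illustrates the non-monotonicity the abstract mentions. The number of contexts also grows exponentially with the number of dimensions, which is the search-space challenge.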

B. Tang is co-first author.



Acknowledgement

This work was supported by the Science and Technology Innovation Committee Foundation of Shenzhen (Grant No. ZDSYS201703031748284).

Author information

Authors and Affiliations

Shenzhen Key Laboratory of Computational Intelligence, Department of Computer Science and Engineering, Southern University of Science and Technology, Shenzhen, China
Wentao Wang & Bo Tang
College of Computer Science, Sichuan University, Chengdu, China
Wentao Wang & Min Zhu


Corresponding author

Correspondence to Min Zhu.

Editor information

Editors and Affiliations

South China University of Technology, Guangzhou, China
Yi Cai
Nagoya University, Nagoya, Japan
Yoshiharu Ishikawa
Hong Kong Baptist University, Kowloon Tong, Hong Kong, China
Jianliang Xu

Cite this paper

Wang, W., Tang, B., Zhu, M. (2018). Efficient Longest Streak Discovery in Multidimensional Sequence Data. In: Cai, Y., Ishikawa, Y., Xu, J. (eds) Web and Big Data. APWeb-WAIM 2018. Lecture Notes in Computer Science, vol 10988. Springer, Cham. https://doi.org/10.1007/978-3-319-96893-3_13

DOI: https://doi.org/10.1007/978-3-319-96893-3_13
Published: 19 July 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-96892-6
Online ISBN: 978-3-319-96893-3
eBook Packages: Computer Science, Computer Science (R0)
