Efficient Longest Streak Discovery in Multidimensional Sequence Data

  • Conference paper
Web and Big Data (APWeb-WAIM 2018)

Abstract

This paper studies the problem of discovering the longest streaks in multidimensional sequence datasets. Given a multidimensional sequence dataset, the contextual longest streak is the longest run of consecutive tuples in a context subspace that satisfy a given measure constraint. The problem has applications in social network analysis, computational journalism, and other areas. Its main challenges are (i) a huge search space and (ii) the non-monotonicity of streak lengths. In this paper, we propose a novel computation framework with a suite of optimization techniques to address it. Our solutions outperform the baseline solution by two orders of magnitude on both real and synthetic datasets. In addition, we validate the effectiveness of our proposal through a real-world case study.
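To make the definition concrete, the following is a minimal brute-force sketch, not the framework proposed in the paper. It assumes that a context subspace is a conjunction of dimension-value pairs, that "consecutive" means adjacent among the tuples falling in that subspace, and that the measure constraint is a boolean predicate. The names `longest_streak`, `all_contexts`, and the `games` data are hypothetical and chosen for illustration only.

```python
from itertools import combinations


def longest_streak(rows, context, constraint):
    """Longest run of consecutive tuples inside `context` that satisfy `constraint`.

    Assumption: tuples outside the context are skipped, so consecutiveness
    is judged within the context subspace only.
    """
    best = run = 0
    for row in rows:
        if any(row[dim] != value for dim, value in context.items()):
            continue  # tuple lies outside the context subspace
        run = run + 1 if constraint(row) else 0
        best = max(best, run)
    return best


def all_contexts(rows, dims):
    """Enumerate every subset of `dims` with each value combination seen in `rows`."""
    for k in range(len(dims) + 1):
        for subset in combinations(dims, k):
            seen = sorted({tuple(row[d] for d in subset) for row in rows})
            for values in seen:
                yield dict(zip(subset, values))


# Hypothetical toy data: one tuple per game, in chronological order.
games = [
    {"team": "A", "venue": "home", "points": 105},
    {"team": "A", "venue": "away", "points": 95},
    {"team": "A", "venue": "home", "points": 110},
    {"team": "B", "venue": "home", "points": 101},
    {"team": "A", "venue": "home", "points": 99},
    {"team": "B", "venue": "away", "points": 120},
]


def at_least_100(row):
    """Measure constraint: the game scored 100 points or more."""
    return row["points"] >= 100


for ctx in all_contexts(games, ["team", "venue"]):
    print(ctx, longest_streak(games, ctx, at_least_100))
```

On this toy data the empty context yields a streak of 2, the narrower context `{"venue": "home"}` yields 3, and `{"team": "A"}` yields 1, so a sub-context can be either longer or shorter than its parent. That is one way to see the non-monotonicity the abstract mentions, and the enumeration in `all_contexts` grows quickly with the number of dimensions and distinct values, which hints at the size of the search space.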

B. Tang is co-first author.

Notes

  1. http://bleacherreport.com/articles/2697055-golden-state-warriors-100-point-streak-at-home-ends-at-56-games

  2. https://cavsnation.com/cavs-news-clevelands-16-game-streak-of-100-points-ties-longest-in-franchise-history/

  3. https://en.wikipedia.org/wiki/Double_(basketball)

  4. https://www.basketball-reference.com/leagues/

  5. https://www.metoffice.gov.uk/datapoint/product/uk-3hourly-site-specific-forecast

  6. http://www.cleveland.com/ohio-sports-blog/index.ssf/2011/03/kevin_loves_double-double_stre.html

  7. https://www.cbssports.com/nba/news/russell-westbrooks-jordan-esque-triple-double-streak-ends-at-7/

Acknowledgement

This work was supported by the Science and Technology Innovation Committee Foundation of Shenzhen (Grant No. ZDSYS201703031748284).

Author information

Corresponding author

Correspondence to Min Zhu.

Copyright information

© 2018 Springer International Publishing AG, part of Springer Nature

About this paper

Cite this paper

Wang, W., Tang, B., Zhu, M. (2018). Efficient Longest Streak Discovery in Multidimensional Sequence Data. In: Cai, Y., Ishikawa, Y., Xu, J. (eds) Web and Big Data. APWeb-WAIM 2018. Lecture Notes in Computer Science, vol 10988. Springer, Cham. https://doi.org/10.1007/978-3-319-96893-3_13

  • DOI: https://doi.org/10.1007/978-3-319-96893-3_13

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-96892-6

  • Online ISBN: 978-3-319-96893-3

  • eBook Packages: Computer Science, Computer Science (R0)
