Distributed In-Memory Analytics for Big Temporal Data

  • Bin Yao
  • Wei Zhang
  • Zhi-Jie Wang
  • Zhongpu Chen
  • Shuo Shang
  • Kai Zheng
  • Minyi Guo
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10827)

Abstract

Temporal data is ubiquitous, and massive amounts of it are generated nowadays. Managing big temporal data is important yet challenging, and processing it with a distributed system is a natural choice. However, existing distributed systems and methods either do not natively support temporal queries or are disk-based, and therefore cannot adequately satisfy the requirements of high throughput and low latency. To address this issue, this paper proposes an In-memory based Two-level Index Solution in Spark (ITISS) for processing big temporal data. The framework of our system is easy to understand and implement, without sacrificing efficiency. We conduct extensive experiments to verify the performance of our solution. Experimental results on both real and synthetic datasets consistently demonstrate that our solution is efficient and competitive.

Keywords

Big temporal data; Distributed in-memory analytics; Apache Spark; Temporal queries

Notes

Acknowledgments

This work was supported by the National Basic Research Program (973 Program, No. 2015CB352403), the NSFC (U1636210, 61729202, 91438121, 61672351, 61472453, U1401256, U1501252, U1611264, U1711261 and U1711262), the National Key Research and Development Program of China (2016YFB0700502), the Scientific Innovation Act of STCSM (15JC1402400), the Opening Projects of Guangdong Key Laboratory of Big Data Analysis and Processing (201808), the Guangdong Province Key Laboratory of Popular High Performance Computers of Shenzhen University (SZU-GDPHPCL2017), and Microsoft Research Asia.

Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  1. Department of Computer Science and Engineering, Shanghai Jiao Tong University, Shanghai, China
  2. Guangdong Province Key Laboratory of Big Data Analysis and Processing, Guangzhou, China
  3. Guangdong Province Key Laboratory of Popular High Performance Computers, Guangzhou, China
  4. School of Data and Computer Science, Sun Yat-sen University, Guangzhou, China
  5. Extreme Computing Research Center, King Abdullah University of Science and Technology, Mecca, Saudi Arabia
  6. School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu, China
