Scalable Architecture, Storage and Visualization Approaches for Time Series Analysis Systems

Part of the Communications in Computer and Information Science book series (CCIS, volume 1255)

Abstract

Time series data sets have grown exponentially in both academic and commercial environments. To derive valuable knowledge from this data, a multitude of analysis software tools have been developed that allow groups of collaborating researchers to find and annotate meaningful behavioral patterns. However, these tools commonly lack appropriate mechanisms for handling massive time series data sets of high cardinality, as well as suitable visual encodings for annotated data. In this paper, we conduct a comparative study of architectural, persistence, and visualization methods that can enable these analysis tools to scale with a continuously growing data set and handle intense workloads of concurrent traffic. We implement these approaches within a web platform that integrates authentication, versioning, and locking mechanisms to prevent overlapping contributions or unsanctioned changes. Additionally, we measure the performance of a set of databases when writing and reading varying numbers of series data points, as well as the performance of different architectural models at scale.

Keywords

  • Time series
  • Annotations
  • Annotation systems
  • Collaborative software
  • Data analysis
  • Information science
  • Data modeling
  • Knowledge management
  • Database management systems
  • Time series databases
  • Distributed systems
  • Software architecture
  • Information visualization

Acknowledgements

The present study was developed in the scope of the Smart Green Homes Project [POCI-01-0247-FEDER-007678], a co-promotion between Bosch Termotecnologia S.A. and the University of Aveiro. It is financed by Portugal 2020 under the Competitiveness and Internationalization Operational Program, and by the European Regional Development Fund.

Author information

Corresponding author

Correspondence to Eduardo Duarte.

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Duarte, E., Gomes, D., Campos, D., Aguiar, R.L. (2020). Scalable Architecture, Storage and Visualization Approaches for Time Series Analysis Systems. In: Hammoudi, S., Quix, C., Bernardino, J. (eds) Data Management Technologies and Applications. DATA 2019. Communications in Computer and Information Science, vol 1255. Springer, Cham. https://doi.org/10.1007/978-3-030-54595-6_4

  • DOI: https://doi.org/10.1007/978-3-030-54595-6_4

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-54594-9

  • Online ISBN: 978-3-030-54595-6

  • eBook Packages: Computer Science, Computer Science (R0)