Abstract
Time series data sets are growing exponentially in both academic and commercial environments. To derive valuable knowledge from this data, a multitude of analysis software tools have been developed that allow groups of collaborating researchers to find and annotate meaningful behavioral patterns. However, these tools commonly lack appropriate mechanisms to handle massive time series data sets of high cardinality, as well as suitable visual encodings for annotated data. In this paper, we conduct a comparative study of architectural, persistence, and visualization methods that can enable these analysis tools to scale with a continuously growing data set and handle intense workloads of concurrent traffic. We implement these approaches within a web platform, integrated with authentication, versioning, and locking mechanisms that prevent overlapping contributions or unsanctioned changes. Additionally, we measure the performance of a set of databases when writing and reading varying amounts of series data points, as well as the performance of different architectural models at scale.
Keywords
- Time series
- Annotations
- Annotation systems
- Collaborative software
- Data analysis
- Information science
- Data modeling
- Knowledge management
- Database management systems
- Time series databases
- Distributed systems
- Software architecture
- Information visualization
Acknowledgements
The present study was developed in the scope of the Smart Green Homes Project [POCI-01-0247-FEDER-007678], a co-promotion between Bosch Termotecnologia S.A. and the University of Aveiro. It is financed by Portugal 2020 under the Competitiveness and Internationalization Operational Program, and by the European Regional Development Fund.
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Duarte, E., Gomes, D., Campos, D., Aguiar, R.L. (2020). Scalable Architecture, Storage and Visualization Approaches for Time Series Analysis Systems. In: Hammoudi, S., Quix, C., Bernardino, J. (eds) Data Management Technologies and Applications. DATA 2019. Communications in Computer and Information Science, vol 1255. Springer, Cham. https://doi.org/10.1007/978-3-030-54595-6_4
DOI: https://doi.org/10.1007/978-3-030-54595-6_4
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-54594-9
Online ISBN: 978-3-030-54595-6
eBook Packages: Computer Science, Computer Science (R0)