Abstract
Deriving reliable insights or making evidence-based decisions starts with assessing and meeting a minimum level of data quality, preferably by those who publish the data or, alternatively, by those who prepare data for analysis and develop specific analytics. Much of the (open) data shared by governments and other institutions, or crowdsourced, is in tabular format, and its volume and size are increasing rapidly. This paper presents the challenges faced and the solutions adopted while evolving the web-based graphical user interface (GUI) of a tabular data preparation tool from data sets that fit in memory to Big Data sets. Traditional standalone processing and rendering solutions are no longer usable in a Big Data context. We report on the approach adopted to asynchronously pre-compute the visualisations required by the tool, along with the visualisation aggregation strategies applied. Implementing this approach has allowed us to overcome web browsers’ client-side data handling limitations and to avoid information overload when the granular information charts of our existing in-memory data preparation tool are used with Big Data sets. The developed solution provides the user with an acceptable GUI interaction time.
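As an illustration only, the general idea of pre-computing an aggregated visualisation asynchronously can be sketched as follows. This is not the tool's implementation: the function name binned_counts, the chunked input, and the choice of a fixed-bin histogram are assumptions made for the example. The sketch aggregates a numeric column chunk by chunk into a bounded number of bins in a background worker, so that a browser would receive only a small, fixed-size result rather than the raw rows.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Iterable, List


def binned_counts(chunks: Iterable[List[float]], lo: float, hi: float,
                  bins: int = 20) -> List[int]:
    """Aggregate a numeric column, read chunk by chunk, into a fixed number of bins.

    Hypothetical helper: the result size depends on `bins`, not on the row count.
    """
    counts = [0] * bins
    width = (hi - lo) / bins
    for chunk in chunks:
        for value in chunk:
            if lo <= value <= hi:
                # Clamp so that value == hi falls into the last bin.
                index = min(int((value - lo) / width), bins - 1)
                counts[index] += 1
    return counts


# Hypothetical data: three chunks standing in for partitions of a large table.
chunks = [[0.5, 1.5, 2.5], [3.5, 9.9, 10.0], [4.2, 4.8, 7.1]]

# Submit the aggregation to a background worker so the caller is not blocked;
# a GUI would later fetch the small, pre-computed result.
with ThreadPoolExecutor(max_workers=1) as pool:
    future = pool.submit(binned_counts, chunks, 0.0, 10.0, 5)
    histogram = future.result()

print(histogram)
```

In this sketch the amount of data sent to the client is bounded by the number of bins, which is one simple way to sidestep client-side data handling limits; other aggregation strategies would fit the same pattern.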
Acknowledgments
This work was supported by the European Union’s Horizon 2020 research and innovation programme under grant agreement No 727721 (MIDAS).
This work was supported by the Gipuzkoan Science, Technology and Innovation Network Programme funding of the HIDRA project.
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Epelde, G., Álvarez, R., Beristain, A., Arrúe, M., Arangoa, I., Rankin, D. (2020). Enhancing the Interactive Visualisation of a Data Preparation Tool from in-Memory Fitting to Big Data Sets. In: Abramowicz, W., Klein, G. (eds) Business Information Systems Workshops. BIS 2020. Lecture Notes in Business Information Processing, vol 394. Springer, Cham. https://doi.org/10.1007/978-3-030-61146-0_22
DOI: https://doi.org/10.1007/978-3-030-61146-0_22
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-61145-3
Online ISBN: 978-3-030-61146-0
eBook Packages: Computer Science, Computer Science (R0)