The VLDB Journal, Volume 25, Issue 1, pp 53–77

VDDA: automatic visualization-driven data aggregation in relational databases

  • Uwe Jugel
  • Zbigniew Jerzak
  • Gregor Hackenbroich
  • Volker Markl
Special Issue Paper

Abstract

Contemporary RDBMS-based systems for visualizing high-volume numerical data have difficulty coping with the hard latency requirements and high ingestion rates of interactive visualizations. Existing solutions for reducing the volume of large data sets disregard the spatial properties of visualizations, resulting in visualization errors. In this work, we introduce VDDA, a visualization-driven data aggregation that models visual aggregation at the pixel level as data aggregation at the query level. Building on the M4 aggregation, which produces pixel-perfect line charts from highly reduced data subsets, we define a complete set of data reduction operators that simulate the overplotting behavior of the most frequently used chart types. Because it relies only on relational algebra and common data aggregation functions, our approach is generic and applicable to any visualization system that consumes data stored in relational databases. We demonstrate our visualization-driven data aggregation on real-world data sets from high-tech manufacturing, stock markets, and sports analytics, reducing data volumes by up to two orders of magnitude while preserving pixel-perfect visualizations that match those rendered from the raw data.
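The approach builds on M4, which keeps only four tuples for each pixel column of a line chart: the ones with the first and last timestamp and the ones with the minimum and maximum value. The sketch below illustrates that per-column selection in Python. It is not the authors' implementation. The linear timestamp-to-column mapping, the names `pixel_column` and `m4_reduce`, and the in-memory list of `(timestamp, value)` tuples are assumptions made for illustration.

```python
from math import floor
from typing import Dict, List, Set, Tuple

Point = Tuple[float, float]  # (timestamp, value)


def pixel_column(t: float, t_start: float, t_end: float, width: int) -> int:
    """Map a timestamp to a pixel column in [0, width - 1].

    Assumed mapping: linear scaling of the time range onto the chart width.
    """
    col = floor(width * (t - t_start) / (t_end - t_start))
    return min(col, width - 1)  # t == t_end would otherwise fall into column `width`


def m4_reduce(points: List[Point], t_start: float, t_end: float, width: int) -> List[Point]:
    """Hypothetical M4-style reduction: per pixel column, keep the first, last,
    minimum-value and maximum-value tuples (duplicates collapse into one)."""
    groups: Dict[int, List[Point]] = {}
    for p in points:
        groups.setdefault(pixel_column(p[0], t_start, t_end, width), []).append(p)

    selected: Set[Point] = set()
    for group in groups.values():
        selected.add(min(group, key=lambda p: p[0]))  # first tuple by time
        selected.add(max(group, key=lambda p: p[0]))  # last tuple by time
        selected.add(min(group, key=lambda p: p[1]))  # tuple with minimum value
        selected.add(max(group, key=lambda p: p[1]))  # tuple with maximum value
    return sorted(selected)  # restore time order for drawing


if __name__ == "__main__":
    import random

    random.seed(0)
    raw = [(float(t), random.gauss(0.0, 1.0)) for t in range(10_000)]
    reduced = m4_reduce(raw, 0.0, 10_000.0, width=100)
    print(len(raw), "->", len(reduced))  # at most 4 * 100 = 400 tuples remain
```

Because each pixel column contributes at most four tuples, the output size is bounded by four times the chart width, independent of the raw data size. In a relational database, the same selection can be expressed as a GROUP BY on the computed column key with min/max aggregates over time and value, joined back to the base table to recover the selected tuples.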

Keywords

Relational databases, Data aggregation, Visual aggregation, Dimensionality reduction, Data visualization, Line rasterization, Overplotting


Copyright information

© Springer-Verlag Berlin Heidelberg 2015

Authors and Affiliations

  1. SAP SE, Walldorf/Dresden, Germany
  2. Technische Universität Berlin, Berlin, Germany
