Towards Interactive Data Exploration

  • Carsten Binnig
  • Fuat Basık
  • Benedetto Buratti
  • Ugur Cetintemel
  • Yeounoh Chung
  • Andrew Crotty
  • Cyrus Cousins
  • Dylan Ebert
  • Philipp Eichmann
  • Alex Galakatos
  • Benjamin Hättasch
  • Amir Ilkhechi
  • Tim Kraska
  • Zeyuan Shang
  • Isabella Tromba
  • Arif Usta
  • Prasetya Utama
  • Eli Upfal
  • Linnan Wang
  • Nathaniel Weir
  • Robert Zeleznik
  • Emanuel Zgraggen
Conference paper
Part of the Lecture Notes in Business Information Processing book series (LNBIP, volume 337)

Abstract

Enabling interactive visualization over new datasets at “human speed” is key to democratizing data science and maximizing human productivity. In this work, we first argue why existing analytics infrastructures do not support interactive data exploration, and we outline the challenges and opportunities of building a system specifically designed for it. Furthermore, we present the results of building IDEA, a new type of system for interactive data exploration that is designed to integrate seamlessly with existing data management landscapes and to let users explore their data instantly, without expensive data preparation. Finally, we discuss other important considerations for interactive data exploration systems, including benchmarking, natural language interfaces, and interactive machine learning.

Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  • Carsten Binnig (1, 2), corresponding author
  • Fuat Basık (4)
  • Benedetto Buratti (1)
  • Ugur Cetintemel (2)
  • Yeounoh Chung (2)
  • Andrew Crotty (2)
  • Cyrus Cousins (2)
  • Dylan Ebert (2)
  • Philipp Eichmann (2)
  • Alex Galakatos (2)
  • Benjamin Hättasch (1)
  • Amir Ilkhechi (2)
  • Tim Kraska (2, 3)
  • Zeyuan Shang (2)
  • Isabella Tromba (3)
  • Arif Usta (4)
  • Prasetya Utama (2)
  • Eli Upfal (2)
  • Linnan Wang (2)
  • Nathaniel Weir (2)
  • Robert Zeleznik (2)
  • Emanuel Zgraggen (2)

  1. TU Darmstadt, Darmstadt, Germany
  2. Brown University, Providence, USA
  3. Massachusetts Institute of Technology, Cambridge, USA
  4. Bilkent University, Ankara, Turkey