iExplore: Accelerating Exploratory Data Analysis by Predicting User Intention

  • Zhihui Yang
  • Jiyang Gong
  • Chaoying Liu
  • Yinan Jing
  • Zhenying He
  • Kai Zhang
  • X. Sean Wang
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10828)


Exploratory data analysis over large datasets has become an increasingly prevalent use case. However, users are easily overwhelmed by the data and may take a long time to find interesting facts. In this paper, we design a system called iExplore that assists users in this time-consuming data exploration task by predicting user intention. We also propose an intention model that gives iExplore a comprehensive understanding of the user's intention, so that the exploratory process can be accelerated through intention-driven recommendation and prefetching mechanisms. Extensive experiments demonstrate that the intention-driven iExplore system can significantly lighten the burden on users and facilitate the exploratory process.


User intention, Data exploration, Query log



Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  1. School of Computer Science, Fudan University, Shanghai, China
  2. Shanghai Key Laboratory of Data Science, Shanghai, China
  3. Shanghai Institute of Intelligent Electronics & Systems, Shanghai, China
