Abstract
Queries with thresholds are common when dealing with unstructured data such as text corpora, and users often need several exploratory attempts to arrive at a final result. In this work, we propose an automatic sampling method that determines the threshold without any user interaction, and we introduce two optimization algorithms that reach the lower-bound time complexity in each sampling trial. We evaluate our methods through several experiments and demonstrate their effectiveness, showing that the approach can be a powerful tool for ordinary users.
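The abstract does not spell out the two algorithms, so the following is only an illustrative sketch under stated assumptions, not the authors' method. It assumes a threshold query over a sequence of per-position topic scores and treats one sampling trial as finding the contiguous range with the largest total excess over a threshold θ. That trial uses Kadane's maximum-subarray scan, a single linear pass, which matches the Ω(n) lower bound of having to read every score once. Candidate thresholds are drawn automatically from order statistics of the scores. The names `best_range`, `sample_thresholds`, and `run_trials`, the scoring model, and the sampling rule are all hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical result type: (start index, end index inclusive, total gain).
Range = Tuple[int, int, float]


def best_range(scores: List[float], theta: float) -> Range:
    """One sampling trial (illustrative): find the contiguous range that
    maximizes sum(scores[k] - theta) with Kadane's single O(n) pass."""
    if not scores:
        raise ValueError("scores must be non-empty")
    best: Range = (0, 0, float("-inf"))
    cur_start, cur_sum = 0, 0.0
    for i, s in enumerate(scores):
        gain = s - theta
        if cur_sum <= 0:
            # A non-positive running sum cannot help; restart the range at i.
            cur_start, cur_sum = i, gain
        else:
            cur_sum += gain
        if cur_sum > best[2]:
            best = (cur_start, i, cur_sum)
    return best


def sample_thresholds(scores: List[float], k: int) -> List[float]:
    """Hypothetical sampling rule: k candidate thresholds taken at evenly
    spaced order statistics of the observed scores (requires k >= 2)."""
    if not scores or k < 2:
        raise ValueError("need non-empty scores and k >= 2")
    ordered = sorted(scores)
    n = len(ordered)
    return [ordered[(i * (n - 1)) // (k - 1)] for i in range(k)]


def run_trials(scores: List[float], thetas: List[float]) -> Dict[float, Range]:
    """Run one linear-time trial per sampled threshold, with no user input."""
    return {theta: best_range(scores, theta) for theta in thetas}


if __name__ == "__main__":
    scores = [1, 9, 8, 2, 7, 1]  # made-up per-position topic scores
    for theta, (start, end, gain) in run_trials(scores, sample_thresholds(scores, 3)).items():
        print(f"theta={theta}: range [{start}, {end}], gain={gain}")
```

Because each trial is a single pass, k sampled thresholds cost O(kn) after one O(n log n) sort. How a final threshold is selected from the trial results is left open here, since the abstract does not specify it.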
Acknowledgment
This work is supported by NSFC (Nos. 61732004, 61370080) and the Shanghai Innovation Action Project (No. 16DZ1100200).
Cite this article
Ma, H., Yang, Z., Jing, Y. et al. Answering unique topic queries with dynamic threshold. World Wide Web 22, 39–58 (2019). https://doi.org/10.1007/s11280-018-0528-7