Abstract
Microblogs have become an increasingly popular information source for users to get updates about the world. Given the rapid growth of microblog data, users are often interested in getting daily (or even hourly) updates about a certain topic. Existing studies on microblog retrieval have mainly focused on how to rank results by relevance, but little attention has been paid to whether any results should be returned to search users at all. This paper studies the problem of silent day detection. Specifically, given a query and a set of tweets collected over a certain time period (such as a day), we need to determine whether the set contains any tweets relevant to the query. If it does not, the day is referred to as a silent day. Silent day detection lets us avoid overwhelming users with non-relevant tweets. We formulate the problem as a classification problem and propose two types of new features based on collective information from query terms. Experimental results over TREC collections show that these new features are more effective in detecting silent days than previously proposed ones.
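To make the problem formulation concrete, the following is a minimal sketch of silent day detection as a binary decision over one day's tweets. The two features (`max_coverage`, `full_match_ratio`), the whitespace tokenizer, and the threshold rule are illustrative assumptions and are not the features proposed in the paper; they only show the shape of the task, in which a query and a day's tweet set go in and a silent or non-silent label comes out.

```python
from typing import Dict, List


def tokenize(text: str) -> List[str]:
    """Lowercase whitespace tokenizer (an assumption; real systems normalize more)."""
    return text.lower().split()


def coverage_features(query: str, tweets: List[str]) -> Dict[str, float]:
    """Hypothetical day-level features built from all query terms together."""
    q_terms = set(tokenize(query))
    if not q_terms or not tweets:
        return {"max_coverage": 0.0, "full_match_ratio": 0.0}
    # Fraction of distinct query terms that appear in each tweet.
    coverages = [
        len(q_terms & set(tokenize(tweet))) / len(q_terms) for tweet in tweets
    ]
    full_matches = sum(1 for c in coverages if c == 1.0)
    return {
        "max_coverage": max(coverages),
        "full_match_ratio": full_matches / len(tweets),
    }


def is_silent_day(query: str, tweets: List[str], min_coverage: float = 1.0) -> bool:
    """Label the day silent when no tweet reaches the coverage threshold.

    This fixed threshold stands in for a trained classifier.
    """
    return coverage_features(query, tweets)["max_coverage"] < min_coverage


day_a = ["new iphone release date announced", "weather is nice today"]
day_b = ["weather is nice today", "iphone case sale"]
print(is_silent_day("iphone release", day_a))
print(is_silent_day("iphone release", day_b))
```

In practice, features of this kind would be fed to a classifier trained on days labeled with relevance judgments rather than compared against a hand-set threshold.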
Notes
- 1.
- 2. We did not find the TREC report of the best run of 2016.
Acknowledgements
This research was supported by the U.S. National Science Foundation under IIS-1423002.
Copyright information
© 2018 Springer International Publishing AG, part of Springer Nature
About this paper
Cite this paper
Lu, K., Fang, H. (2018). Silent Day Detection on Microblog Data. In: Silberztein, M., Atigui, F., Kornyshova, E., Métais, E., Meziane, F. (eds) Natural Language Processing and Information Systems. NLDB 2018. Lecture Notes in Computer Science, vol 10859. Springer, Cham. https://doi.org/10.1007/978-3-319-91947-8_46
DOI: https://doi.org/10.1007/978-3-319-91947-8_46
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-91946-1
Online ISBN: 978-3-319-91947-8
eBook Packages: Computer Science (R0)