Silent Day Detection on Microblog Data

Lu, Kuang; Fang, Hui

doi:10.1007/978-3-319-91947-8_46

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 10859)

Included in the following conference series:

International Conference on Applications of Natural Language to Information Systems

Abstract

Microblogs have become an increasingly popular source of updates about the world. Given the rapid growth of microblog data, users are often interested in daily (or even hourly) updates on a particular topic. Existing studies on microblog retrieval have mainly focused on how to rank results by relevance, but little attention has been paid to whether any results should be returned to search users at all. This paper studies the problem of silent day detection. Specifically, given a query and a set of tweets collected over a certain time period (such as a day), we need to determine whether the set contains any tweets relevant to the query. If not, the day is referred to as a silent day. Silent day detection lets us avoid overwhelming users with non-relevant tweets. We formulate the problem as a classification problem and propose two types of new features based on collective information from query terms. Experimental results on TREC collections show that these new features are more effective in detecting silent days than previously proposed ones.
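
As a rough illustration of this formulation, the sketch below treats each (query, day) pair as one instance to be labeled silent or not. The function names, the two coverage features, and the fixed threshold are hypothetical stand-ins chosen for readability; they are not the features proposed in the paper, which the abstract does not specify.

```python
from typing import Dict, List


def tokenize(text: str) -> List[str]:
    """Lowercase whitespace tokenizer; a stand-in for real tweet preprocessing."""
    return text.lower().split()


def day_features(query: str, tweets: List[str]) -> Dict[str, float]:
    """Hypothetical per-day features for one (query, day) pair.

    max_coverage: largest fraction of distinct query terms found in one tweet.
    mean_coverage: the same fraction averaged over all tweets of the day.
    """
    q_terms = set(tokenize(query))
    if not q_terms or not tweets:
        return {"max_coverage": 0.0, "mean_coverage": 0.0}
    coverages = [len(q_terms & set(tokenize(t))) / len(q_terms) for t in tweets]
    return {
        "max_coverage": max(coverages),
        "mean_coverage": sum(coverages) / len(coverages),
    }


def is_silent_day(query: str, tweets: List[str], threshold: float = 0.5) -> bool:
    """Toy decision rule: the day is silent when no tweet covers enough of the query.

    The threshold is an illustrative assumption; in a classification setting
    it would be replaced by a model trained on labeled (query, day) pairs.
    """
    return day_features(query, tweets)["max_coverage"] < threshold
```

For example, `is_silent_day("hubble telescope repair", ["lunch was great"])` flags the day as silent because no tweet shares a term with the query, so nothing would be returned to the user for that day.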

Notes

1. www.internetlivestats.com/twitter-statistics/.
2. We did not find the TREC report of the best run of 2016.

Acknowledgements

This research was supported by the U.S. National Science Foundation under IIS-1423002.

Author information

Authors and Affiliations

University of Delaware, Newark, DE, 19716, USA
Kuang Lu & Hui Fang

Corresponding author

Correspondence to Kuang Lu.

Editor information

Editors and Affiliations

Université de Franche-Comté, Besançon, France
Max Silberztein
Conservatoire National des Arts et Métiers, Paris, France
Faten Atigui
Conservatoire National des Arts et Métiers, Paris, France
Elena Kornyshova
Conservatoire National des Arts et Métiers, Paris, France
Elisabeth Métais
University of Salford, Manchester, United Kingdom
Farid Meziane

About this paper

Cite this paper

Lu, K., Fang, H. (2018). Silent Day Detection on Microblog Data. In: Silberztein, M., Atigui, F., Kornyshova, E., Métais, E., Meziane, F. (eds) Natural Language Processing and Information Systems. NLDB 2018. Lecture Notes in Computer Science, vol 10859. Springer, Cham. https://doi.org/10.1007/978-3-319-91947-8_46

DOI: https://doi.org/10.1007/978-3-319-91947-8_46
Published: 22 May 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-91946-1
Online ISBN: 978-3-319-91947-8
eBook Packages: Computer Science, Computer Science (R0)
