Silent Day Detection on Microblog Data

  • Conference paper
Natural Language Processing and Information Systems (NLDB 2018)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 10859)

Abstract

Microblogs have become an increasingly popular source of information for users who want updates about the world. Given the rapid growth of microblog data, users are often interested in daily (or even hourly) updates on a particular topic. Existing studies on microblog retrieval have mainly focused on how to rank results by relevance, but little attention has been paid to whether any results should be returned to search users at all. This paper studies the problem of silent day detection. Specifically, given a query and a set of tweets collected over a certain time period (such as a day), we need to determine whether the set contains any tweets relevant to the query. If not, the day is referred to as a silent day. Silent day detection lets us avoid overwhelming users with non-relevant tweets. We formulate the problem as a classification problem and propose two types of new features based on collective information from query terms. Experimental results on TREC collections show that these new features are more effective at detecting silent days than previously proposed ones.
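To make the problem formulation concrete, the following is a minimal sketch of silent day detection as a binary decision over one query and one day of tweets. It is not the method of the paper: the feature `max_query_coverage`, the naive whitespace tokenizer, and the fixed `threshold` are all assumptions introduced here for illustration. The feature only gestures at the idea of using query terms collectively (how many of them co-occur in a single tweet) rather than one at a time.

```python
from typing import List


def tokenize(text: str) -> List[str]:
    """Lowercase and split on whitespace; a deliberately naive tokenizer."""
    return text.lower().split()


def max_query_coverage(query: str, tweets: List[str]) -> float:
    """Hypothetical feature (not from the paper): the largest fraction of
    distinct query terms that co-occur in any single tweet of the day."""
    q_terms = set(tokenize(query))
    if not q_terms or not tweets:
        return 0.0
    return max(len(q_terms & set(tokenize(t))) / len(q_terms) for t in tweets)


def is_silent_day(query: str, tweets: List[str], threshold: float = 0.5) -> bool:
    """Illustrative decision rule: call the day silent when the coverage
    feature falls below an assumed threshold. A trained classifier over
    several features would normally replace this fixed cut-off."""
    return max_query_coverage(query, tweets) < threshold
```

In a realistic setting the feature value would be one input among several to a classifier trained on days labelled silent or non-silent, and the threshold would be learned rather than fixed by hand.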

Notes

  1. www.internetlivestats.com/twitter-statistics/.

  2. We did not find the TREC report of the best run of 2016.

Acknowledgements

This research was supported by the U.S. National Science Foundation under IIS-1423002.

Author information

Correspondence to Kuang Lu.

Copyright information

© 2018 Springer International Publishing AG, part of Springer Nature

About this paper

Cite this paper

Lu, K., Fang, H. (2018). Silent Day Detection on Microblog Data. In: Silberztein, M., Atigui, F., Kornyshova, E., Métais, E., Meziane, F. (eds) Natural Language Processing and Information Systems. NLDB 2018. Lecture Notes in Computer Science, vol 10859. Springer, Cham. https://doi.org/10.1007/978-3-319-91947-8_46

  • DOI: https://doi.org/10.1007/978-3-319-91947-8_46

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-91946-1

  • Online ISBN: 978-3-319-91947-8

  • eBook Packages: Computer Science (R0)
