Exploiting Thread Structures to Improve Smoothing of Language Models for Forum Post Retrieval

  • Huizhong Duan
  • Chengxiang Zhai
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6611)


Because forum data has many unique characteristics, forum post retrieval differs from traditional document retrieval and web search, raising interesting research questions about how to optimize its accuracy. In this paper, we study how to exploit the naturally available raw thread structures of forums to improve retrieval accuracy in the language modeling framework. Specifically, we propose and study two different schemes for smoothing the language model of a forum post based on the thread containing the post. We explore several variants of the two schemes to exploit thread structures in different ways. We also create a human-annotated test data set for forum post retrieval and evaluate the proposed smoothing methods on it. The experimental results show that the proposed methods for leveraging forum threads to improve the estimation of document language models are effective, and that they outperform existing smoothing methods on the forum post retrieval task.
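The general idea of smoothing a post's language model with its thread can be illustrated with a minimal sketch. The abstract does not specify the two proposed schemes, so the code below is only an assumed illustration, not the authors' method: it uses a simple linear interpolation of the post model, the thread model, and the collection model. The function names and the weights `alpha` and `beta` are hypothetical.

```python
import math
from collections import Counter

def ml_prob(word, counts):
    """Maximum-likelihood estimate of P(word) from a bag of counts."""
    total = sum(counts.values())
    return counts[word] / total if total else 0.0

def smoothed_prob(word, post, thread, collection, alpha=0.3, beta=0.2):
    """Assumed scheme: interpolate post, thread, and collection models.
    alpha weights the thread model, beta the collection model."""
    return ((1.0 - alpha - beta) * ml_prob(word, post)
            + alpha * ml_prob(word, thread)
            + beta * ml_prob(word, collection))

def query_log_likelihood(query, post, thread, collection, alpha=0.3, beta=0.2):
    """Score a post by the log-likelihood of the query under its smoothed model."""
    score = 0.0
    for w in query:
        p = smoothed_prob(w, post, thread, collection, alpha, beta)
        if p == 0.0:  # word unseen even in the collection
            return float("-inf")
        score += math.log(p)
    return score

# Toy data: two threads whose replies have identical text.
thread_a = [["wireless", "card", "not", "detected"], ["install", "the", "driver"]]
thread_b = [["printer", "not", "detected"], ["install", "the", "driver"]]

collection = Counter(w for t in (thread_a, thread_b) for p in t for w in p)
ctx_a = Counter(w for p in thread_a for w in p)
ctx_b = Counter(w for p in thread_b for w in p)
reply_a = Counter(thread_a[1])
reply_b = Counter(thread_b[1])

query = ["wireless", "driver"]
score_a = query_log_likelihood(query, reply_a, ctx_a, collection)
score_b = query_log_likelihood(query, reply_b, ctx_b, collection)
print(score_a > score_b)
```

In this toy setup the two replies contain the same words, so a model built from the post alone cannot separate them. The thread term is what lets the reply in the wireless thread score higher for the query.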


Keywords: forum post retrieval, language modeling, smoothing





Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Huizhong Duan (1)
  • Chengxiang Zhai (1)
  1. University of Illinois, Urbana, USA
