An Article Language Model for BBS Search

  • Jingfang Xu
  • Yangbo Zhu
  • Xing Li
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3579)

Abstract

Bulletin Board Systems (BBS), like blogs, newsgroups, and online forums, are online broadcasting spaces where people exchange ideas and make announcements. As BBS become valuable repositories of knowledge and information, effective BBS search engines are needed to make that information universally accessible and useful. However, the techniques that have proven successful for web search are not well suited to searching BBS articles, because of the nature of BBS. In this paper, we propose a novel article language model (LM) for building an effective BBS search engine. We investigate the differences between BBS articles and web pages, then extend the traditional LM to an author LM and a category LM. The article LM is powerful in the sense that it combines the three LMs into a single framework. Experimental results show that our article LM substantially outperforms both the INQUERY algorithm and the traditional LM.
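
The abstract does not say how the three LMs are combined. As a rough illustration only, the sketch below assumes a linear interpolation of unigram models built from the article text, from all articles by the same author, and from all articles in the same category, plus a collection-wide background model for smoothing. The function names, the mixture weights, and the toy articles are all hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def unigram_lm(texts):
    """Maximum-likelihood unigram model over whitespace-tokenized texts."""
    counts = Counter(tok for text in texts for tok in text.lower().split())
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


def mixture_score(query, models, weights, floor=1e-9):
    """Log query likelihood under a linear interpolation of several LMs.

    `weights` are assumed mixture coefficients, not values from the paper.
    `floor` avoids log(0) for terms unseen in every model.
    """
    score = 0.0
    for tok in query.lower().split():
        p = sum(w * lm.get(tok, 0.0) for w, lm in zip(weights, models))
        score += math.log(max(p, floor))
    return score


def rank(query, articles, weights=(0.5, 0.2, 0.2, 0.1)):
    """Rank article ids by mixture score, best first."""
    background = unigram_lm([a["text"] for a in articles])
    scored = []
    for art in articles:
        article_lm = unigram_lm([art["text"]])
        # Author LM: every article written by the same author.
        author_lm = unigram_lm(
            [a["text"] for a in articles if a["author"] == art["author"]])
        # Category LM: every article posted in the same category (board).
        category_lm = unigram_lm(
            [a["text"] for a in articles if a["category"] == art["category"]])
        models = (article_lm, author_lm, category_lm, background)
        scored.append((mixture_score(query, models, weights), art["id"]))
    return [art_id for _, art_id in sorted(scored, reverse=True)]


# Toy collection, invented for illustration.
ARTICLES = [
    {"id": "a1", "author": "alice", "category": "linux",
     "text": "kernel module build error"},
    {"id": "a2", "author": "alice", "category": "linux",
     "text": "how to patch the kernel"},
    {"id": "a3", "author": "bob", "category": "movies",
     "text": "best film of the year"},
]

if __name__ == "__main__":
    print(rank("kernel patch", ARTICLES))
```

In this toy setup, an article that lacks a query term can still receive some probability mass for it from its author and category models, which is the intuition behind extending a per-document LM with author and category evidence. How the paper actually weights or estimates these components is not stated in the abstract.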

Copyright information

© Springer-Verlag Berlin Heidelberg 2005

Authors and Affiliations

  • Jingfang Xu
    • 1
  • Yangbo Zhu
    • 1
  • Xing Li
    • 1
  1. Department of Electronic Engineering, Tsinghua University, Beijing, P.R. China
