Applying Wikipedia-Based Explicit Semantic Analysis for Query-Biased Document Summarization

  • Yunqing Zhou
  • Zhongqi Guo
  • Peng Ren
  • Yong Yu
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6215)

Abstract

A query-biased summary is a brief, query-centered representation of a document. In many scenarios, query-biased summarization can be accomplished by ranking the sentences within a web page in a query-customized way. However, generating such a summary is difficult, because the query and the sentences of a particular document are short texts that lack the information and background knowledge needed to judge how similar they are. We focused on this problem and improved summary generation effectiveness by incorporating semantic information into the machine learning process. We found that these improvements are more significant when query terms occur relatively rarely in the document.
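The abstract describes ranking a document's sentences against the query and adding semantic information from Wikipedia-based explicit semantic analysis (ESA). The sketch below illustrates that general idea under strong assumptions and is not the authors' implementation: a four-article toy corpus (the hypothetical `CONCEPTS`) stands in for Wikipedia, each word is mapped to concepts by its TF-IDF weight in the concept articles, a text's ESA vector is the weighted sum of its words' concept vectors, and a fixed linear mix of term overlap and ESA cosine (the hypothetical weight `alpha`) stands in for the learned ranking model the paper uses.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical miniature "Wikipedia": concept title -> article text.
# Real ESA uses a full Wikipedia dump; this toy corpus is an assumption.
CONCEPTS = {
    "Jaguar (animal)": "jaguar big cat feline predator rainforest amazon prey",
    "Jaguar Cars": "jaguar car manufacturer luxury vehicle engine british",
    "Rainforest": "rainforest amazon tropical forest trees rain biodiversity",
    "Automobile": "car vehicle engine wheels driving road",
}

def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())

def build_esa_index(concepts):
    """Inverted index: word -> {concept: TF-IDF weight of the word in that article}."""
    n = len(concepts)
    tfs = {c: Counter(tokenize(t)) for c, t in concepts.items()}
    df = Counter(w for tf in tfs.values() for w in tf)  # articles containing w
    index = defaultdict(dict)
    for c, tf in tfs.items():
        for w, f in tf.items():
            index[w][c] = (1 + math.log(f)) * math.log(n / df[w])
    return index

def interpret(text, index):
    """ESA vector of a text: sum of its words' concept vectors, weighted by word count."""
    vec = defaultdict(float)
    for w, f in Counter(tokenize(text)).items():
        for c, weight in index.get(w, {}).items():
            vec[c] += f * weight
    return vec

def cosine(u, v):
    dot = sum(x * v.get(k, 0.0) for k, x in u.items())
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def term_overlap(query, sentence):
    """Fraction of distinct query terms that appear in the sentence."""
    q = set(tokenize(query))
    return len(q & set(tokenize(sentence))) / len(q) if q else 0.0

def rank_sentences(query, sentences, index, alpha=0.5):
    """Rank sentences by a fixed mix of lexical overlap and ESA similarity.

    alpha is a hypothetical weight; the paper learns the ranking instead.
    """
    qv = interpret(query, index)
    scored = [(alpha * term_overlap(query, s)
               + (1 - alpha) * cosine(qv, interpret(s, index)), s)
              for s in sentences]
    return [s for _, s in sorted(scored, reverse=True)]

index = build_esa_index(CONCEPTS)
query = "jaguar predator"
doc = [
    "The big cat hunts prey deep in the Amazon rainforest.",
    "The company released a new luxury vehicle with a British engine.",
]
print(rank_sentences(query, doc, index))
```

With the query `jaguar predator`, neither sentence contains a query term, so term overlap is zero for both and only the ESA similarity separates them; the Amazon sentence ranks first. This toy case illustrates why concept-level similarity can help when query terms occur rarely in the document.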

Keywords

query-biased summary, explicit semantic analysis, Wikipedia, machine learning

Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • Yunqing Zhou (1)
  • Zhongqi Guo (1)
  • Peng Ren (1)
  • Yong Yu (1)
  1. Dept. of Computer Science and Engineering, Shanghai Jiao Tong University, Shanghai, China