Toward Generic Title Generation for Clustered Documents

Tseng, Yuen-Hsien; Lin, Chi-Jen; Chen, Hsiu-Han; Lin, Yu-I

doi:10.1007/11880592_12

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 4182)

Included in the following conference series:

Asia Information Retrieval Symposium

Abstract

A cluster labeling algorithm for creating generic titles based on external resources such as WordNet is proposed. Our method first extracts category-specific terms as cluster descriptors. These descriptors are then mapped to generic terms based on a hypernym search algorithm. The proposed method has been evaluated on a patent document collection and a subset of the Reuters-21578 collection. Experimental results revealed that our method performs as anticipated. Real-case applications of these generic terms show promise in helping humans interpret the clustered topics. Our method is general enough that it can be easily extended to use other hierarchical resources for adaptable label generation.
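
To make the descriptor-to-generic-term step concrete, here is a minimal Python sketch of one way a hypernym search could work. It is not the authors' algorithm: the toy hierarchy `TOY_HYPERNYMS`, the function names, and the scoring rule (prefer the ancestor that covers the most descriptors, then the one closest to them) are all assumptions made for illustration. A real system would draw hypernyms from a resource such as WordNet.

```python
from collections import defaultdict

# Hypothetical toy hierarchy (child -> parent). It stands in for an
# external hierarchical resource such as WordNet.
TOY_HYPERNYMS = {
    "laser": "optical_device",
    "lens": "optical_device",
    "optical_device": "device",
    "transistor": "semiconductor_device",
    "semiconductor_device": "device",
    "device": "artifact",
    "artifact": "entity",
}


def hypernym_chain(term, hierarchy):
    """Return the ancestors of `term`, nearest ancestor first."""
    chain = []
    seen = {term}
    while term in hierarchy:
        term = hierarchy[term]
        if term in seen:  # guard against accidental cycles
            break
        seen.add(term)
        chain.append(term)
    return chain


def generic_label(descriptors, hierarchy):
    """Pick one generic term for a list of cluster descriptors.

    Assumed scoring rule: choose the ancestor shared by the most
    descriptors; break ties by the smallest total number of hypernym
    steps, then alphabetically. Returns None if no descriptor is found
    in the hierarchy.
    """
    covered = defaultdict(set)   # ancestor -> descriptors it covers
    distance = defaultdict(int)  # ancestor -> summed steps from descriptors
    for descriptor in descriptors:
        chain = hypernym_chain(descriptor, hierarchy)
        for steps, ancestor in enumerate(chain, start=1):
            covered[ancestor].add(descriptor)
            distance[ancestor] += steps
    if not covered:
        return None
    return min(covered, key=lambda a: (-len(covered[a]), distance[a], a))


label_optics = generic_label(["laser", "lens"], TOY_HYPERNYMS)
label_mixed = generic_label(["laser", "lens", "transistor"], TOY_HYPERNYMS)
```

Under this rule, a cluster described only by optical terms receives a narrower label than a cluster that also contains a semiconductor term. The tie-break on distance keeps the label from drifting toward the root of the hierarchy, where every term is covered but the label says little.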

Author information

Authors and Affiliations

National Taiwan Normal University, No. 162, Sec. 1, Heping East Road, Taipei, 106, Taiwan, R.O.C.
Yuen-Hsien Tseng
WebGenie Information LTD., B2F., No. 207-1, Sec. 3, Beisin Rd., Shindian, Taipei, 231, Taiwan, R.O.C.
Chi-Jen Lin & Hsiu-Han Chen
Taipei Municipal Univ. of Education, 1, Ai-Kuo West Road, Taipei, 100, Taiwan, R.O.C.
Yu-I Lin

Editor information

Editors and Affiliations

Department of Computer Science, National University of Singapore, 3 Science Drive 2, 117543, Singapore
Hwee Tou Ng
Institute for Infocomm Research, 21 Heng Mui Keng Terrace, 119613, Singapore
Mun-Kew Leong
Department of Computer Science, School of Computing, National University of Singapore, 117543, Singapore
Min-Yen Kan
Institute for Infocomm Research, 21 Heng Mui Keng Terrace, P.O. Box, 119613, Singapore
Donghong Ji

About this paper

Cite this paper

Tseng, YH., Lin, CJ., Chen, HH., Lin, YI. (2006). Toward Generic Title Generation for Clustered Documents. In: Ng, H.T., Leong, MK., Kan, MY., Ji, D. (eds) Information Retrieval Technology. AIRS 2006. Lecture Notes in Computer Science, vol 4182. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11880592_12

DOI: https://doi.org/10.1007/11880592_12
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-45780-0
Online ISBN: 978-3-540-46237-8
eBook Packages: Computer Science, Computer Science (R0)
