A Comparative Study of Key Phrase Extraction for Cross-Domain Document Collections

Tantanasiriwong, Supaporn; Haruechaiyasak, Choochart; Guha, Sumanta

doi:10.1007/978-3-319-12823-8_42

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 8839)

Abstract

Extraction tools have become useful to text-mining researchers for finding keywords and keyphrases in documents. Keyword and keyphrase extraction is more challenging for cross-domain information because the domains of interest differ in word usage. In this paper, we investigate two popular keyphrase extraction tools, Maui and Carrot, for extracting terms from cross-domain document databases. We present how keywords and keyphrases match across collections from different domains and use this characteristic to select a keyphrase extraction tool for patent documents and scientific publications. The key point of our experiment is matching a patent with its cited publication. For evaluation, cross-domain matching performance is measured by comparing the similarity scores computed from each extraction tool's results. The experimental results show that Maui is the more appropriate keyphrase extraction tool for matching cross-domain document collections, achieving its best performance, measured by cosine similarity, of 3.31% when compared with Carrot.
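The evaluation step above compares extraction-tool outputs using cosine similarity, cos(a, b) = a·b / (‖a‖ ‖b‖). The abstract does not say how the extracted keyphrases are turned into vectors, so the following is only a minimal sketch under one assumption: each document (a patent or its cited publication) is represented as a term-frequency vector over its lowercased, trimmed keyphrases. The function names and the sample keyphrase lists are hypothetical and are not taken from the paper, Maui, or Carrot.

```python
from collections import Counter
import math


def cosine_similarity(phrases_a, phrases_b):
    """Cosine similarity between two bags of extracted keyphrases.

    Assumption (not from the paper): each document is a term-frequency
    vector whose dimensions are normalized (lowercased, trimmed) keyphrases.
    Returns a value in [0, 1], or 0.0 if either bag is empty.
    """
    vec_a = Counter(p.strip().lower() for p in phrases_a)
    vec_b = Counter(p.strip().lower() for p in phrases_b)
    # Dot product over the keyphrases the two documents share.
    dot = sum(vec_a[k] * vec_b[k] for k in vec_a.keys() & vec_b.keys())
    norm_a = math.sqrt(sum(v * v for v in vec_a.values()))
    norm_b = math.sqrt(sum(v * v for v in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def mean_pair_similarity(pairs):
    """Average cosine similarity over (patent, cited publication) pairs.

    Computing this once per tool's output gives one score per tool to compare.
    """
    if not pairs:
        return 0.0
    return sum(cosine_similarity(a, b) for a, b in pairs) / len(pairs)


if __name__ == "__main__":
    # Hypothetical keyphrases a tool might extract from a patent and a cited paper.
    patent = ["keyphrase extraction", "patent document", "text mining"]
    paper = ["Keyphrase Extraction", "scientific publication", "text mining"]
    print(round(cosine_similarity(patent, paper), 3))  # → 0.667
```

Averaging over all patent/publication pairs per tool is one simple way to turn per-pair scores into a single number per tool; the paper may aggregate differently.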

Author information

Authors and Affiliations

Computer Science and Information Management, School of Engineering and Technology, Asian Institute of Technology, Thailand
Supaporn Tantanasiriwong & Sumanta Guha
Speech and Audio Technology Laboratory, National Electronics and Computer Technology Center, National Science and Technology Development Agency, Thailand
Choochart Haruechaiyasak

Editor information

Editors and Affiliations

Khon Kaen University, 40002, Khon Kaen, Thailand
Kulthida Tuamsuk
Graduate School of Informatics, Kyoto University, Yoshida-Honmachi, 606-8501, Sakyo-ku, Kyoto, Japan
Adam Jatowt
University of British Columbia, Vancouver, B.C., Canada
Edie Rasmussen

About this paper

Cite this paper

Tantanasiriwong, S., Haruechaiyasak, C., Guha, S. (2014). A Comparative Study of Key Phrase Extraction for Cross-Domain Document Collections. In: Tuamsuk, K., Jatowt, A., Rasmussen, E. (eds) The Emergence of Digital Libraries – Research and Practices. ICADL 2014. Lecture Notes in Computer Science, vol 8839. Springer, Cham. https://doi.org/10.1007/978-3-319-12823-8_42

DOI: https://doi.org/10.1007/978-3-319-12823-8_42
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-12822-1
Online ISBN: 978-3-319-12823-8
eBook Packages: Computer Science, Computer Science (R0)