Abstract
Extraction tools have become useful to text mining researchers for finding keywords and keyphrases in documents. Keyword and keyphrase extraction from cross-domain information is more challenging because the domains of interest differ in word usage. In this paper, two popular keyphrase extraction tools, Maui and Carrot, are investigated for extracting terms from cross-domain document databases. The characteristics of keyword and keyphrase matching across collections from different domains are presented and used to select a keyphrase extraction tool for patent documents and scientific publications. Our experiment focuses on matching a patent with the publication it cites. For evaluation, the performance of cross-domain matching is measured by comparing similarity scores computed on the output of each extraction tool. The experimental results show that Maui is the more appropriate keyphrase extraction tool for matching cross-domain document collections, with a best performance, measured by cosine similarity, of 3.31% when compared with Carrot.
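The similarity-based evaluation can be illustrated with a minimal sketch of cosine similarity between two keyphrase sets, one from a patent and one from the publication it cites. The abstract does not state how the keyphrase vectors are weighted, so the raw-count weighting, the function name `cosine_similarity`, and the sample phrases below are all assumptions made for illustration, not the authors' setup or the output of Maui or Carrot.

```python
import math
from collections import Counter


def cosine_similarity(phrases_a, phrases_b):
    """Cosine similarity between two keyphrase lists.

    Assumption: each phrase is weighted by its raw count after
    lowercasing and trimming; the paper may use a different weighting.
    """
    vec_a = Counter(p.lower().strip() for p in phrases_a)
    vec_b = Counter(p.lower().strip() for p in phrases_b)

    # The dot product only needs the phrases present in both vectors.
    shared = vec_a.keys() & vec_b.keys()
    dot = sum(vec_a[t] * vec_b[t] for t in shared)

    norm_a = math.sqrt(sum(c * c for c in vec_a.values()))
    norm_b = math.sqrt(sum(c * c for c in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty keyphrase list matches nothing
    return dot / (norm_a * norm_b)


# Hypothetical keyphrases standing in for extraction tool output.
patent_phrases = ["image sensor", "pixel array",
                  "noise reduction", "signal processing"]
publication_phrases = ["noise reduction", "image sensor",
                       "wavelet transform", "signal processing"]

score = cosine_similarity(patent_phrases, publication_phrases)
print(f"{score:.2f}")  # prints 0.75 (3 shared phrases, both norms equal 2)
```

Under this setup, each tool would be run on both documents of a patent-publication pair, and the tool whose keyphrases yield higher similarity for cited pairs would be preferred.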
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Tantanasiriwong, S., Haruechaiyasak, C., Guha, S. (2014). A Comparative Study of Key Phrase Extraction for Cross-Domain Document Collections. In: Tuamsuk, K., Jatowt, A., Rasmussen, E. (eds) The Emergence of Digital Libraries – Research and Practices. ICADL 2014. Lecture Notes in Computer Science, vol 8839. Springer, Cham. https://doi.org/10.1007/978-3-319-12823-8_42
DOI: https://doi.org/10.1007/978-3-319-12823-8_42
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-12822-1
Online ISBN: 978-3-319-12823-8
eBook Packages: Computer Science, Computer Science (R0)