A Comparative Study of Key Phrase Extraction for Cross-Domain Document Collections

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 8839)

Abstract

Keyphrase extraction tools have become useful to text mining researchers for finding keywords and keyphrases in documents. Extracting keywords and keyphrases from cross-domain collections is more challenging because the domains of interest differ in word usage. In this paper, two popular keyphrase extraction tools, Maui and Carrot, are investigated for extracting terms from cross-domain document databases. The characteristics of keyword and keyphrase matching between collections from different domains are presented and used to select a keyphrase extraction tool for patent documents and scientific publications. In our experiment, the matching between a patent and its cited publication is the key point. For evaluation, the performance of cross-domain matching is measured by comparing similarity measures computed over the outputs of the extraction tools. The experimental results show that Maui is the more appropriate keyphrase extraction tool for cross-domain document collection matching, with its best performance, measured by cosine similarity, of 3.31% when compared with Carrot.
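
The abstract does not say how the keyphrase vectors are built, so the following is only a minimal sketch of how the cosine similarity between the keyphrases of a patent and those of its cited publication might be computed. It assumes exact, case-insensitive phrase matching and raw term-frequency weights; the function name `cosine_similarity` and the sample phrase lists are hypothetical and do not come from the paper or from the Maui or Carrot tools.

```python
import math
from collections import Counter


def cosine_similarity(phrases_a, phrases_b):
    """Cosine similarity between two lists of keyphrases.

    Assumption: each list is turned into a term-frequency vector keyed by
    the lowercased phrase; the paper may use a different weighting scheme.
    """
    vec_a = Counter(p.lower().strip() for p in phrases_a)
    vec_b = Counter(p.lower().strip() for p in phrases_b)

    # Dot product over the phrases the two vectors share.
    shared = vec_a.keys() & vec_b.keys()
    dot = sum(vec_a[t] * vec_b[t] for t in shared)

    norm_a = math.sqrt(sum(c * c for c in vec_a.values()))
    norm_b = math.sqrt(sum(c * c for c in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty phrase list matches nothing
    return dot / (norm_a * norm_b)


# Hypothetical keyphrases, standing in for the output of an extraction tool.
patent_phrases = ["keyphrase extraction", "text mining", "document clustering"]
publication_phrases = [
    "Text Mining",
    "keyphrase extraction",
    "information retrieval",
    "patent search",
]

score = cosine_similarity(patent_phrases, publication_phrases)
print(f"cosine similarity: {score:.4f}")
```

Under these assumptions, two extraction tools could be compared by running each on the same patent and publication pairs and averaging the resulting scores per tool.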

Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Tantanasiriwong, S., Haruechaiyasak, C., Guha, S. (2014). A Comparative Study of Key Phrase Extraction for Cross-Domain Document Collections. In: Tuamsuk, K., Jatowt, A., Rasmussen, E. (eds) The Emergence of Digital Libraries – Research and Practices. ICADL 2014. Lecture Notes in Computer Science, vol 8839. Springer, Cham. https://doi.org/10.1007/978-3-319-12823-8_42

  • DOI: https://doi.org/10.1007/978-3-319-12823-8_42

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-12822-1

  • Online ISBN: 978-3-319-12823-8

  • eBook Packages: Computer Science, Computer Science (R0)
