A Supervised Learning to Rank Approach for Dependency Based Concept Extraction and Repository Based Boosting for Domain Text Indexing

  • U. K. Naadan
  • T. V. Geetha
  • U. Kanimozhi
  • D. Manjula
  • R. Viswapriya
  • C. Karthik
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10859)


In conventional information retrieval systems, keywords extracted from documents are indexed and used for retrieval. Because the same information can be expressed with different keywords, relevant documents are often missed. Concept-based indexing and retrieval, which identifies semantically similar documents, overcomes this problem by mapping document phrases to a domain repository. This paper explores the problem of extracting and ranking concepts, i.e. key phrases, from domain-oriented text. Concepts are ranked not only on statistical and cue features but also on the dependency relations in which the candidate concept occurs. For each candidate, a vector is formed from the phrase weight and the dependency relations. The features used to score and re-rank the candidates are the cue features (presence in the title or abstract), the C-value for multi-word phrases, the frequency of occurrence, and the type of dependency relation. RankingSVM then ranks the candidate concepts based on these feature vectors. In addition, to make the ranking domain sensitive and to determine the domain relevance of the candidate concepts, the candidates are fully or partially matched against the domain repository. Domain-relevant concepts are boosted in the ranking according to the depth of the concept in the repository and the presence of parent and sibling concepts. The results indicate that the dependency-based context vector and the domain repository substantially improve key phrase extraction compared with other methods.


Dependency-based key phrase extraction · Repository based concept mapping · Learning to Rank · Domain text indexing
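
The ranking and boosting steps summarised in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it stands in for RankingSVM with a small linear pairwise hinge-loss ranker trained by subgradient descent, and the feature order, the example phrases and values, the `REPOSITORY` lookup table, and the additive boost formula in `repository_boost` (including the assumption that deeper concepts get a larger boost) are all hypothetical choices made for illustration.

```python
from itertools import combinations

# Assumed feature order, scaled to roughly [0, 1] (illustrative values only):
# [in_title, in_abstract, c_value, relative_frequency, in_subject_or_object_relation]
EXAMPLES = {
    "myocardial infarction": ([1, 1, 0.9, 0.5, 1], 2),  # (features, relevance label)
    "heart attack":          ([0, 1, 0.5, 0.3, 1], 1),
    "patient group":         ([0, 0, 0.2, 0.4, 0], 0),
    "recent study":          ([0, 0, 0.0, 0.2, 0], 0),
}


def train_pairwise_ranker(examples, epochs=200, lr=0.05, reg=0.01):
    """Linear pairwise ranker: minimise reg/2*||w||^2 + hinge(1 - w.(better - worse))."""
    items = list(examples.values())
    w = [0.0] * len(items[0][0])
    # Turn graded labels into preference pairs, stored as feature differences.
    pairs = []
    for (xa, ya), (xb, yb) in combinations(items, 2):
        if ya != yb:
            better, worse = (xa, xb) if ya > yb else (xb, xa)
            pairs.append([p - q for p, q in zip(better, worse)])
    for _ in range(epochs):
        for diff in pairs:
            margin = sum(wi * di for wi, di in zip(w, diff))
            violated = margin < 1.0
            # Subgradient step: always shrink w, push along diff only if the margin is violated.
            w = [wi - lr * (reg * wi - (di if violated else 0.0))
                 for wi, di in zip(w, diff)]
    return w


def score(w, x):
    """Linear ranking score of one candidate's feature vector."""
    return sum(wi * xi for wi, xi in zip(w, x))


def repository_boost(base, depth, has_parent, has_sibling, alpha=0.1):
    """Hypothetical additive boost for candidates matched in the domain repository."""
    if depth is None:  # candidate not found in the repository: no boost
        return base
    return base + alpha * (depth + int(has_parent) + int(has_sibling))


# Hypothetical repository lookups: phrase -> (depth, has_parent, has_sibling)
REPOSITORY = {
    "myocardial infarction": (4, True, True),
    "heart attack": (4, True, False),
}

weights = train_pairwise_ranker(EXAMPLES)
final_scores = {}
for phrase, (features, _) in EXAMPLES.items():
    depth, parent, sibling = REPOSITORY.get(phrase, (None, False, False))
    final_scores[phrase] = repository_boost(score(weights, features), depth, parent, sibling)

ranked = sorted(final_scores, key=final_scores.get, reverse=True)
print(ranked)
```

In this sketch the learned score orders candidates by the preference pairs derived from the labels, and the repository match then shifts matched candidates upward. A real system would learn from annotated documents and would need a boost function tuned to the actual repository structure.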


Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  • U. K. Naadan (1), email author
  • T. V. Geetha (1)
  • U. Kanimozhi (1)
  • D. Manjula (1)
  • R. Viswapriya (2)
  • C. Karthik (2)

  1. Anna University, Chennai, India
  2. Scope e-Knowledge Center (P) Ltd., Chennai, India
