Extracting abstract and keywords from context for academic articles

  • Ahmet Anıl Müngen
  • Mehmet Kaya
Original Article


Every year, thousands of academic studies are published worldwide. When researchers search for a topic, they first skim abstracts and keywords. In many academic disciplines, authors include keywords and an abstract in their publications. However, publications in some disciplines, such as the social sciences, contain neither keywords nor an abstract, and older publications in any discipline may also lack them. Search engines for academic publications usually search keywords, abstracts and titles. A publication without an abstract or keywords is therefore harder to retrieve accurately and harder for researchers to review quickly. This study proposes a method for generating keywords and an abstract from the text of an academic publication. Previous studies have generally used k-NN and fuzzy CCG methods to solve this problem, but they have not examined word structure or applied semantic analysis. In this study, each publication is divided into sections such as the introduction, the methodology and the references. Each section is weighted differently, so that a word receives a different score depending on the section in which it appears. Furthermore, NLP methods are used to analyze the text and phrases and to remove prepositions and conjunctions. The processed data is then used to generate keywords with TF–IDF and to generate the abstract with the TextRank method. As a result, more accurate and contextually relevant keywords and abstracts are produced. The proposed method was tested on the Sobiad Academic Database, which is used by 72 universities in Turkey and covers more than 250,000 academic publications. Experimental results were measured with precision and F-measure and were found to be promising compared with previous studies on keyword derivation and abstract generation.
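The keyword step described above (section-dependent word scores combined with TF–IDF) can be illustrated with a minimal sketch. It is not the authors' implementation: the section weights, the tiny stop-word list, the smoothed IDF formula and the names `SECTION_WEIGHTS`, `weighted_tf` and `top_keywords` are all assumptions made for illustration, since the abstract does not give these details.

```python
import math
import re
from collections import Counter

# Hypothetical section weights: the abstract says sections are graded
# differently but does not state the actual values.
SECTION_WEIGHTS = {"introduction": 1.5, "methodology": 1.2, "references": 0.3}

# Tiny illustrative stop list standing in for the NLP step that removes
# prepositions and conjunctions.
STOP_WORDS = {"the", "a", "an", "of", "in", "on", "and", "or", "for", "to", "with"}


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens that are not stop words."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]


def weighted_tf(sections):
    """Sum section weights per term; sections maps section name -> text."""
    counts = Counter()
    for name, text in sections.items():
        weight = SECTION_WEIGHTS.get(name, 1.0)  # unknown sections count once
        for token in tokenize(text):
            counts[token] += weight
    return counts


def top_keywords(doc, corpus, k=3):
    """Rank the terms of doc by weighted TF times a smoothed IDF over corpus."""
    n_docs = len(corpus)
    doc_freq = Counter()
    for other in corpus:
        doc_freq.update(set(weighted_tf(other)))  # one count per document
    scores = {
        term: tf * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1.0)
        for term, tf in weighted_tf(doc).items()
    }
    # Sort by descending score, breaking ties alphabetically for determinism.
    ranked = sorted(scores, key=lambda term: (-scores[term], term))
    return ranked[:k]


doc_a = {
    "introduction": "keyword extraction for academic articles",
    "methodology": "keyword scoring with graph ranking",
    "references": "survey survey survey survey",
}
doc_b = {
    "introduction": "image segmentation survey",
    "methodology": "graph cut segmentation",
}
print(top_keywords(doc_a, [doc_a, doc_b]))
```

In this toy corpus, the word "survey" appears four times in the references of `doc_a`, yet the low weight assumed for that section keeps it below terms from the introduction and methodology.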


Keywords: Keyword extraction, Summarization, Abstract generation, TextRank



This study was supported by TUBITAK under Grant no: 116E889. We would like to thank Sobiad for sharing their data and services.



Copyright information

© Springer-Verlag GmbH Austria, part of Springer Nature 2018

Authors and Affiliations

  1. Department of Computer Engineering, Fırat University, Elazig, Turkey
