Semantic Representation of Malayalam Text Documents in Cricket Domain Using WordNet

  • Sreedhi Deleep Kumar
  • E. U. Reshma
  • C. Sunitha
  • Amal Ganesh
Conference paper
Part of the Lecture Notes on Data Engineering and Communications Technologies book series (LNDECT, volume 26)


Semantic representation is an abstract language for representing the meaning of text. Representing sentences semantically supports applications such as question answering, information extraction, summarization, and machine translation. Various methods exist for representing text documents, but only limited work has been done for the Malayalam language. A specific domain (cricket) is chosen so as to obtain better results in semantic representation. A Malayalam lexical database (WordNet) is used as the resource for obtaining the required information. A WordNet is a hierarchically organized lexical resource that can be built for any language. In this project, a semantic representation is extracted from a single Malayalam text document, producing an abstractive representation of the given input. The extraction proceeds in stages. Tokenization separates each sentence into words (tokens), and POS tagging labels these tokens as nouns, verbs, adjectives, and so on. The tagged tokens then undergo morphological analysis, which finds the stem word for each token. After this analysis, details about the stem words are obtained by looking them up in WordNet. Next, the semantic triplets (subject, object, predicate) are extracted from each sentence. These triplets form the basis of the semantic representation, in which the verb is taken as the root element. The aim of this project is the semantic representation of Malayalam text documents pertaining to the cricket domain using the WordNet database.
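The stages described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the romanized sample sentence, the toy POS table, the single-suffix stemmer, the toy WordNet entries, and every function name below are hypothetical stand-ins chosen for illustration. The triplet extractor additionally assumes a simple subject-object-verb word order with exactly two nouns before the verb.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical toy resources. A real system would use a trained POS tagger,
# a Malayalam morphological analyser, and the Malayalam WordNet instead.
TOY_POS: Dict[str, str] = {"sachin": "NOUN", "panthu": "NOUN", "adichu": "VERB"}
TOY_SUFFIXES: Tuple[str, ...] = ("chu",)  # assumed inflection suffix
TOY_WORDNET: Dict[str, str] = {
    "sachin": "name of a player",
    "panthu": "ball",
    "adi": "hit",
}


def tokenize(sentence: str) -> List[str]:
    """Split a sentence into lowercase word tokens, dropping full stops."""
    return sentence.lower().replace(".", " ").split()


def pos_tag(tokens: List[str]) -> List[Tuple[str, str]]:
    """Tag each token using the toy table; unknown tokens get 'UNK'."""
    return [(tok, TOY_POS.get(tok, "UNK")) for tok in tokens]


def stem(token: str) -> str:
    """Naive morphological analysis: strip one known suffix if present."""
    for suffix in TOY_SUFFIXES:
        if token.endswith(suffix) and len(token) > len(suffix):
            return token[: -len(suffix)]
    return token


def extract_triplet(tagged: List[Tuple[str, str]]) -> Optional[Tuple[str, str, str]]:
    """Return (subject, object, predicate) stems, assuming SOV word order."""
    nouns = [stem(tok) for tok, tag in tagged if tag == "NOUN"]
    verbs = [stem(tok) for tok, tag in tagged if tag == "VERB"]
    if len(nouns) < 2 or not verbs:
        return None
    return nouns[0], nouns[1], verbs[-1]


def represent(triplet: Tuple[str, str, str]) -> dict:
    """Build a small tree with the verb as root, annotated from the toy WordNet."""
    subject, obj, predicate = triplet
    return {
        "root": predicate,
        "gloss": TOY_WORDNET.get(predicate),
        "subject": {"word": subject, "gloss": TOY_WORDNET.get(subject)},
        "object": {"word": obj, "gloss": TOY_WORDNET.get(obj)},
    }


def represent_sentence(sentence: str) -> Optional[dict]:
    """Run all stages: tokenize, tag, stem, extract triplet, build representation."""
    triplet = extract_triplet(pos_tag(tokenize(sentence)))
    return represent(triplet) if triplet else None


if __name__ == "__main__":
    print(represent_sentence("Sachin panthu adichu."))
```

Keeping each stage as a separate function mirrors the staged description in the abstract, so any toy component can be swapped for a real tagger, analyser, or WordNet lookup without touching the others.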


Keywords: Semantics, Malayalam, Cricket, WordNet, Semantic triplets



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  • Sreedhi Deleep Kumar (1)
  • E. U. Reshma (1)
  • C. Sunitha (1)
  • Amal Ganesh (1)
  1. Computer Science Department, Vidya Academy of Science and Technology, Thrissur, India
