
A Concept Based Graph Model for Document Representation Using Coreference Resolution

Conference paper
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 384)

Abstract

Graph representation is an efficient way of representing text, and it is used for document similarity analysis. A lot of research has been done on document similarity analysis, but existing approaches are largely keyword-based methods such as the Vector Space Model and Bag of Words. These methods do not preserve the semantics of the document. This paper proposes a concept-based graph model that follows a triplet representation with coreference resolution and extracts concepts at both the sentence and the document level. The extracted concepts are clustered using a modified DBSCAN algorithm, and the result forms a belief network. We also propose a modified algorithm for triplet generation.
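
To make the idea of a triplet representation with coreference resolution concrete, the following is a minimal sketch and not the algorithm proposed in the paper. It assumes that (subject, predicate, object) triplets and a mention-to-antecedent map are already available. The sample data and the names `resolve` and `build_concept_graph` are hypothetical, chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical input: (subject, predicate, object) triplets.
# In a real pipeline a parser would produce these; here they are hand-written.
triplets = [
    ("Alice", "wrote", "paper"),
    ("she", "presented", "paper"),
    ("paper", "describes", "graph model"),
]

# Hypothetical coreference map from a mention to its antecedent.
coref = {"she": "Alice"}


def resolve(mention, coref):
    """Follow coreference links until a mention with no antecedent is reached."""
    seen = set()
    while mention in coref and mention not in seen:
        seen.add(mention)  # guard against cyclic links
        mention = coref[mention]
    return mention


def build_concept_graph(triplets, coref):
    """Return an adjacency map: concept -> list of (predicate, concept) edges."""
    graph = defaultdict(list)
    for subj, pred, obj in triplets:
        graph[resolve(subj, coref)].append((pred, resolve(obj, coref)))
    return dict(graph)


graph = build_concept_graph(triplets, coref)
```

In this sketch the pronoun "she" is merged into the node "Alice", so both sentences about the same entity contribute edges to a single concept node. A keyword-based representation would instead treat the two mentions as unrelated terms.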



Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  1. Department of Computer Science and Application, Amrita Vishwa Vidyapeetham, Coimbatore, India
