Syntactic Dependency-Based N-grams: More Evidence of Usefulness in Classification

  • Grigori Sidorov
  • Francisco Velasquez
  • Efstathios Stamatatos
  • Alexander Gelbukh
  • Liliana Chanona-Hernández
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7816)

Abstract

This paper introduces and discusses the concept of syntactic n-grams (sn-grams), which can be used in place of traditional n-grams in many NLP tasks. Sn-grams are constructed by following paths in syntactic trees rather than the linear order of words in the text, so they bring syntactic knowledge into machine learning methods. Their construction does, however, require that the text be parsed beforehand. We applied sn-grams to the task of authorship attribution on corpora of three and seven authors, with very promising results.
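To make the idea of following paths in a syntactic tree concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the authors' implementation: the dependency tree is written by hand instead of being produced by a parser, every word in the sentence is assumed to be unique, only head-to-dependent paths without branching are collected, and the names `TREE`, `paths_from`, and `sn_grams` are hypothetical.

```python
from collections import Counter

# Toy dependency tree for "the quick fox eats small fish", written by hand
# (an assumption; in practice the arcs would come from a syntactic parser).
# Each key is a head word; its value lists the words that depend on it.
TREE = {
    "eats": ["fox", "fish"],
    "fox": ["the", "quick"],
    "fish": ["small"],
}


def paths_from(tree, node, n):
    """Return every downward path of exactly n words that starts at node."""
    if n == 1:
        return [(node,)]
    paths = []
    for child in tree.get(node, []):
        for tail in paths_from(tree, child, n - 1):
            paths.append((node,) + tail)
    return paths


def sn_grams(tree, n):
    """Collect head-to-dependent paths of n words, starting at any word."""
    words = set(tree) | {dep for deps in tree.values() for dep in deps}
    grams = []
    for word in sorted(words):
        grams.extend(paths_from(tree, word, n))
    return grams


bigrams = sn_grams(TREE, 2)
trigrams = sn_grams(TREE, 3)

# Frequency counts like these could serve as features for a classifier.
counts = Counter(bigrams)
```

In this toy tree the pair ("eats", "fish") is an sn-gram even though the two words are not adjacent in the sentence, whereas the adjacent pair "eats small" is not, because no dependency arc connects those words.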

Keywords

Syntactic n-grams; sn-grams; syntactic paths; authorship attribution task; SVM classifier



Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Grigori Sidorov (1)
  • Francisco Velasquez (1)
  • Efstathios Stamatatos (2)
  • Alexander Gelbukh (1)
  • Liliana Chanona-Hernández (3)

  1. Center for Computing Research (CIC), Instituto Politécnico Nacional (IPN), Mexico City, Mexico
  2. University of the Aegean, Greece
  3. ESIME, Instituto Politécnico Nacional (IPN), Mexico City, Mexico
