Design of Experiments in Computational Linguistics

  • Grigori Sidorov
Chapter
Part of the SpringerBriefs in Computer Science book series (BRIEFSCOMPUTER)

Abstract

As mentioned earlier in the book, machine learning methods are becoming increasingly popular in the automatic analysis of natural language (natural language processing, NLP) and in computational linguistics, and applying them yields steadily better results. In this chapter, we describe the design of experiments in computational linguistics: problem – corpus – gold standard – feature selection – dimensionality reduction – classification – evaluation (k-fold cross-validation).
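
The final evaluation step, k-fold cross-validation, can be illustrated with a minimal sketch. It is not the chapter's own implementation: the function names, the toy labels, and the majority-class baseline that stands in for a real classifier are all assumptions made for illustration. The data are split into k folds, each fold is held out once for testing while the rest is used for training, and the k accuracies are averaged.

```python
from collections import Counter


def k_fold_indices(n_items, k):
    """Split range(n_items) into k contiguous folds whose sizes differ by at most one."""
    base, extra = divmod(n_items, k)
    folds, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def cross_validate(labels, k):
    """Mean accuracy of a majority-class baseline under k-fold cross-validation."""
    folds = k_fold_indices(len(labels), k)
    accuracies = []
    for test_idx in folds:
        held_out = set(test_idx)
        # "Training": find the most frequent label outside the held-out fold.
        train_labels = [y for i, y in enumerate(labels) if i not in held_out]
        majority = Counter(train_labels).most_common(1)[0][0]
        # "Testing": predict that label for every held-out item.
        correct = sum(1 for i in test_idx if labels[i] == majority)
        accuracies.append(correct / len(test_idx))
    return sum(accuracies) / k


# Toy gold standard (hypothetical labels, not data from the chapter).
gold = ["pos", "pos", "neg", "pos", "neg", "pos", "pos", "neg", "pos", "pos"]
mean_accuracy = cross_validate(gold, k=5)
```

In a real experiment the baseline would be replaced by a classifier trained on the selected features, and the folds would usually be shuffled or stratified before splitting.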

Copyright information

© The Author(s), under exclusive licence to Springer Nature Switzerland AG 2019

Authors and Affiliations

  • Grigori Sidorov
    Instituto Politécnico Nacional, Centro de Investigación en Computación, Mexico City, Mexico