Behavior Research Methods, Instruments, & Computers, Volume 32, Issue 3, pp 468–481

Automatic disambiguation of morphosyntax in spoken language corpora

  • Christophe Parisse
  • Marie-Thérèse Le Normand

Abstract

The use of computer tools has led to major advances in the study of spoken language corpora. One area that has shown particular progress is the study of child language development. Although it is now easy to lexically tag every word in a spoken language corpus, one still has to choose between numerous ambiguous forms, especially with languages such as French or English, where more than 70% of words are ambiguous. Computational linguistics can now provide a fully automatic disambiguation of lexical tags. The tool presented here (POST) can tag and disambiguate a large text in a few seconds. This tool complements systems dealing with language transcription and suggests further theoretical developments in the assessment of the status of morphosyntax in spoken language corpora. The program currently works for French and English, but it can be easily adapted for use with other languages. The analysis and computation of a corpus produced by normal French children 2–4 years of age, as well as of a sample corpus produced by French SLI children, are given as examples.

Keywords

Specific Language Impairment; Syntactic Category; Training Corpus; Unknown Word; Child Language

Copyright information

© Psychonomic Society, Inc. 2000

Authors and Affiliations

  • Christophe Parisse (1)
  • Marie-Thérèse Le Normand (1)
  1. Institut National de la Santé et de la Recherche Médicale, Paris, France
