Summary
To support a sound multidimensional content analysis, we discuss some typical aspects of the pre-treatment stage of textual data analysis, in particular: i) how to select the subset of words peculiar to a text; ii) how to reduce word ambiguity. We propose using both frequency dictionaries and reference lexicons as lexical knowledge bases external to the corpus, by means of a comparison of rankings inspired by Wegman’s parallel coordinate method. The condition of iso-frequency of unlemmatized forms is considered as an indication of the need for lemmatization. Finally, in order to evaluate the appropriateness of the choices made (both disambiguations and fusions), we propose reconstructing, by means of a bootstrap strategy, convex hulls in a factorial plane that serve as word confidence areas. Some examples from a large corpus of parliamentary discourses are presented.
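The comparison of rankings mentioned in the summary can be illustrated with a minimal sketch. It is not the chapter's actual procedure: it assumes a plain rank difference between the corpus and a reference frequency list, and the function names (`rank_by_frequency`, `rank_shifts`), the toy tokens, and the reference counts are all hypothetical. The paired ranks it computes are the values one would place on two parallel axes in a parallel-coordinate display.

```python
from collections import Counter


def rank_by_frequency(freqs):
    """Map each word to its rank (1 = most frequent).

    Ties are broken alphabetically; this is an arbitrary choice
    made for the sketch, not something stated in the chapter.
    """
    ordered = sorted(freqs, key=lambda w: (-freqs[w], w))
    return {w: r for r, w in enumerate(ordered, start=1)}


def rank_shifts(corpus_tokens, reference_freqs):
    """Compare word ranks in a corpus with ranks in a reference list.

    Returns (word, shift) pairs sorted by decreasing shift, where
    shift = reference rank - corpus rank. A positive shift means the
    word ranks higher in the corpus than in the reference, which is
    one simple way to flag it as a candidate peculiar word.
    """
    corpus_freqs = Counter(corpus_tokens)
    # Only words present in both sources can be compared by rank.
    shared = [w for w in corpus_freqs if w in reference_freqs]
    corpus_rank = rank_by_frequency({w: corpus_freqs[w] for w in shared})
    ref_rank = rank_by_frequency({w: reference_freqs[w] for w in shared})
    shifts = {w: ref_rank[w] - corpus_rank[w] for w in shared}
    return sorted(shifts.items(), key=lambda kv: (-kv[1], kv[0]))


# Toy data (hypothetical): a tiny "parliamentary" text and made-up
# reference counts standing in for a frequency dictionary.
tokens = "the law the vote the law of the house of vote law".split()
reference = {"the": 1000, "of": 800, "house": 50, "law": 20, "vote": 10}

result = rank_shifts(tokens, reference)
for word, shift in result:
    print(f"{word:>6}  shift={shift:+d}")
```

In this toy case the words that climb the most relative to the reference list are the ones a selection step would retain first. A real application would need a reference lexicon matched to the language and register of the corpus, and a threshold on the shift.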
References
Balbi, S. (1995): Non symmetrical correspondence analysis of textual data and confidence regions for graphical forms. In: JADT 1995 Analisi statistica dei dati testuali, Bolasco, S. et al. (eds.), II, 5–12, CISU, Roma
Bécue, M. et Haeusler, L. (1995): Vers une post-codification automatique. In: JADT 1995 Analisi statistica dei dati testuali, Bolasco, S. et al. (eds.), I, 35–42, CISU, Roma
Bolasco, S. (1993): Choix de lemmatisation en vue de reconstructions syntagmatiques du texte par l’analyse des correspondances. Proc. JADT 1993, 399–410, ENST-Telecom, Paris
Bolasco, S. (1994): L’individuazione di forme testuali per lo studio statistico dei testi con tecniche di analisi multidimensionale. Atti della XXXVII Riunione Scientifica della S.I.S., II, 95–103, CISU, Roma
Bortolini, U., Tagliavini, C., Zampolli, A. (1971): Lessico di frequenza della lingua italiana contemporanea. Garzanti, Milano
Dubois, J. et al. (1979): Dizionario di Linguistica, Bologna: Zanichelli
Elia, A. (1995): Per una disambiguazione semi-automatica di sintagmi composti: i dizionari elettronici lessico-grammaticali. In: Ricerca Qualitativa e Computer, Cipriani, R. e Bolasco, S. (eds.), 112–141, Franco Angeli, Milano
Cipriani, R. e Bolasco, S., eds. (1995): Ricerca Qualitativa e Computer. Franco Angeli, Milano
Lavit, Ch. (1988): Analyse conjointe de tableaux quantitatifs. Masson, Paris
Lebart, L. et Salem, A. (1994): Statistique textuelle. Dunod, Paris
Lyne, A. A. (1985): The vocabulary of French business correspondence. Slatkine-Champion, Paris
Salem, A. (1987): Pratique des segments répétés. Essai de statistique textuelle. Klincksieck, Paris
Wegman, E. J. (1990): Hyperdimensional Data Analysis Using Parallel Coordinates. JASA, 85, 411, 664–675
Copyright information
© 1998 Springer Japan
About this paper
Cite this paper
Bolasco, S. (1998). Meta-data and Strategies of Textual Data Analysis: Problems and Instruments. In: Hayashi, C., Yajima, K., Bock, HH., Ohsumi, N., Tanaka, Y., Baba, Y. (eds) Data Science, Classification, and Related Methods. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Tokyo. https://doi.org/10.1007/978-4-431-65950-1_52
DOI: https://doi.org/10.1007/978-4-431-65950-1_52
Publisher Name: Springer, Tokyo
Print ISBN: 978-4-431-70208-5
Online ISBN: 978-4-431-65950-1
eBook Packages: Springer Book Archive