Meta-data and Strategies of Textual Data Analysis: Problems and Instruments

  • Conference paper
Data Science, Classification, and Related Methods

Summary

In order to develop a proper multidimensional content analysis, we discuss some typical aspects of the pre-treatment stage of textual data analysis, in particular: i) how to select the peculiar subset of words in a text; ii) how to reduce word ambiguity. We propose using both frequency dictionaries and reference lexicons as lexical knowledge bases external to the corpus, by means of a comparison of rankings inspired by Wegman's parallel coordinate method. The condition of iso-frequency of unlemmatized forms is also considered as an indication of the need for lemmatization. Finally, in order to evaluate the appropriateness of these choices (both disambiguations and fusions), we propose reconstructing, by means of a bootstrap strategy, convex hulls in a factorial plane that serve as word confidence areas. Some examples from a large corpus of parliamentary discourses are presented.
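The summary names two computational steps without giving formulas. The Python sketch below illustrates one plausible reading of each, under our own assumptions rather than as the author's method: (a) each word's rank in the corpus is compared with its rank in a reference frequency dictionary, reading the word as a segment joining two parallel rank axes in the spirit of Wegman's parallel coordinates; (b) in a partial bootstrap, a word's occurrences are resampled, each replicate profile is projected as a supplementary point onto fixed correspondence-analysis axes (via the standard transition formula), and the convex hull of the replicate points serves as the word's confidence area. The function names (`rank_gaps`, `bootstrap_points`, etc.), the alphabetical tie-breaking rule, the replicate count, and all example counts and column coordinates are hypothetical.

```python
import random
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]


def rank_of(freqs: Dict[str, int]) -> Dict[str, int]:
    """Rank words by decreasing frequency (1 = most frequent); ties broken alphabetically."""
    ordered = sorted(freqs, key=lambda w: (-freqs[w], w))
    return {w: r for r, w in enumerate(ordered, start=1)}


def rank_gaps(corpus: Dict[str, int], reference: Dict[str, int]) -> Dict[str, int]:
    """Reference rank minus corpus rank, for words found in both lists.

    Each word is read as a segment joining two parallel rank axes; a large
    positive gap flags a word far more prominent in the corpus than in the
    reference lexicon (a candidate peculiar word). Hypothetical rule.
    """
    rc, rr = rank_of(corpus), rank_of(reference)
    return {w: rr[w] - rc[w] for w in rc if w in rr}


def project(counts: Sequence[int], col_std: Sequence[Point]) -> Point:
    """Supplementary CA projection: word profile times column standard coordinates.

    Assumes sum(counts) > 0.
    """
    total = sum(counts)
    x = sum(c / total * g[0] for c, g in zip(counts, col_std))
    y = sum(c / total * g[1] for c, g in zip(counts, col_std))
    return (x, y)


def bootstrap_points(counts: Sequence[int], col_std: Sequence[Point],
                     replicates: int = 200, seed: int = 0) -> List[Point]:
    """Resample the word's occurrences multinomially; project each replicate."""
    rng = random.Random(seed)
    parts = range(len(counts))
    n = sum(counts)
    points = []
    for _ in range(replicates):
        draw = rng.choices(parts, weights=counts, k=n)
        points.append(project([draw.count(j) for j in parts], col_std))
    return points


def _cross(o: Point, a: Point, b: Point) -> float:
    # z-component of (a - o) x (b - o); positive means a left turn.
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points: Sequence[Point]) -> List[Point]:
    """Andrew's monotone chain; returns hull vertices counter-clockwise."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower: List[Point] = []
    for p in pts:
        while len(lower) >= 2 and _cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper: List[Point] = []
    for p in reversed(pts):
        while len(upper) >= 2 and _cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def polygon_area(poly: Sequence[Point]) -> float:
    """Shoelace formula; a rough size measure for a word's confidence area."""
    n = len(poly)
    s = sum(poly[i][0] * poly[(i + 1) % n][1] - poly[(i + 1) % n][0] * poly[i][1]
            for i in range(n))
    return abs(s) / 2.0


if __name__ == "__main__":
    # Hypothetical counts: a corpus and a reference frequency dictionary.
    corpus = {"reform": 150, "government": 120, "law": 95, "house": 5}
    reference = {"house": 300, "year": 250, "government": 80, "law": 70, "reform": 10}
    print(rank_gaps(corpus, reference))
    # {'reform': 4, 'government': 1, 'law': 1, 'house': -3}

    # Hypothetical word counts over three text parts and hypothetical
    # column standard coordinates on the first two CA axes.
    counts = [40, 25, 10]
    col_std = [(1.2, 0.3), (-0.8, 0.9), (-0.4, -1.1)]
    hull = convex_hull(bootstrap_points(counts, col_std))
    print(len(hull), round(polygon_area(hull), 4))
```

Under this reading, comparing a word's hull area before and after a disambiguation or fusion gives one rough way to judge whether the choice stabilizes its position on the factorial plane; the paper's own evaluation criterion may differ.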

References

  • Balbi, S. (1995): Non symmetrical correspondence analysis of textual data and confidence regions for graphical forms. In: JADT 1995 Analisi statistica dei dati testuali, Bolasco, S. et al. (eds.), II, 5–12, CISU, Roma

  • Bécue, M. and Haeusler, L. (1995): Vers une post-codification automatique. In: JADT 1995 Analisi statistica dei dati testuali, Bolasco, S. et al. (eds.), I, 35–42, CISU, Roma

  • Bolasco, S. (1993): Choix de lemmatisation en vue de reconstructions syntagmatiques du texte par l'analyse des correspondances. In: Proc. JADT 1993, 399–410, ENST-Telecom, Paris

  • Bolasco, S. (1994): L'individuazione di forme testuali per lo studio statistico dei testi con tecniche di analisi multidimensionale. In: Atti della XXXVII Riunione Scientifica della S.I.S., II, 95–103, CISU, Roma

  • Bortolini, N., Tagliavini, C., Zampolli, A. (1971): Lessico di frequenza della lingua italiana contemporanea. Garzanti, Milano

  • Cipriani, R. and Bolasco, S. (eds.) (1995): Ricerca Qualitativa e Computer. Franco Angeli, Milano

  • Dubois, J. et al. (1979): Dizionario di Linguistica. Zanichelli, Bologna

  • Elia, A. (1995): Per una disambiguazione semi-automatica di sintagmi composti: i dizionari elettronici lessico-grammaticali. In: Ricerca Qualitativa e Computer, Cipriani, R. and Bolasco, S. (eds.), 112–141, Franco Angeli, Milano

  • Lavit, Ch. (1988): Analyse conjointe de tableaux quantitatifs. Masson, Paris

  • Lebart, L. and Salem, A. (1994): Statistique textuelle. Dunod, Paris

  • Lyne, A. A. (1985): The Vocabulary of French Business Correspondence. Slatkine-Champion, Paris

  • Salem, A. (1987): Pratique des segments répétés. Essai de statistique textuelle. Klincksieck, Paris

  • Wegman, E. J. (1990): Hyperdimensional Data Analysis Using Parallel Coordinates. Journal of the American Statistical Association, 85(411), 664–675

Copyright information

© 1998 Springer Japan

About this paper

Cite this paper

Bolasco, S. (1998). Meta-data and Strategies of Textual Data Analysis: Problems and Instruments. In: Hayashi, C., Yajima, K., Bock, HH., Ohsumi, N., Tanaka, Y., Baba, Y. (eds) Data Science, Classification, and Related Methods. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Tokyo. https://doi.org/10.1007/978-4-431-65950-1_52

  • DOI: https://doi.org/10.1007/978-4-431-65950-1_52

  • Publisher Name: Springer, Tokyo

  • Print ISBN: 978-4-431-70208-5

  • Online ISBN: 978-4-431-65950-1

  • eBook Packages: Springer Book Archive