Textual Discriminant Analysis

  • Chapter
Exploring Textual Data

Part of the book series: Text, Speech and Language Technology (TLTB, volume 4)

Abstract

The statistical methods discussed in the previous chapters are applicable mainly in the exploratory phase (also known as the descriptive phase) of an analysis. However, data exploration is more dynamic and interactive than simple data description. It uses multivariate statistics to obtain visualizations or groupings of elements that can be either whole texts or items within texts. It looks for associations and structures as well as interesting summaries.

References

  1. Early discriminant analyses were carried out on biometric and anthropometric measures by the statisticians Fisher (1936) and Mahalanobis (1936), who were attempting to predict membership in ethnic groups on the basis of skeletal measurements. They were the first to use the technique sometimes known as linear discriminant analysis, which is one of the oldest methods and also one of the most commonly used today.

  2. The second version of the work of Mosteller and Wallace (1984) also contains a general panorama of attempts at authorship attribution.

  3. Cf. the pioneering work of Palermo and Jenkins (1964). Cf. also Bouroche and Curvalle (1974).

  4. This trait does not exclude the possibility of concentrating on such units in some areas. Think, for example, of the important role played by the tool words for and against in the analysis of political texts.

  5. Cf., for example, the work of Radday (1974) and Morton (1963) concerning the homogeneity of the book of Isaiah.

  6. Today such an operation cannot be totally computerized. Important progress has been made in the realm of automatic syntactic analysis of texts, as shown, for example, by the ongoing improvements in spelling correction found in most text processors.

  7. Note that isolating tool words requires categorizing and disambiguating the words of a text (for example, the word even). Moreover, some expressions contain full words that act as substitutes for function words (e.g., in fact), and a preliminary lemmatization might obscure them.

  8. Of course, this level of precision is misleading, because for some plays, there are entire paragraphs missing in certain editions, whereas disagreements as to the identity of words still exist for the text parts that are common to all sources.

  9. The local Mahalanobis distance of point X to group k, which is used in quadratic discriminant analysis, is written $d_k(X) = (X - m_k)' S_k^{-1} (X - m_k)$, where $S_k$ is the internal covariance matrix of group k with mean point (center of gravity) $m_k$ (see Anderson, 1984; MacLachlan, 1992).

  10. Cf. Lachenbruch and Mickey, 1968; Stone, 1974; Geisser, 1975.

  11. This survey was instigated by the Institute of Research on Urban Life (a Japanese research institute sponsored by Tokyo Gas Company Ltd) under the direction of H. Akuto (cf. Akuto, 1992).

  12. Details of the analyses in the three countries are given in Akuto (1992) and Akuto and Lebart (1992).

  13. The latinization of Japanese writing introduced a confusion that would not have occurred with Chinese characters (Kanjis). The latinized graphical form SAKE, for example, designates rice wine as well as salmon in our coding scheme. This type of distortion of basic information only takes away a little from the richness of the aggregated lexical profiles, as the reader will be able to judge.

  14. Recall that the test-value converts a critical probability into a standardized normal variable, for easier readability: the value 1.96 corresponds to the two-tailed threshold 0.05, whereas the value 5.51 corresponds to a probability on the order of $10^{-6}$.

  15. Another calculation mode consists in characterizing a response by the mean test-value of the forms which it contains. This criterion, which favors concise responses, was not used here (cf. chapter 6, section 6.2).
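
The local Mahalanobis distance of note 9 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `solve` and `local_mahalanobis` are hypothetical, the group mean and covariance matrix are taken as given, and the inverse of $S_k$ is never formed explicitly (the linear system $S_k y = X - m_k$ is solved instead). As in the note, the value returned is the quadratic form itself, with no square root.

```python
from typing import List, Sequence

Vector = Sequence[float]
Matrix = Sequence[Sequence[float]]


def solve(a: Matrix, b: Vector) -> List[float]:
    """Solve a y = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [a | b], copied so the inputs are left untouched.
    aug = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    # Back substitution on the upper-triangular system.
    y = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(aug[i][j] * y[j] for j in range(i + 1, n))
        y[i] = (aug[i][n] - tail) / aug[i][i]
    return y


def local_mahalanobis(x: Vector, mean_k: Vector, cov_k: Matrix) -> float:
    """Return (x - m_k)' S_k^{-1} (x - m_k) for one group k."""
    diff = [xi - mi for xi, mi in zip(x, mean_k)]
    return sum(d * yi for d, yi in zip(diff, solve(cov_k, diff)))


if __name__ == "__main__":
    # Hypothetical two-dimensional group, for illustration only.
    mean_k = [0.0, 0.0]
    cov_k = [[2.0, 1.0], [1.0, 2.0]]
    print(local_mahalanobis([1.0, 1.0], mean_k, cov_k))  # about 0.667
```

Because each group k has its own matrix $S_k$, the distance is called local: the same point X is measured with a different metric for every group, which is what makes the resulting decision boundaries quadratic rather than linear.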
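
The conversion described in note 14 can likewise be sketched with the Python standard library. The names `to_test_value` and `critical_probability` are hypothetical, and the sketch assumes the two-tailed convention that the note uses for the 0.05 threshold; it is not presented as the authors' own computation.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def to_test_value(p: float, two_tailed: bool = True) -> float:
    """Convert a critical probability into a standard normal quantile."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    tail = p / 2.0 if two_tailed else p
    return _STD_NORMAL.inv_cdf(1.0 - tail)


def critical_probability(t: float, two_tailed: bool = True) -> float:
    """Inverse conversion: the probability attached to a test-value."""
    tail = _STD_NORMAL.cdf(-abs(t))
    return 2.0 * tail if two_tailed else tail


if __name__ == "__main__":
    print(round(to_test_value(0.05), 2))         # 1.96
    print(round(critical_probability(1.96), 3))  # 0.05
```

Reading a table of test-values is easier than reading a table of very small probabilities, since the values can be compared directly against familiar thresholds such as 1.96.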

Copyright information

© 1998 Springer Science+Business Media Dordrecht

About this chapter

Cite this chapter

Lebart, L., Salem, A., Berry, L. (1998). Textual Discriminant Analysis. In: Exploring Textual Data. Text, Speech and Language Technology, vol 4. Springer, Dordrecht. https://doi.org/10.1007/978-94-017-1525-6_9

  • DOI: https://doi.org/10.1007/978-94-017-1525-6_9

  • Publisher Name: Springer, Dordrecht

  • Print ISBN: 978-90-481-4942-1

  • Online ISBN: 978-94-017-1525-6

  • eBook Packages: Springer Book Archive
