Abstract
This paper explores the capability of Latent Semantic Analysis in a multilingual context. In particular, we weigh how useful it can be in solving linguistic problems from a statistical point of view. We focus on evaluating the quality of a translation by comparing the latent structure of the original text with that of its version in another natural language. Procrustes rotations are introduced in a statistical framework as a tool for reaching this goal. An application to one year of Le Monde Diplomatique and the corresponding Italian edition shows the effectiveness of our proposal.
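The abstract does not spell out the computation, so the following is only a minimal sketch of the general idea, not the authors' procedure. It assumes that the two editions are aligned document by document, that each edition is summarised by a term-by-document matrix, that Latent Semantic Analysis is carried out with a truncated singular value decomposition, and that the comparison uses a classical orthogonal Procrustes rotation. The function names `lsa_coordinates` and `procrustes_fit`, the number of retained dimensions `k`, and the normalised residual used as a discrepancy score are all illustrative assumptions.

```python
import numpy as np


def lsa_coordinates(term_doc, k):
    """Return document coordinates on the first k latent dimensions.

    term_doc is a (terms x documents) matrix; the result has one row
    per document and one column per latent axis.
    """
    _, s, vt = np.linalg.svd(term_doc, full_matrices=False)
    return vt[:k].T * s[:k]


def procrustes_fit(x, y):
    """Rotate configuration y onto configuration x.

    Both inputs are (documents x k) matrices with rows in the same order.
    Returns the orthogonal matrix r minimising ||y r - x||_F after
    centring, and the squared residual divided by ||x||_F^2.
    """
    x = x - x.mean(axis=0)
    y = y - y.mean(axis=0)
    # Orthogonal Procrustes solution: r = u v^T from the SVD of y^T x.
    u, _, vt = np.linalg.svd(y.T @ x)
    r = u @ vt
    residual = np.linalg.norm(y @ r - x) ** 2 / np.linalg.norm(x) ** 2
    return r, residual


if __name__ == "__main__":
    # Toy data (assumption): 6 aligned documents, two different vocabularies.
    rng = np.random.default_rng(0)
    original = rng.random((12, 6))      # 12 source-language terms
    translation = rng.random((15, 6))   # 15 target-language terms
    x = lsa_coordinates(original, k=3)
    y = lsa_coordinates(translation, k=3)
    rotation, residual = procrustes_fit(x, y)
    print("normalised Procrustes residual:", residual)
```

Under these assumptions, a residual close to zero means the two document configurations coincide up to a rotation or reflection, while a larger residual signals that the latent structures of the two texts diverge. Only the document coordinates are compared, because the vocabularies of the two languages differ while the aligned documents give both configurations the same number of rows.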
Copyright information
© 2006 Springer-Verlag Heidelberg
About this paper
Cite this paper
Balbi, S., Misuraca, M. (2006). Procrustes Techniques for Text Mining. In: Zani, S., Cerioli, A., Riani, M., Vichi, M. (eds) Data Analysis, Classification and the Forward Search. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-35978-8_26
DOI: https://doi.org/10.1007/3-540-35978-8_26
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-35977-7
Online ISBN: 978-3-540-35978-4
eBook Packages: Mathematics and Statistics (R0)