Bilingual Text Classification

  • Jorge Civera
  • Elsa Cubel
  • Enrique Vidal
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4477)

Abstract

Bilingual documentation has become common in official institutions and private companies. In this scenario, the categorization of bilingual text is a useful tool. This paper proposes different approaches to this bilingual classification task. On the one hand, three finite-state transducer algorithms from the grammatical inference framework are presented. On the other hand, a naive combination of smoothed n-gram models is introduced. To evaluate the performance of the bilingual classifiers, two categorized bilingual corpora of different complexity were considered. Experiments on a limited-domain task show that all the models obtain similar results. However, results on a more open-domain task show the superiority of the naive approach.
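
The naive combination of n-gram models mentioned in the abstract can be illustrated with a minimal sketch. It assumes that "naive" means treating the source and target texts as conditionally independent given the class, so that a bilingual pair (x, y) is assigned to the class c maximizing p(c) p(x|c) p(y|c); the abstract does not spell this out. Add-one smoothing over bigrams stands in for whatever smoothing the paper actually uses, and the names `BigramModel` and `NaiveBilingualClassifier` are hypothetical.

```python
import math
from collections import Counter, defaultdict


class BigramModel:
    """Bigram language model with add-one (Laplace) smoothing.

    Add-one smoothing is an illustrative stand-in, not the paper's method.
    """

    def __init__(self):
        self.bigrams = Counter()   # counts of (previous word, word)
        self.contexts = Counter()  # counts of previous word
        self.vocab = set()

    def train(self, sentences):
        for words in sentences:
            tokens = ["<s>"] + list(words) + ["</s>"]
            self.vocab.update(tokens)
            for prev, cur in zip(tokens, tokens[1:]):
                self.bigrams[(prev, cur)] += 1
                self.contexts[prev] += 1

    def log_prob(self, words):
        tokens = ["<s>"] + list(words) + ["</s>"]
        v = len(self.vocab) + 1  # one extra slot reserves mass for unseen words
        return sum(
            math.log((self.bigrams[(prev, cur)] + 1) / (self.contexts[prev] + v))
            for prev, cur in zip(tokens, tokens[1:])
        )


class NaiveBilingualClassifier:
    """Scores each class as log p(c) + log p(x|c) + log p(y|c).

    Assumption: source x and target y are independent given the class c.
    """

    def __init__(self):
        self.src = defaultdict(BigramModel)  # one source-language model per class
        self.tgt = defaultdict(BigramModel)  # one target-language model per class
        self.class_counts = Counter()

    def train(self, samples):
        """samples: iterable of (source_words, target_words, label)."""
        by_class = defaultdict(lambda: ([], []))
        for x, y, c in samples:
            by_class[c][0].append(x)
            by_class[c][1].append(y)
            self.class_counts[c] += 1
        for c, (xs, ys) in by_class.items():
            self.src[c].train(xs)
            self.tgt[c].train(ys)

    def classify(self, x, y):
        total = sum(self.class_counts.values())

        def score(c):
            prior = math.log(self.class_counts[c] / total)
            return prior + self.src[c].log_prob(x) + self.tgt[c].log_prob(y)

        return max(self.class_counts, key=score)
```

Each class holds two independent language models, so training and scoring in one language never consult the other; only the final sum of log-probabilities combines them. This is what makes the scheme simple compared with a transducer that models the pair jointly.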

Copyright information

© Springer Berlin Heidelberg 2007

Authors and Affiliations

  • Jorge Civera (1)
  • Elsa Cubel (1)
  • Enrique Vidal (1)
  1. Instituto Tecnológico de Informática, Universidad Politécnica de Valencia
