Parallel Text Alignment

  • Charles B. Owen
  • James Ford
  • Fillia Makedon
  • Tilmann Steinberg
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 1513)

Abstract

Parallel Text Alignment (PTA) is the problem of automatically aligning content in multiple text documents originating or derived from the same source. Solving it improves multimedia data access in digital library applications, with implications that range from facilitating the analysis of multiple English-language translations of classical texts to enabling the on-demand and random comparison of multiple transcriptions derived from a given audio stream or associated with a given stream of video, audio, or images. In this paper we give an efficient algorithm for achieving such an alignment and demonstrate its use with two applications. This result is an application of the new framework of Cross-Modal Information Retrieval recently developed at Dartmouth.

Copyright information

© Springer-Verlag Berlin Heidelberg 1998

Authors and Affiliations

  • Charles B. Owen (1)
  • James Ford (2)
  • Fillia Makedon (2)
  • Tilmann Steinberg (2)

  1. Michigan State University, USA
  2. Dartmouth Experimental Visualization Laboratory, 6211 Sudikoff Laboratory, Dartmouth College, Hanover