Building Specialized Multilingual Lexical Graphs Using Community Resources

  • Mohammad Daoud
  • Christian Boitet
  • Kyo Kageura
  • Asanobu Kitamoto
  • Mathieu Mangeot
  • Daoud Daoud
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6162)


We describe methods for compiling domain-dedicated multilingual terminological data from various resources. Our main source is online community users, so our approach depends both on acquiring contributions from volunteers (explicit approach) and on analyzing users’ behavior to extract interesting patterns and facts (implicit approach). As a generic repository that can hold the collected multilingual terminological data, we describe the concept of dedicated Multilingual Preterminological Graphs (MPGs), together with some automatic approaches for constructing them by analyzing the behavior of online community users. A Multilingual Preterminological Graph is a special lexical resource that contains a massive number of terms related to a specific domain. We call it preterminological because it is raw material that can be used to build a standardized terminological repository. Building such a graph with traditional approaches is difficult, as it requires huge efforts from domain specialists and terminologists. In our approach, we build the graph by analyzing the access log files of the community’s website, finding the important terms that have been used to search that website, and finding their associations with each other. We aim to use this graph as a seed repository to which multilingual volunteers can contribute. We are experimenting with this approach in the Digital Silk Road (DSR) Project. We used its access log files dating back to the project’s beginning in 2003 and obtained an initial graph of around 116,000 terms. As an application, we used this graph to obtain a preterminological multilingual database that serves a cross-language information retrieval (CLIR) system for the DSR project.
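The abstract describes building the graph from access logs only at a high level. The following is a minimal sketch of one way such a graph could be derived. It assumes a hypothetical simplified log format (`<ip> <unix_ts> <path>`), a hypothetical `q` search parameter, per-IP search sessions split after 30 minutes of inactivity, and session co-occurrence as the association measure. None of these details come from the paper, and the function names (`parse_line`, `sessions`, `build_graph`) are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations
from urllib.parse import urlsplit, parse_qs

SESSION_GAP = 30 * 60  # hypothetical: 30 minutes of inactivity ends a session


def parse_line(line):
    """Parse a hypothetical simplified log line: '<ip> <unix_ts> <path>'.

    Returns (ip, ts, query), or None if the request carries no search query.
    The 'q' parameter name is an assumption, not the DSR site's real one.
    """
    parts = line.split()
    if len(parts) != 3:
        return None
    ip, ts, path = parts
    # parse_qs decodes '+' and percent-encoded UTF-8 (e.g. CJK queries).
    terms = parse_qs(urlsplit(path).query).get("q")
    if not terms:
        return None
    return ip, int(ts), terms[0].strip().lower()


def sessions(records):
    """Group (ip, ts, query) records into search sessions per client IP."""
    by_ip = defaultdict(list)
    for ip, ts, query in records:
        by_ip[ip].append((ts, query))
    for events in by_ip.values():
        events.sort()
        current, last_ts = [], None
        for ts, query in events:
            # A long pause starts a new session for the same client.
            if last_ts is not None and ts - last_ts > SESSION_GAP:
                yield current
                current = []
            current.append(query)
            last_ts = ts
        if current:
            yield current


def build_graph(lines, min_freq=1):
    """Nodes: search terms with frequency; edges: co-occurrence in a session."""
    records = [r for r in map(parse_line, lines) if r]
    freq = Counter(q for _, _, q in records)
    edges = Counter()
    for session in sessions(records):
        # Keep only sufficiently frequent terms, deduplicated within a session.
        kept = sorted({q for q in session if freq[q] >= min_freq})
        for a, b in combinations(kept, 2):
            edges[(a, b)] += 1
    nodes = {q: n for q, n in freq.items() if n >= min_freq}
    return nodes, edges
```

For example, three searches from one client, where the third comes more than 30 minutes after the second, yield two sessions. Only the first session contributes an edge between its two terms. Raising `min_freq` is a crude stand-in for selecting the "important terms" the abstract mentions.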






Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • Mohammad Daoud (1)
  • Christian Boitet (1)
  • Kyo Kageura (2)
  • Asanobu Kitamoto (3)
  • Mathieu Mangeot (1)
  • Daoud Daoud (4)
  1. Grenoble Informatics Laboratory, GETALP, Université Joseph Fourier, Grenoble, France
  2. Library and Information Science Laboratory, Graduate School of Education, The University of Tokyo, Tokyo, Japan
  3. The National Institute of Informatics, Tokyo
  4. Princess Sumaya University, Al-Jubaiha
