Automatic Construction of Cross-Lingual Networks of Concepts from the Hong Kong SAR Police Department

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2665)


The tragic events of September 11 have prompted rapidly growing attention to national security and criminal analysis. In the national security world, very large volumes of data and information are generated and gathered. Much of this data and information is written in different languages and stored in different locations, and may seem unconnected. Cross-lingual semantic interoperability is therefore a major challenge in generating an overview of this disparate data and information so that it can be analysed and searched. Traditional information retrieval (IR) approaches normally require a document to share some keywords with the query. In reality, users may use keywords that differ from those used in the documents. There are then two different term spaces, one for the users and another for the documents. The problem can be viewed as the creation of a thesaurus that relates terms in the two spaces. Creating such relationships would allow the system to match queries with relevant documents even though they contain different terms. Apart from this, terrorists and criminals may communicate through letters, e-mails and faxes in languages other than English. Translation ambiguity significantly exacerbates the retrieval problem. To facilitate cross-lingual information retrieval, a corpus-based approach uses term co-occurrence statistics in parallel or comparable corpora to construct a statistical translation model that crosses the language boundary. However, collecting parallel corpora between European and Oriental languages is not an easy task because of the unique linguistic and grammatical structures of Oriental languages. In this paper, a text-based approach to aligning English/Chinese Hong Kong Police press release documents from the Web is first presented. The paper then reports an algorithmic approach to generating a robust knowledge base through statistical correlation analysis of the semantics (knowledge) embedded in the bilingual press release corpus. The research output consists of a thesaurus-like, semantic network knowledge base that can aid semantics-based cross-lingual information management and retrieval.
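The idea of deriving cross-lingual term associations from co-occurrence statistics in an aligned corpus can be illustrated with a minimal sketch. The abstract does not state the weighting formula used in the paper, so the weight below (the fraction of aligned pairs containing an English term that also contain a given Chinese term) is an assumption chosen for simplicity, and the function name `cooccurrence_weights` and the toy data are hypothetical.

```python
from collections import Counter
from itertools import product


def cooccurrence_weights(aligned_pairs):
    """Return {(en_term, zh_term): weight} from aligned document pairs.

    aligned_pairs: iterable of (english_terms, chinese_terms), one entry
    per aligned document pair.
    weight = (pairs containing both terms) / (pairs containing en_term).
    This is an illustrative weighting, not the one used in the paper.
    """
    en_count = Counter()     # number of pairs in which each English term occurs
    joint_count = Counter()  # number of pairs in which (en, zh) occur together
    for en_terms, zh_terms in aligned_pairs:
        en_set, zh_set = set(en_terms), set(zh_terms)
        en_count.update(en_set)
        joint_count.update(product(en_set, zh_set))
    return {
        (en, zh): joint / en_count[en]
        for (en, zh), joint in joint_count.items()
    }


# Toy, made-up term lists standing in for indexed press-release pairs.
pairs = [
    (["police", "robbery"], ["警方", "劫案"]),
    (["police", "arrest"], ["警方", "拘捕"]),
    (["robbery", "arrest"], ["劫案", "拘捕"]),
]
weights = cooccurrence_weights(pairs)
```

In this toy corpus, a term and its translation always appear in the same aligned pairs and so receive a higher weight than incidental co-occurrences; a real corpus would need term extraction for both languages and a more careful weighting scheme.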


Concept Space; Parallel Corpus; Chinese Translation; Automatic Construction; Hopfield Network
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2003

Authors and Affiliations

  1. Department of Systems Engineering and Engineering Management, The Chinese University of Hong Kong, Hong Kong
