Abstract
The tragic event of September 11 has prompted rapidly growing attention to national security and criminal analysis. In the national security world, very large volumes of data and information are generated and gathered. Much of this data and information is written in different languages and stored in different locations, and may be seemingly unconnected. Cross-lingual semantic interoperability is therefore a major challenge in generating an overview of this disparate data and information so that it can be analysed and searched. Traditional information retrieval (IR) approaches normally require a document to share some keywords with the query. In reality, users may use keywords that differ from those used in the documents. There are then two different term spaces, one for the users and another for the documents. Bridging them can be viewed as the creation of a thesaurus that relates terms in one space to terms in the other. Such relationships would allow the system to match queries with relevant documents even though they contain different terms. Apart from this, terrorists and criminals may communicate through letters, e-mails and faxes in languages other than English, and translation ambiguity significantly exacerbates the retrieval problem. To facilitate cross-lingual information retrieval, a corpus-based approach uses term co-occurrence statistics in parallel or comparable corpora to construct a statistical translation model that crosses the language boundary. However, collecting parallel corpora between European and Oriental languages is not an easy task because of the unique linguistic and grammatical structures of Oriental languages. In this paper, a text-based approach to aligning English/Chinese Hong Kong Police press release documents from the Web is first presented. The paper then reports an algorithmic approach to generating a robust knowledge base, based on statistical correlation analysis of the semantics (knowledge) embedded in the bilingual press release corpus. The research output consists of a thesaurus-like semantic network knowledge base, which can aid semantics-based cross-lingual information management and retrieval.
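To make the idea of term co-occurrence statistics over an aligned bilingual corpus concrete, the following is a minimal sketch and not the authors' algorithm. It assumes that documents have already been aligned and reduced to lists of index terms, and it uses a simple asymmetric weight, the fraction of documents containing an English term whose aligned Chinese document also contains a given Chinese term. The function names, the weighting formula, and the toy corpus are all illustrative assumptions.

```python
from collections import Counter
from itertools import product


def cooccurrence_weights(aligned_pairs):
    """Map (en_term, zh_term) to df(en, zh) / df(en).

    aligned_pairs is an iterable of (english_terms, chinese_terms) tuples,
    one per aligned document pair. The weight is an assumed, simplified
    association measure, not the one used in the paper.
    """
    en_df = Counter()    # number of aligned pairs containing each English term
    pair_df = Counter()  # number of aligned pairs containing both terms
    for en_terms, zh_terms in aligned_pairs:
        en_set, zh_set = set(en_terms), set(zh_terms)
        en_df.update(en_set)
        pair_df.update(product(en_set, zh_set))
    return {(en, zh): n / en_df[en] for (en, zh), n in pair_df.items()}


def related_terms(weights, en_term, top_k=3):
    """Return the top_k Chinese terms most strongly associated with en_term."""
    ranked = sorted(
        ((zh, w) for (en, zh), w in weights.items() if en == en_term),
        key=lambda item: (-item[1], item[0]),  # by weight, then term for ties
    )
    return ranked[:top_k]


# Toy aligned corpus, invented purely for illustration.
toy_corpus = [
    (["police", "robbery"], ["警方", "劫案"]),
    (["police", "arrest"], ["警方", "拘捕"]),
    (["robbery", "arrest"], ["劫案", "拘捕"]),
]
weights = cooccurrence_weights(toy_corpus)
top_for_police = related_terms(weights, "police")
```

In this toy corpus, "police" is most strongly linked to 警方 because the two terms co-occur in every aligned pair that contains "police". A thesaurus-like network of the kind described above would store such weighted links between terms, typically with a more refined weighting scheme.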
© 2003 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Li, K.W., Yang, C.C. (2003). Automatic Construction of Cross-Lingual Networks of Concepts from the Hong Kong SAR Police Department. In: Chen, H., Miranda, R., Zeng, D.D., Demchak, C., Schroeder, J., Madhusudan, T. (eds) Intelligence and Security Informatics. ISI 2003. Lecture Notes in Computer Science, vol 2665. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44853-5_11
DOI: https://doi.org/10.1007/3-540-44853-5_11
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-40189-6
Online ISBN: 978-3-540-44853-2
eBook Packages: Springer Book Archive