Automatic Construction of Cross-Lingual Networks of Concepts from the Hong Kong SAR Police Department

Li, Kar Wing; Yang, Christopher C.

doi:10.1007/3-540-44853-5_11

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2665)

Included in the following conference series:

International Conference on Intelligence and Security Informatics

Abstract

The tragic events of September 11 prompted a rapid growth of attention to national security and criminal analysis. In the national security world, very large volumes of data and information are generated and gathered. Much of this data and information is written in different languages, stored in different locations, and may appear unconnected. Cross-lingual semantic interoperability is therefore a major challenge in generating an overview of this disparate data and information so that it can be analysed and searched. Traditional information retrieval (IR) approaches normally require a document to share some keywords with the query. In reality, users may use keywords that differ from those used in the documents. There are then two different term spaces, one for the users and another for the documents. The problem can be viewed as the creation of a thesaurus that relates terms in the two spaces. Such relationships would allow the system to match queries with relevant documents even though they contain different terms. Apart from this, terrorists and criminals may communicate through letters, e-mails and faxes in languages other than English, and translation ambiguity significantly exacerbates the retrieval problem. To facilitate cross-lingual information retrieval, a corpus-based approach uses term co-occurrence statistics in parallel or comparable corpora to construct a statistical translation model that crosses the language boundary. However, collecting parallel corpora between European and Oriental languages is not an easy task because of the distinctive linguistic and grammatical structures of Oriental languages. In this paper, a text-based approach to aligning English/Chinese Hong Kong Police press release documents from the Web is first presented. The paper then reports an algorithmic approach to generating a robust knowledge base through statistical correlation analysis of the semantics (knowledge) embedded in the bilingual press release corpus. The research output consisted of a thesaurus-like semantic network knowledge base, which can aid semantics-based cross-lingual information management and retrieval.
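The co-occurrence idea in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: it assumes the press releases have already been aligned into English/Chinese pairs and reduced to term lists, and it uses a simple asymmetric weight (the fraction of pairs containing an English term that also contain a given Chinese term). The function name `cross_lingual_associations`, the `min_weight` threshold, and the toy corpus are all hypothetical.

```python
from collections import Counter
from itertools import product


def cross_lingual_associations(aligned_pairs, min_weight=0.6):
    """Build a toy English -> Chinese association network.

    aligned_pairs: iterable of (english_terms, chinese_terms), one entry per
    aligned document pair. Term extraction and alignment are assumed done.
    The weight for (en, zh) is P(zh in pair | en in pair), estimated from
    document-pair counts. This is an illustrative choice, not the paper's.
    """
    en_doc_freq = Counter()   # number of pairs containing each English term
    joint_freq = Counter()    # number of pairs containing both en and zh
    for en_terms, zh_terms in aligned_pairs:
        en_set, zh_set = set(en_terms), set(zh_terms)
        en_doc_freq.update(en_set)
        joint_freq.update(product(en_set, zh_set))

    network = {}
    for (en, zh), joint in joint_freq.items():
        weight = joint / en_doc_freq[en]
        if weight >= min_weight:  # keep only the stronger associations
            network.setdefault(en, {})[zh] = weight
    return network


# Hypothetical toy corpus of three aligned press releases.
corpus = [
    (["police", "robbery"], ["警方", "劫案"]),
    (["police", "arrest"], ["警方", "拘捕"]),
    (["robbery", "arrest"], ["劫案", "拘捕"]),
]
network = cross_lingual_associations(corpus)
for en, links in sorted(network.items()):
    print(en, links)
```

In this toy corpus each English term co-occurs with its translation in every pair that contains it (weight 1.0) and with unrelated terms in only half of them (weight 0.5), so the threshold keeps only the translation links. A real corpus would need term extraction, Chinese segmentation, and a more careful weighting scheme.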

Author information

Authors and Affiliations

Department of Systems Engineering and Engineering Management, The Chinese University of Hong Kong, Hong Kong
Kar Wing Li & Christopher C. Yang

Editor information

Editors and Affiliations

Department of Management Information Systems, University of Arizona, Tucson, AZ, 85721, USA
Hsinchun Chen, Daniel D. Zeng & Therani Madhusudan
Tucson Police Department, 270 S. Stone Ave., Tucson, AZ, 85701, USA
Richard Miranda & Jenny Schroeder
School of Public Administration and Policy, University of Arizona, Tucson, AZ, 85721, USA
Chris Demchak

About this paper

Cite this paper

Li, K.W., Yang, C.C. (2003). Automatic Construction of Cross-Lingual Networks of Concepts from the Hong Kong SAR Police Department. In: Chen, H., Miranda, R., Zeng, D.D., Demchak, C., Schroeder, J., Madhusudan, T. (eds) Intelligence and Security Informatics. ISI 2003. Lecture Notes in Computer Science, vol 2665. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44853-5_11

DOI: https://doi.org/10.1007/3-540-44853-5_11
Published: 27 May 2003
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-40189-6
Online ISBN: 978-3-540-44853-2
eBook Packages: Springer Book Archive
