Automatic Construction of Cross-Lingual Networks of Concepts from the Hong Kong SAR Police Department

  • Conference paper
Intelligence and Security Informatics (ISI 2003)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2665)

Abstract

The tragic events of September 11 have prompted rapidly growing attention to national security and criminal analysis. In the national security domain, very large volumes of data and information are generated and gathered. Much of this data is written in different languages, stored in different locations, and may appear unconnected. Cross-lingual semantic interoperability is therefore a major challenge in generating an overview of this disparate data so that it can be analysed and searched. Traditional information retrieval (IR) approaches normally require a document to share some keywords with the query. In reality, users may use keywords that differ from those used in the documents. There are then two different term spaces, one for the users and another for the documents. Bridging them can be viewed as the creation of a thesaurus: relationships between terms in the two spaces allow the system to match queries with relevant documents even though they contain different terms. In addition, terrorists and criminals may communicate through letters, e-mails and faxes in languages other than English, and translation ambiguity significantly exacerbates the retrieval problem. To facilitate cross-lingual information retrieval, a corpus-based approach uses term co-occurrence statistics in parallel or comparable corpora to construct a statistical translation model that crosses the language boundary. However, collecting parallel corpora between European and Oriental languages is not an easy task because of the distinctive linguistic and grammatical structures of Oriental languages. This paper first presents a text-based approach to aligning English/Chinese Hong Kong Police press release documents from the Web. It then reports an algorithmic approach to generating a robust knowledge base through statistical correlation analysis of the semantics (knowledge) embedded in the bilingual press release corpus. The research output is a thesaurus-like semantic network knowledge base that can aid semantics-based cross-lingual information management and retrieval.
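
As an illustration of the co-occurrence idea described in the abstract, the following is a minimal sketch and not the authors' method. It assumes the press releases have already been aligned into English/Chinese pairs and indexed into term lists, and it uses a simple conditional co-occurrence ratio (the share of aligned pairs containing a term that also contain the other term) as a stand-in for the paper's statistical correlation weighting. The function names `build_associations` and `related_terms` and the toy data are hypothetical.

```python
from collections import Counter
from itertools import product


def build_associations(aligned_pairs):
    """Return asymmetric cross-lingual weights from aligned document pairs.

    aligned_pairs: iterable of (english_terms, chinese_terms), one tuple per
    aligned press-release pair. The weight for (a, b) is the fraction of
    aligned pairs containing a that also contain b. This ratio is an
    illustrative assumption, not the weighting used in the paper.
    """
    df = Counter()     # number of aligned pairs in which each term occurs
    co_df = Counter()  # number of aligned pairs in which both terms occur
    for en_terms, zh_terms in aligned_pairs:
        en_set, zh_set = set(en_terms), set(zh_terms)
        df.update(en_set)
        df.update(zh_set)
        for en, zh in product(en_set, zh_set):
            co_df[(en, zh)] += 1

    weights = {}
    for (en, zh), n in co_df.items():
        weights[(en, zh)] = n / df[en]  # English term -> Chinese term
        weights[(zh, en)] = n / df[zh]  # Chinese term -> English term
    return weights


def related_terms(weights, term, top_k=3):
    """List the terms most strongly associated with `term`, strongest first."""
    candidates = [(tgt, w) for (src, tgt), w in weights.items() if src == term]
    return sorted(candidates, key=lambda item: (-item[1], item[0]))[:top_k]


# Toy aligned corpus (invented for illustration only).
sample_pairs = [
    (["police", "robbery"], ["警方", "劫案"]),
    (["police", "arrest"], ["警方", "拘捕"]),
    (["robbery", "arrest"], ["劫案", "拘捕"]),
]
weights = build_associations(sample_pairs)
print(related_terms(weights, "police"))
```

On this toy corpus, "police" co-occurs with "警方" in every aligned pair that contains it, so that association ranks first. A real system would also need term extraction for both languages and a filter for frequent but uninformative terms, neither of which is shown here.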

© 2003 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Li, K.W., Yang, C.C. (2003). Automatic Construction of Cross-Lingual Networks of Concepts from the Hong Kong SAR Police Department. In: Chen, H., Miranda, R., Zeng, D.D., Demchak, C., Schroeder, J., Madhusudan, T. (eds) Intelligence and Security Informatics. ISI 2003. Lecture Notes in Computer Science, vol 2665. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44853-5_11

  • DOI: https://doi.org/10.1007/3-540-44853-5_11

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-40189-6

  • Online ISBN: 978-3-540-44853-2

  • eBook Packages: Springer Book Archive
