A Multilingual Text Mining Approach Based on Self-Organizing Maps

Applied Intelligence

Abstract

This paper describes our work on developing a language-independent technique for discovering implicit knowledge from multilingual information sources. Text mining has been gaining popularity in the knowledge discovery field, particularly with the increasing availability of digital documents in various languages from all around the world. However, most current text mining tools focus on processing monolingual documents (particularly English documents); little attention has been paid to applying these techniques to documents in Asian languages, or to extending the mining algorithms to support multilingual information sources. In this work, we attempt to develop a language-neutral method to tackle the linguistic difficulties in the text mining process. Using a variation of automatic clustering techniques based on a neural network approach, namely Self-Organizing Maps (SOM), we have conducted several experiments to uncover associated documents in a Chinese corpus, Chinese-English bilingual parallel corpora, and a hybrid Chinese-English corpus. The experiments show some interesting results and suggest a couple of potential paths for future work in the field of multilingual information discovery. In addition, this work is expected to act as a starting point for exploring the impact of linguistic issues on the machine-learning approach to mining sensible linguistic elements from multilingual text collections.
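
To make the general idea of clustering documents with a Self-Organizing Map concrete, the following is a minimal sketch and not the authors' implementation. It assumes documents are already tokenized and encoded as binary term-presence vectors; the toy documents, the 3 x 3 map size, the decay schedules, and the helper names (`train_som`, `best_matching_unit`, `to_vector`) are all hypothetical choices made for illustration, since the abstract does not specify them.

```python
import math
import random


def best_matching_unit(weights, v):
    """Return the grid coordinate of the unit whose weights are closest to v."""
    return min(weights,
               key=lambda u: sum((a - b) ** 2 for a, b in zip(weights[u], v)))


def train_som(vectors, rows, cols, epochs=50, lr0=0.5, seed=0):
    """Train a rows x cols map on equal-length vectors (illustrative only)."""
    rng = random.Random(seed)
    dim = len(vectors[0])
    # One weight vector per map unit, initialised with random values.
    weights = {(r, c): [rng.random() for _ in range(dim)]
               for r in range(rows) for c in range(cols)}
    sigma0 = max(rows, cols) / 2.0
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1.0 - frac)                   # learning rate decays linearly
        sigma = max(sigma0 * (1.0 - frac), 0.5)   # neighbourhood radius shrinks
        for v in vectors:
            br, bc = best_matching_unit(weights, v)
            for (r, c), w in weights.items():
                grid_d2 = (r - br) ** 2 + (c - bc) ** 2
                h = math.exp(-grid_d2 / (2.0 * sigma ** 2))  # Gaussian neighbourhood
                for i in range(dim):
                    w[i] += lr * h * (v[i] - w[i])  # pull unit towards the input
    return weights


def to_vector(tokens, vocab):
    """Binary term-presence encoding (an assumed, simplified representation)."""
    present = set(tokens)
    return [1.0 if term in present else 0.0 for term in vocab]


# Hypothetical, pre-tokenized toy documents in Chinese, English, and a mix.
docs = {
    "zh_finance": ["股票", "市場", "銀行"],
    "en_finance": ["stock", "market", "bank"],
    "mixed_finance": ["股票", "market", "bank"],
    "zh_sport": ["足球", "比賽", "球員"],
    "en_sport": ["football", "match", "player"],
}
vocab = sorted({t for toks in docs.values() for t in toks})
vectors = {name: to_vector(toks, vocab) for name, toks in docs.items()}

weights = train_som(list(vectors.values()), rows=3, cols=3)
placement = {name: best_matching_unit(weights, v) for name, v in vectors.items()}
for name, unit in placement.items():
    print(name, "->", unit)
```

Documents that share terms have closer input vectors, so they tend to land on the same or neighbouring map units. In this toy setup, purely Chinese and purely English documents share no terms, so only the mixed document links the two languages; how the paper bridges languages is not stated in the abstract and is not modelled here.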

About this article

Cite this article

Lee, CH., Yang, HC. A Multilingual Text Mining Approach Based on Self-Organizing Maps. Applied Intelligence 18, 295–310 (2003). https://doi.org/10.1023/A:1023250105036

  • DOI: https://doi.org/10.1023/A:1023250105036
