A Multilingual Text Mining Approach Based on Self-Organizing Maps

Lee, Chung-Hong; Yang, Hsin-Chang

doi:10.1023/A:1023250105036

Published: May 2003

Applied Intelligence, Volume 18, pages 295–310 (2003)

Abstract

This paper describes our work on developing a language-independent technique for discovering implicit knowledge in multilingual information sources. Text mining has been gaining popularity in the knowledge discovery field, particularly with the increasing availability of digital documents in various languages from around the world. However, most current text mining tools focus only on processing monolingual documents (particularly English documents): little attention has been paid to applying these techniques to documents in Asian languages, or to extending the mining algorithms to support multilingual information sources. In this work, we attempt to develop a language-neutral method to tackle the linguistic difficulties in the text mining process. Using a variation of automatic clustering techniques based on a neural network approach, namely the Self-Organizing Map (SOM), we have conducted several experiments to uncover associated documents in a Chinese corpus, Chinese-English bilingual parallel corpora, and a hybrid Chinese-English corpus. The experiments show some interesting results and point to a couple of potential paths for future work in the field of multilingual information discovery. In addition, this work is expected to serve as a starting point for exploring how linguistic issues affect machine-learning approaches to mining meaningful linguistic elements from multilingual text collections.
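The abstract does not specify the authors' feature extraction or SOM training parameters. As a minimal sketch only, assuming overlapping character bigrams as a segmentation-free, language-neutral feature (our assumption, not necessarily the paper's preprocessing) and a standard online Kohonen SOM with a Gaussian neighbourhood and linearly decaying learning rate and radius, a mixed Chinese-English collection can be mapped onto a small grid so that documents with similar features land on the same or nearby units. The function names (`char_bigrams`, `vectorize`, `train_som`, `map_documents`), the grid size, and the decay schedules are all hypothetical choices made for illustration.

```python
import math
import random
from collections import Counter


def char_bigrams(text):
    """Overlapping character bigrams with whitespace dropped.

    Assumption: used here as a segmentation-free, language-neutral feature
    for both Chinese and English text; not necessarily the paper's method.
    """
    chars = [c for c in text.lower() if not c.isspace()]
    return Counter(a + b for a, b in zip(chars, chars[1:]))


def vectorize(docs):
    """Turn documents into unit-length bigram frequency vectors over a shared vocabulary."""
    counts = [char_bigrams(d) for d in docs]
    vocab = sorted(set().union(*counts))
    vectors = []
    for c in counts:
        v = [float(c[t]) for t in vocab]
        norm = math.sqrt(sum(x * x for x in v)) or 1.0
        vectors.append([x / norm for x in v])
    return vectors, vocab


def best_matching_unit(weights, x):
    """Grid node whose weight vector has the smallest squared Euclidean distance to x."""
    return min(weights, key=lambda n: sum((w - xi) ** 2 for w, xi in zip(weights[n], x)))


def train_som(vectors, rows, cols, epochs=50, lr0=0.5, seed=0):
    """Online Kohonen SOM with a Gaussian neighbourhood; learning rate and radius decay linearly."""
    rng = random.Random(seed)
    dim = len(vectors[0])
    radius0 = max(rows, cols) / 2
    weights = {(r, c): [rng.random() for _ in range(dim)]
               for r in range(rows) for c in range(cols)}
    total = epochs * len(vectors)
    step = 0
    for _ in range(epochs):
        order = list(range(len(vectors)))
        rng.shuffle(order)
        for i in order:
            x = vectors[i]
            frac = step / total
            lr = lr0 * (1.0 - frac)
            radius = max(radius0 * (1.0 - frac), 0.5)
            bmu = best_matching_unit(weights, x)
            for node, w in weights.items():
                # Pull each unit toward x, weighted by its grid distance to the winner.
                d2 = (node[0] - bmu[0]) ** 2 + (node[1] - bmu[1]) ** 2
                h = math.exp(-d2 / (2.0 * radius * radius))
                for k in range(dim):
                    w[k] += lr * h * (x[k] - w[k])
            step += 1
    return weights


def map_documents(weights, vectors):
    """Assign each document index to its best-matching grid node."""
    return {i: best_matching_unit(weights, v) for i, v in enumerate(vectors)}


if __name__ == "__main__":
    # Hypothetical toy collection mixing Chinese and English titles.
    docs = [
        "自我組織映射圖文件分群",
        "自我組織映射圖文字探勘",
        "self-organizing maps cluster documents",
        "self-organizing maps for text mining",
    ]
    vectors, _ = vectorize(docs)
    weights = train_som(vectors, rows=3, cols=3)
    for i, node in map_documents(weights, vectors).items():
        print(node, docs[i])
```

Running the script prints the grid coordinates assigned to each document. Because the two Chinese titles share many character bigrams, as do the two English titles, each pair tends to land on the same or neighbouring units; the map size, epoch count, and decay schedules are illustrative values, not settings reported in the paper.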

Author information

Authors and Affiliations

Department of Electrical Engineering, National Kaohsiung University of Applied Sciences, Kaohsiung, Taiwan
Chung-Hong Lee & Hsin-Chang Yang

About this article

Cite this article

Lee, CH., Yang, HC. A Multilingual Text Mining Approach Based on Self-Organizing Maps. Applied Intelligence 18, 295–310 (2003). https://doi.org/10.1023/A:1023250105036

Issue Date: May 2003
DOI: https://doi.org/10.1023/A:1023250105036
