Abstract
The self-organizing map (SOM) can classify documents by learning their interrelationships from its input data. For a document collection, the SOM input data space is generally high-dimensional. Because the computational complexity of the SOM increases in proportion to the dimension of its input space, high dimensionality lowers the efficiency not only of the initial learning process but also of subsequent retrieval and of the relearning required whenever the input data are updated. A new method, the feature competitive algorithm (FCA), is proposed to overcome this problem. The FCA captures the most significant features that characterize the underlying interrelationships of the entities in the input space, forming a dimensionally reduced input space without excessive loss of essential information about those interrelationships. To test its feasibility and effectiveness, the proposed method was applied to a document collection consisting of 97 UNIX command manual pages. The test results are encouraging. Several crucial issues concerning the FCA are also discussed.
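The complexity argument above can be illustrated with a minimal, generic SOM sketch. Each training step searches every map unit for the best-matching unit and then moves the weight vectors toward the input. Both loops touch every input dimension, so the per-step cost is proportional to (number of map units) × (input dimension). The `project` helper and the kept feature indices are hypothetical placeholders for whatever features a reduction method retains. This sketch does not implement the FCA's own selection rule. The 500-term vocabulary, the 6×6 map, the synthetic term vectors and the decay schedule are all assumptions; only the count of 97 documents comes from the abstract.

```python
import math
import random
from typing import List, Sequence, Tuple

Vector = List[float]


def squared_distance(x: Sequence[float], w: Sequence[float]) -> float:
    # One subtraction, square and addition per dimension: cost is linear in len(x).
    return sum((xi - wi) ** 2 for xi, wi in zip(x, w))


def best_matching_unit(x: Sequence[float], weights: List[Vector]) -> int:
    # Exhaustive winner search: about num_units * dim operations per input vector.
    return min(range(len(weights)), key=lambda j: squared_distance(x, weights[j]))


def som_step(x: Sequence[float], weights: List[Vector],
             grid: List[Tuple[int, int]], rate: float, radius: float) -> int:
    """One Kohonen update with a Gaussian neighbourhood; returns the winner index."""
    bmu = best_matching_unit(x, weights)
    bx, by = grid[bmu]
    for j, (gx, gy) in enumerate(grid):
        h = math.exp(-((gx - bx) ** 2 + (gy - by) ** 2) / (2 * radius ** 2))
        # Moving each weight vector toward x also touches every dimension.
        weights[j] = [w + rate * h * (xi - w) for w, xi in zip(weights[j], x)]
    return bmu


def project(x: Sequence[float], kept: Sequence[int]) -> Vector:
    # Hypothetical reduction: keep only the chosen feature indices.
    # This is NOT the FCA selection rule, only the projection step.
    return [x[i] for i in kept]


def train(data: List[Vector], rows: int, cols: int, epochs: int, seed: int = 0):
    rng = random.Random(seed)
    dim = len(data[0])
    grid = [(r, c) for r in range(rows) for c in range(cols)]
    weights = [[rng.random() for _ in range(dim)] for _ in grid]
    for epoch in range(epochs):
        # Assumed schedule: shrink learning rate and radius linearly.
        frac = 1 - epoch / epochs
        for x in data:
            som_step(x, weights, grid, rate=0.5 * frac, radius=max(0.5, 2.0 * frac))
    return weights, grid


if __name__ == "__main__":
    rng = random.Random(1)
    # Synthetic binary term vectors: 97 documents over a hypothetical 500-term vocabulary.
    docs = [[float(rng.random() < 0.1) for _ in range(500)] for _ in range(97)]
    kept = list(range(0, 500, 10))  # hypothetical set of 50 retained features
    reduced = [project(d, kept) for d in docs]
    units = 6 * 6
    print("ops per winner search, full:", units * len(docs[0]))
    print("ops per winner search, reduced:", units * len(reduced[0]))
    weights, grid = train(reduced, 6, 6, epochs=5)
    print("winner of doc 0:", best_matching_unit(reduced[0], weights))
```

Under these assumed sizes the script reports 36 × 500 versus 36 × 50 operations per winner search, which makes the linear dependence on input dimension concrete. The trained map itself is only illustrative, because the data are random.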
Cite this article
Ye, H., Lo, B.W. Feature Competitive Algorithm for Dimension Reduction of the Self-Organizing Map Input Space. Applied Intelligence 13, 215–230 (2000). https://doi.org/10.1023/A:1026511926034

