Abstract
WEBSOM is a recently developed neural method for exploring full-text document collections, for information retrieval, and for information filtering. Much as in earlier information retrieval methods, WEBSOM encodes full-text documents as vectors in a document space, but it forms that space in an unsupervised manner using the Self-Organizing Map algorithm. This article shows that the document representations WEBSOM creates are computationally efficient approximations of the results of a certain probabilistic model. To take the semantic relations between words into account, the probabilistic model incorporates information about how similarly different words are used.
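To make the idea of an unsupervised document space concrete, the following minimal sketch encodes a few toy documents as normalized word-count vectors and organizes them with a small one-dimensional Self-Organizing Map. It is an illustration under stated assumptions, not the WEBSOM implementation: the plain word-count encoding, the toy documents, the map size, the learning-rate and neighborhood schedules, and all function names (`bag_of_words`, `train_som`, and so on) are hypothetical choices made for this example, and the probabilistic model discussed in the article is not reproduced here.

```python
import math
import random


def bag_of_words(doc, vocab):
    """Encode a document as a word-count vector over a fixed vocabulary."""
    words = doc.lower().split()
    return [float(words.count(w)) for w in vocab]


def normalize(v):
    """Scale a vector to unit Euclidean length (zero vectors are returned unchanged)."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n > 0 else v


def best_matching_unit(units, x):
    """Return the index of the map unit whose model vector is closest to x."""
    return min(
        range(len(units)),
        key=lambda i: sum((m - a) ** 2 for m, a in zip(units[i], x)),
    )


def train_som(data, n_units=4, steps=200, seed=0):
    """Train a 1-D SOM (a chain of units) with a shrinking Gaussian neighborhood."""
    rng = random.Random(seed)
    dim = len(data[0])
    units = [[rng.random() for _ in range(dim)] for _ in range(n_units)]
    for t in range(steps):
        frac = t / steps
        alpha = 0.5 * (1.0 - frac)                        # assumed linear decay
        sigma = max(0.3, (n_units / 2.0) * (1.0 - frac))  # assumed width schedule
        x = data[rng.randrange(len(data))]
        c = best_matching_unit(units, x)
        for i, m in enumerate(units):
            # Units near the winner on the chain are pulled more strongly toward x.
            h = math.exp(-((i - c) ** 2) / (2.0 * sigma ** 2))
            units[i] = [mi + alpha * h * (xi - mi) for mi, xi in zip(m, x)]
    return units


# Toy collection (made up for illustration).
docs = [
    "neural network learning",
    "neural network training",
    "stock market prices",
    "stock market trading",
]
vocab = sorted({w for d in docs for w in d.split()})
vectors = [normalize(bag_of_words(d, vocab)) for d in docs]
som = train_som(vectors)
positions = [best_matching_unit(som, v) for v in vectors]
print(positions)  # map position (unit index) assigned to each document
```

Each document ends up at the map unit whose model vector best matches it, so the list `positions` plays the role of the documents' locations on the map. The sketch uses raw word counts, so it does not capture the similarity of word use that the article's probabilistic model is designed to incorporate.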
Cite this article
Kaski, S. Computationally Efficient Approximation of a Probabilistic Model for Document Representation in the WEBSOM Full-Text Analysis Method. Neural Processing Letters 5, 69–81 (1997). https://doi.org/10.1023/A:1009618125967