Computationally Efficient Approximation of a Probabilistic Model for Document Representation in the WEBSOM Full-Text Analysis Method

Neural Processing Letters

Abstract

WEBSOM is a recently developed neural method for exploring full-text document collections, for information retrieval, and for information filtering. Much as in earlier information retrieval methods, WEBSOM encodes full-text documents as vectors in a document space, but it forms that space in an unsupervised manner using the Self-Organizing Map algorithm. This article shows that the document representations WEBSOM creates are computationally efficient approximations of the results of a certain probabilistic model. The probabilistic model incorporates information about how similarly different words are used, and so takes their semantic relations into account.
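The general idea of encoding documents as vectors and organizing them with a Self-Organizing Map can be illustrated with a minimal sketch. This is not the article's method: it uses plain normalized word counts rather than the probabilistic model or the WEBSOM encoding, and the function names (`bag_of_words`, `best_matching_unit`, `train_som`), the grid size, and the learning-rate and neighborhood schedules are all assumptions chosen for illustration.

```python
import math
import random


def bag_of_words(docs):
    """Encode each document as a length-normalized word-count vector (toy encoding)."""
    vocab = sorted({w for d in docs for w in d.lower().split()})
    index = {w: i for i, w in enumerate(vocab)}
    vectors = []
    for d in docs:
        v = [0.0] * len(vocab)
        for w in d.lower().split():
            v[index[w]] += 1.0
        norm = math.sqrt(sum(x * x for x in v)) or 1.0
        vectors.append([x / norm for x in v])
    return vocab, vectors


def best_matching_unit(codebook, x):
    """Return the index of the map unit whose model vector is closest to x."""
    def sq_dist(m):
        return sum((a - b) ** 2 for a, b in zip(m, x))
    return min(range(len(codebook)), key=lambda i: sq_dist(codebook[i]))


def train_som(vectors, rows=2, cols=2, epochs=50, seed=0):
    """Train a small rectangular SOM with a Gaussian neighborhood (assumed schedules)."""
    rng = random.Random(seed)
    dim = len(vectors[0])
    grid = [(r, c) for r in range(rows) for c in range(cols)]
    codebook = [[rng.random() for _ in range(dim)] for _ in grid]
    steps = epochs * len(vectors)
    t = 0
    for _ in range(epochs):
        for x in rng.sample(vectors, len(vectors)):  # shuffled pass over the data
            frac = t / steps
            alpha = 0.5 * (1.0 - frac)                          # learning rate decays linearly
            sigma = max(rows, cols) / 2.0 * (1.0 - frac) + 0.1  # neighborhood width shrinks
            bmu = best_matching_unit(codebook, x)
            for i, pos in enumerate(grid):
                d2 = (pos[0] - grid[bmu][0]) ** 2 + (pos[1] - grid[bmu][1]) ** 2
                h = math.exp(-d2 / (2.0 * sigma * sigma))
                # Move each model vector toward x, weighted by grid distance to the BMU.
                codebook[i] = [m + alpha * h * (xi - m) for m, xi in zip(codebook[i], x)]
            t += 1
    return grid, codebook


docs = [
    "neural networks learn maps",
    "neural networks learn maps",
    "stock prices fell today",
]
vocab, vectors = bag_of_words(docs)
grid, codebook = train_som(vectors)
units = [grid[best_matching_unit(codebook, v)] for v in vectors]
```

After training, `units` holds the grid position assigned to each document; documents with identical word counts necessarily land on the same unit, while the placement of dissimilar documents depends on the random initialization and the assumed schedules.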

Cite this article

Kaski, S. Computationally Efficient Approximation of a Probabilistic Model for Document Representation in the WEBSOM Full-Text Analysis Method. Neural Processing Letters 5, 69–81 (1997). https://doi.org/10.1023/A:1009618125967

  • DOI: https://doi.org/10.1023/A:1009618125967
