Skip to main content

Random Indexing Explained with High Probability

  • Conference paper
  • First Online:
Text, Speech, and Dialogue (TSD 2015)

Part of the book series: Lecture Notes in Computer Science ((LNAI,volume 9302))

Included in the following conference series:

Abstract

Random indexing (RI) is an incremental method for constructing a vector space model (VSM) with a reduced dimensionality. Previously, the method has been justified using the mathematical framework of Kanerva’s sparse distributed memory. This justification, although intuitively plausible, fails to provide the information that is required to set the parameters of the method. In order to suggest criteria for the method’s parameters, the RI method is revisited and described using the principles of linear algebra and sparse random projections in Euclidean spaces. These simple mathematics are then employed to suggest criteria for setting the method’s parameters and to explain their influence on the estimated distances in the RI-constructed VSMs. The empirical results observed in an evaluation are reported to support the suggested guidelines in the paper.

The first three pages previously appeared in [1].

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 39.99
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 54.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

References

  1. QasemiZadeh, B.: Random indexing revisited. In: Biemann, C., Handschuh, S., Freitas, A., Meziane, F., Métais, E. (eds.) NLDB 2015. LNCS, vol. 9103, pp. 437–442. Springer, Heidelberg (2015)

    Chapter  Google Scholar 

  2. Turney, P.D., Pantel, P.: From frequency to meaning: vector space models of semantics. Journal of Artificial Intelligence Research 37(1), 141–188 (2010)

    MathSciNet  MATH  Google Scholar 

  3. Salton, G., Wong, A., Yang, C.S.: A vector space model for automatic indexing. Communications of the ACM 18(11), 613–620 (1975)

    Article  MATH  Google Scholar 

  4. Beyer, K., Goldstein, J., Ramakrishnan, R., Shaft, U.: When is nearest neighbor meaningful? In: Beeri, C., Bruneman, P. (eds.) ICDT 1999. LNCS, vol. 1540, pp. 217–235. Springer, Heidelberg (1998)

    Chapter  Google Scholar 

  5. Deerwester, S.C., Dumais, S.T., Landauer, T.K., Furnas, G.W., Harshman, R.A.: Indexing by latent semantic analysis. Journal of the American Society of Information Science 41(6), 391–407 (1990)

    Article  Google Scholar 

  6. Brand, M.: Fast low-rank modifications of the thin singular value decomposition. Linear Algebra and its Applications 415(1), 20–30 (2006). Special Issue on Large Scale Linear and Nonlinear Eigenvalue Problems

    Article  MathSciNet  MATH  Google Scholar 

  7. Johnson, W., Lindenstrauss, J.: Extensions of Lipschitz mappings into a Hilbert space. In: Beals, R., Beck, A., Bellow, A., Hajian, A. (eds.) Conference on Modern Analysis and Probability (1982: Yale University). Contemporary Mathematics, vol. 26. American Mathematical Society, pp. 189–206 (1984)

    Google Scholar 

  8. Dasgupta, S., Gupta, A.: An elementary proof of a theorem of Johnson and Lindenstrauss. Random Structures and Algorithms 22(1), 60–65 (2003)

    Article  MathSciNet  MATH  Google Scholar 

  9. Achlioptas, D.: Database-friendly random projections. In: Proceedings of the Twentieth ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, pp. 274–281. ACM, Santa Barbara, May 2001

    Google Scholar 

  10. Li, P., Hastie, T.J., Church, K.W.: Very sparse random projections. In: Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 287–296. ACM, New York (2006)

    Google Scholar 

  11. Kanerva, P., Kristoferson, J., Holst, A.: Random indexing of text samples for latent semantic analysis. In: Gleitman, L.R., Josh, A.K. (eds.) Proceedings of the 22nd Annual Conference of the Cognitive Science Society, p. 1036. Erlbaum, Mahwah (2000)

    Google Scholar 

  12. Sahlgren, M.: An introduction to random indexing. Technical report, Swedish ICT (SICS) (2005). Retrived from https://www.sics.se/~mange/papers/RI_intro.pdf

  13. Lupu, M.: On the usability of random indexing in patent retrieval. In: Hernandez, N., Jäschke, R., Croitoru, M. (eds.) ICCS 2014. LNCS, vol. 8577, pp. 202–216. Springer, Heidelberg (2014)

    Google Scholar 

  14. Polajnar, T., Clark, S.: Improving distributional semantic vectors through context selection and normalisation. In: Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics. Association for Computational Linguistics, Gothenburg, pp. 230–238, April 2014

    Google Scholar 

  15. Baroni, M., Bernardini, S., Ferraresi, A., Zanchetta, E.: The WaCky Wide Web: A collection of very large linguistically processed Web-crawled corpora. Language Resources and Evaluation 43(3), 209–226 (2009)

    Article  Google Scholar 

  16. Brinkman, B., Charikar, M.: On the impossibility of dimension reduction in L1. Journal of the ACM 52(5), 766–788 (2005)

    Article  MathSciNet  MATH  Google Scholar 

  17. Lapesa, G., Evert, S.: Evaluating neighbor rank and distance measures as predictors of semantic priming. In: Proceedings of the Fourth Annual Workshop on Cognitive Modeling and Computational Linguistics. Association for Computational Linguistics, Sofia, pp. 66–74, August 2013

    Google Scholar 

  18. Indyk, P.: Stable distributions, pseudorandom generators, embeddings, and data stream computation. Journal of the ACM 53(3), 307–323 (2006)

    Article  MathSciNet  MATH  Google Scholar 

  19. Li, P., Samorodnitsk, G., Hopcroft, J.: Sign Cauchy projections and chi-square kernel. In Burges, C.J.C., Bottou, L., Welling, M., Ghahramani, Z., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems, vol. 26, pp. 2571–2579. Curran Associates, Inc. (2013)

    Google Scholar 

  20. Geva, S., De Vries, C.M.: TOPSIG: topology preserving document signatures. In: Berendt, B., de Vries, A., Fan, W., Macdonald, C., Ounis, I., Ruthven, I. (eds.) Proceedings of the 20th ACM International Conference on Information and Knowledge Management, pp. 333–338. ACM, Glasgow (2011)

    Google Scholar 

  21. Zadeh, B.Q., Handschuh, S.: Random Manhattan integer indexing: incremental L1 normed vector space construction. In: Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 1713–1723. Association for Computational Linguistics, Doha (2014)

    Google Scholar 

  22. Zadeh, B.Q., Handschuh, S.: Random Manhattan indexing. In: 25th International Workshop on Database and Expert Systems Applications (DEXA), pp. 203–208. IEEE, Munich (2014)

    Google Scholar 

  23. Baroni, M., Lenci, A., Onnis, L.: ISA meets Lara: an incremental word space model for cognitively plausible simulations of semantic learning. In: Proceedings of the Workshop on Cognitive Aspects of Computational Language Acquisition, pp. 49–56. Association for Computational Linguistics, Prague, June 2007

    Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Behrang QasemiZadeh .

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

QasemiZadeh, B., Handschuh, S. (2015). Random Indexing Explained with High Probability. In: Král, P., Matoušek, V. (eds) Text, Speech, and Dialogue. TSD 2015. Lecture Notes in Computer Science(), vol 9302. Springer, Cham. https://doi.org/10.1007/978-3-319-24033-6_47

Download citation

  • DOI: https://doi.org/10.1007/978-3-319-24033-6_47

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-24032-9

  • Online ISBN: 978-3-319-24033-6

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics