HOMALS for Dimension Reduction in Information Retrieval

Hildebrand, Kay F.; Müller-Funk, Ulrich

doi:10.1007/978-3-642-24466-7_36


Part of the book series: Studies in Classification, Data Analysis, and Knowledge Organization (STUDIES CLASS)


Abstract

The data underlying multiple correspondence analysis/homogeneity analysis usually consist of objects characterised by categorical attributes. Its aims are visualisation, dimension reduction and, to some extent, factor analysis using alternating least squares. As for dimension reduction, there are strong parallels to vector-based methods in Information Retrieval (IR) such as the Vector Space Model (VSM) or Latent Semantic Analysis (LSA). The latter uses singular value decomposition (SVD) to discard a number of the smallest singular values and thereby generates a lower-dimensional retrieval space. In this paper, the HOMALS technique is adapted for use in IR by categorising metric term frequencies in term-document matrices. In this context, dimension reduction is achieved by minimising the difference between the distances among objects in the dimensionally reduced space and those in the full-dimensional space. An exemplary set of documents is submitted to the process and subsequently used for retrieval.
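The two ingredients named in the abstract, SVD-based truncation as used in LSA and the categorisation of metric term frequencies, can be illustrated with a minimal sketch. The toy matrix, the function names `truncated_svd` and `categorise`, and the category thresholds are assumptions made purely for illustration. The sketch does not reproduce the authors' HOMALS procedure; it only shows the LSA-style reduction and one possible way to turn counts into categorical codes.

```python
import numpy as np

def truncated_svd(A, k):
    """Return the factors of A restricted to the k largest singular values."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    # numpy returns singular values in descending order, so slicing
    # keeps the k largest and discards the smallest ones.
    return U[:, :k], s[:k], Vt[:k, :]

def categorise(A, edges):
    """Map metric term frequencies to integer category codes.

    With edges=[1, 3] a count of 0 falls into category 0, counts 1-2 into
    category 1 and counts >= 3 into category 2 (hypothetical thresholds).
    """
    return np.digitize(A, edges)

# Toy term-document matrix (rows: terms, columns: documents); invented data.
A = np.array([[2, 0, 1, 0],
              [0, 3, 0, 1],
              [1, 0, 4, 0],
              [0, 1, 0, 2],
              [5, 0, 0, 0]], dtype=float)

U_k, s_k, Vt_k = truncated_svd(A, k=2)
# Document coordinates in the 2-dimensional retrieval space, one row per document.
doc_coords = (s_k[:, None] * Vt_k).T

# Categorical version of the same matrix, as a possible input to a
# method that expects categorical attributes.
codes = categorise(A, edges=[1, 3])
```

Here `doc_coords` holds one two-dimensional point per document, while `codes` shows how the same counts look once they are treated as categories rather than as metric values.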


Notes

1. Here, words that cannot discriminate between documents and carry no content, such as "a" or "and", are removed.
2. In stemming, certain endings are removed or merged in order to map words with identical stems to the same item.
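The two preprocessing steps described in these notes, commonly called stop-word removal and stemming, can be sketched as follows. The stop-word list and the suffix list are small hypothetical examples, and the suffix stripping is a deliberately naive stand-in for a real stemmer; neither is taken from the chapter.

```python
# Hypothetical, deliberately tiny stop-word list.
STOP_WORDS = {"a", "an", "and", "the", "of", "in", "or"}

# Hypothetical suffix list, checked in this order.
SUFFIXES = ("ing", "ed", "es", "s")

def strip_suffix(word):
    """Remove the first matching suffix if at least three characters remain."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[:-len(suffix)]
    return word

def preprocess(text):
    """Lower-case, drop punctuation and stop words, then strip suffixes."""
    tokens = [t.strip(".,;:!?").lower() for t in text.split()]
    return [strip_suffix(t) for t in tokens if t and t not in STOP_WORDS]

terms = preprocess("Indexing and the indexed documents")
```

In this example "Indexing" and "indexed" are mapped to the same item, so both contribute to the same row of a term-document matrix.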



Author information

Authors and Affiliations

European Research Center for Information Systems (ERCIS), University of Münster, Münster, Germany
Kay F. Hildebrand & Ulrich Müller-Funk


Corresponding author

Correspondence to Kay F. Hildebrand.

Editor information

Editors and Affiliations

Fak. Wirtschaftswissenschaften, Inst. Entscheidungstheorieund, Universität Karlsruhe (TH), Kaiserstr. 12, Karlsruhe, 76128, Germany
Wolfgang A. Gaul
Institute for Information Systems, and Management (IISM), Karlsruhe Institute of Technology (KIT), Kaiserstr. 12, Karlsruhe, 76131, Baden-Württemberg, Germany
Andreas Geyer-Schulz
Information Systems, University of Hildesheim, Marienburger Platz 22, Hildesheim, 31141, Germany
Lars Schmidt-Thieme
Institute for Information Systems, and Management (IISM), Karlsruhe Institute of Technology (KIT), Kaiserstraße 12, Karlsruhe, 76128, Germany
Jonas Kunze

About this paper

Cite this paper

Hildebrand, K.F., Müller-Funk, U. (2012). HOMALS for Dimension Reduction in Information Retrieval. In: Gaul, W., Geyer-Schulz, A., Schmidt-Thieme, L., Kunze, J. (eds) Challenges at the Interface of Data Analysis, Computer Science, and Optimization. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-24466-7_36

DOI: https://doi.org/10.1007/978-3-642-24466-7_36
Published: 05 January 2012
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-24465-0
Online ISBN: 978-3-642-24466-7
eBook Packages: Mathematics and Statistics, Mathematics and Statistics (R0)
