Abstract
Because of their time complexity, conventional clustering methods often cannot cope with large data sets such as the bibliographic data of a scientific library. We present a method for clustering library documents according to their usage histories; it is based on exploring object sets with restricted random walks.
We show that, given the particularities of the data, the time complexity of the algorithm is linear. In our application, the algorithm has worked well with more than one million objects, both in terms of efficiency and in terms of cluster quality.
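The abstract does not describe the walk itself. As a rough illustration only, the following Python sketch shows a single restricted random walk under one common formulation of the restriction: each step must cross an edge with strictly higher similarity than the step before it. The function name `restricted_random_walk`, the dict-of-dicts layout of `similarity`, and the toy weights are hypothetical and are not taken from the paper; in the paper's setting the similarities would come from usage histories.

```python
import random


def restricted_random_walk(similarity, start, rng):
    """Return the list of nodes visited by one restricted random walk.

    Assumption (not from the paper): `similarity` maps each node to a dict
    of neighbour -> positive similarity, and every step must use an edge
    whose similarity is strictly higher than that of the previous step.
    """
    path = [start]
    current = start
    last_weight = 0  # all similarities are assumed to be positive
    while True:
        # Only edges that are strictly "better" than the last one are allowed.
        candidates = [
            (neighbour, weight)
            for neighbour, weight in similarity[current].items()
            if weight > last_weight
        ]
        if not candidates:
            return path  # no admissible edge left: the walk ends here
        current, last_weight = rng.choice(candidates)
        path.append(current)


# Toy symmetric similarity data (hypothetical values).
similarity = {
    "a": {"b": 1, "c": 3},
    "b": {"a": 1, "c": 2},
    "c": {"a": 3, "b": 2},
}

path = restricted_random_walk(similarity, "a", random.Random(0))
print(path)
```

Because the similarity must rise at every step, a walk in this sketch can never use more edges than there are distinct similarity values, so every walk terminates. How the walks are then combined into clusters is not covered here.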
Copyright information
© 2005 Springer-Verlag Berlin · Heidelberg
About this paper
Cite this paper
Franke, M., Thede, A. (2005). Clustering of Large Document Sets with Restricted Random Walks on Usage Histories. In: Weihs, C., Gaul, W. (eds) Classification — the Ubiquitous Challenge. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-28084-7_46
DOI: https://doi.org/10.1007/3-540-28084-7_46
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-25677-9
Online ISBN: 978-3-540-28084-2
eBook Packages: Mathematics and Statistics, Mathematics and Statistics (R0)