Abstract
Information service providers and companies have typically used expensive mid-range or mainframe computers when they need a high-performance information retrieval system for massive data sources such as the Internet. In recent years, companies have begun to consider PC cluster systems as an alternative because of their cost-effectiveness and high scalability. However, if some cluster nodes break down, users may have to wait a long time or, in the worst case, may not get any result at all. This paper presents a duplicated data declustering method for PC cluster-based parallel information retrieval that achieves fault tolerance and improves load balance efficiently and at low cost. The effectiveness of the method has been confirmed by experiments with a corpus of two million newspaper articles on an 8-node PC cluster.
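The abstract does not give the placement rule, so the following is only a minimal sketch under stated assumptions, not the authors' method. It uses a chained-declustering-style layout (a well-known replication scheme, swapped in here for illustration). The inverted index is split into one partition per node. Partition p keeps its primary copy on node p and a duplicate on node (p + 1) mod N, and each sub-query is routed to whichever live copy currently carries the lighter load. The names `Cluster`, `replicas`, and `assign` are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Cluster:
    """Hypothetical N-node cluster using chained duplicated declustering.

    Assumption for illustration: partition p's primary copy lives on node p
    and its duplicate lives on node (p + 1) % num_nodes.
    """
    num_nodes: int
    failed: set[int] = field(default_factory=set)

    def replicas(self, partition: int) -> tuple[int, int]:
        # (primary, backup) node ids that hold this index partition
        return partition, (partition + 1) % self.num_nodes

    def assign(self) -> dict[int, int]:
        """Map every partition to a live node, greedily balancing partitions per node.

        Raises RuntimeError if both copies of some partition sit on failed nodes.
        """
        load = {n: 0 for n in range(self.num_nodes) if n not in self.failed}
        assignment: dict[int, int] = {}
        for p in range(self.num_nodes):
            live = [n for n in self.replicas(p) if n not in self.failed]
            if not live:
                raise RuntimeError(f"partition {p} lost: both copies are on failed nodes")
            # Route partition p's sub-query to the less-loaded live copy;
            # on a tie, min() keeps the first candidate (the primary).
            target = min(live, key=lambda n: load[n])
            load[target] += 1
            assignment[p] = target
        return assignment


if __name__ == "__main__":
    # 8 nodes as in the paper's experiments; node 3 is assumed to have failed
    print(Cluster(num_nodes=8, failed={3}).assign())
    # prints {0: 0, 1: 1, 2: 2, 3: 4, 4: 5, 5: 6, 6: 7, 7: 7}
```

With node 3 down, its partition is served by the node holding the duplicate and the extra work shifts along the chain, so no surviving node handles more than two of the eight partitions. The greedy choice works at whole-partition granularity. If two adjacent nodes fail, one partition loses both copies, and tolerating that would require more than one duplicate per partition.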
This work was funded by the University Research Program supported by the Ministry of Information and Communication in Korea under contract 2002-005-3.
Copyright information
© 2004 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Kang, J., Ahn, H., Jung, SW., Ryu, K.R., Kwon, HC., Chung, SH. (2004). Improving Load Balance and Fault Tolerance for PC Cluster-Based Parallel Information Retrieval. In: Wyrzykowski, R., Dongarra, J., Paprzycki, M., Waśniewski, J. (eds) Parallel Processing and Applied Mathematics. PPAM 2003. Lecture Notes in Computer Science, vol 3019. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-24669-5_89
DOI: https://doi.org/10.1007/978-3-540-24669-5_89
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-21946-0
Online ISBN: 978-3-540-24669-5
eBook Packages: Springer Book Archive