Improving Load Balance and Fault Tolerance for PC Cluster-Based Parallel Information Retrieval

Kang, Jaeho; Ahn, Hyunju; Jung, Sung-Won; Ryu, Kwang Ryel; Kwon, Hyuk-Chul; Chung, Sang-Hwa

doi:10.1007/978-3-540-24669-5_89

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 3019)

Included in the following conference series:

International Conference on Parallel Processing and Applied Mathematics

Abstract

Information service providers and companies have typically used expensive mid-range or mainframe computers when they need a high-performance information retrieval system for massive data sources such as the Internet. In recent years, companies have begun considering PC cluster systems as an alternative because of their cost-effectiveness and high scalability. However, if some of the cluster nodes break down, users may have to wait a long time or, in the worst case, may not get any result at all. This paper presents a duplicated data declustering method for PC cluster-based parallel information retrieval that achieves fault tolerance and improves load balance efficiently and at low cost. The effectiveness of the method has been confirmed by experiments with a corpus of two million newspaper articles on an 8-node PC cluster.
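The abstract does not describe how the duplicated declustering is organized. As a rough illustration of the general idea only, and not the authors' method, the following Python sketch assumes a chained-declustering-style placement in which data partition p is stored on node p and on node (p + 1) mod N, with a router that sends each sub-query to the less-loaded live copy. The class name `Cluster`, the placement rule, and the one-partition-per-node layout are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    """Hypothetical N-node cluster where every partition is stored twice."""
    num_nodes: int
    failed: set = field(default_factory=set)           # ids of broken nodes
    load: list = field(init=False, default_factory=list)  # sub-queries per node

    def __post_init__(self) -> None:
        self.load = [0] * self.num_nodes

    def replicas(self, partition: int) -> list:
        # Assumed chained placement: primary on node p, backup on its successor.
        primary = partition % self.num_nodes
        return [primary, (primary + 1) % self.num_nodes]

    def route(self, partition: int) -> int:
        # Pick the least-loaded replica that is still alive (fault tolerance
        # plus a simple form of load balancing).
        alive = [n for n in self.replicas(partition) if n not in self.failed]
        if not alive:
            raise RuntimeError(f"partition {partition} has no live copy")
        target = min(alive, key=lambda n: self.load[n])
        self.load[target] += 1
        return target

    def query(self) -> list:
        # A query must touch every partition once; return the serving node of each.
        return [self.route(p) for p in range(self.num_nodes)]


# Usage: an 8-node cluster, as in the paper's experiments, with node 3 down.
cluster = Cluster(8)
cluster.failed.add(3)
print(cluster.query())
```

With no failures each node serves its own partition. With node 3 down, partition 3 is served from its backup on node 4 and the query still completes; if two adjacent nodes (3 and 4) fail, partition 3 has no live copy, which is the limit of keeping only two copies of each partition.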

This work was funded by the University Research Program supported by the Ministry of Information and Communication in Korea under contract 2002-005-3.

Author information

Authors and Affiliations

Center for Intelligent and Integrated Port Management Systems, Dong-A University, 840, Hadan-Dong, Saha-Ku, Busan, Korea
Jaeho Kang
Division of Electrical and Computer Engineering, Pusan National University, San 30, Jangjeon-Dong, Kumjeong-Ku, Busan, Korea
Hyunju Ahn, Sung-Won Jung, Kwang Ryel Ryu, Hyuk-Chul Kwon & Sang-Hwa Chung

Editor information

Editors and Affiliations

Institute of Computational and Information Sciences, Czestochowa University of Technology, Czestochowa, Poland
Roman Wyrzykowski
Computer Science Department, University of Tennessee, Knoxville, TN 37996-3450, USA
Jack Dongarra
Systems Research Institute, Polish Academy of Science, Warsaw, Poland
Marcin Paprzycki
Informatics & Mathematical Modeling, Technical University of Denmark, DK-2800, Lyngby, Denmark
Jerzy Waśniewski

About this paper

Cite this paper

Kang, J., Ahn, H., Jung, SW., Ryu, K.R., Kwon, HC., Chung, SH. (2004). Improving Load Balance and Fault Tolerance for PC Cluster-Based Parallel Information Retrieval. In: Wyrzykowski, R., Dongarra, J., Paprzycki, M., Waśniewski, J. (eds) Parallel Processing and Applied Mathematics. PPAM 2003. Lecture Notes in Computer Science, vol 3019. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-24669-5_89

DOI: https://doi.org/10.1007/978-3-540-24669-5_89
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-21946-0
Online ISBN: 978-3-540-24669-5
eBook Packages: Springer Book Archive
