An Investigation into Improving the Load Balance for Term-Based Partitioning
In parallel information retrieval (IR) systems, the query response time is limited by the time of the slowest node in the system, so distributing the load equally across the nodes is a very important issue. In this paper, we propose improving the load balance for term-based partitioning by classifying the terms based on their length and then distributing them equally across the nodes. The motivation for term-length partitioning comes from the observation that the Excite-97 queries have a very skewed distribution of term lengths, with some lengths predominating. We also propose a term-frequency partitioning scheme, in which the terms are classified based on their total term frequency (F) and then distributed equally across the nodes.
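A minimal sketch of both schemes follows. It assumes that "distributing equally" means dealing the terms of each class round-robin across the nodes, so that every node holds roughly the same number of terms from every class. The helper `partition_by_class`, the example terms, the `total_freq` values, and the power-of-two bucketing of F are hypothetical choices for illustration, not taken from the paper.

```python
from collections import defaultdict


def partition_by_class(terms, class_of, num_nodes):
    """Map each term to a node id in range(num_nodes).

    Terms are grouped by class_of(term) (e.g. term length, or a bucket of
    the total term frequency F). Each class is then dealt round-robin over
    the nodes, continuing the rotation from one class to the next, so every
    node gets within one term of an equal share of each class and of the
    whole vocabulary. Hypothetical helper, not the paper's implementation.
    """
    classes = defaultdict(list)
    for term in terms:
        classes[class_of(term)].append(term)

    assignment = {}
    next_node = 0
    for cls in sorted(classes):                # visit classes in a fixed order
        for term in sorted(classes[cls]):      # deterministic order within a class
            assignment[term] = next_node
            next_node = (next_node + 1) % num_nodes
    return assignment


if __name__ == "__main__":
    terms = ["car", "web", "art", "news", "free", "music", "games", "pictures"]

    # Term-length partitioning: the class is simply the number of characters.
    print(partition_by_class(terms, len, num_nodes=2))
    # {'art': 0, 'car': 1, 'web': 0, 'free': 1, 'news': 0, 'games': 1, 'music': 0, 'pictures': 1}

    # Term-frequency partitioning: hypothetical total frequencies F, bucketed
    # by powers of two (F.bit_length()) as an assumed classification rule.
    total_freq = {"car": 120, "web": 900, "art": 40, "news": 310,
                  "free": 650, "music": 75, "games": 200, "pictures": 15}
    print(partition_by_class(total_freq,
                             lambda t: total_freq[t].bit_length(),
                             num_nodes=2))
```

Continuing the rotation across classes, rather than restarting at node 0 for each class, keeps the leftover terms of each class from piling up on the first node.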
Keywords: Term-partitioning schemes; Term-frequency partitioning; Term-length partitioning; Node utilization