An Investigation into Improving the Load Balance for Term-Based Partitioning

  • Ahmad Abusukhon
  • Mohammad Talib
  • Michael P. Oakes
Conference paper
Part of the Lecture Notes in Business Information Processing book series (LNBIP, volume 5)


In parallel information retrieval (IR) systems, the query response time is limited by the slowest node in the system, so distributing the load equally across the nodes is a very important issue. In this paper, we propose improving the load balance for term-based partitioning by classifying the terms by length and then distributing them equally across the nodes. The motivation for term-length partitioning comes from the observation that the Excite-97 queries have a very skewed distribution of term lengths, with some predominant lengths. We also propose a term-frequency partitioning scheme, in which the terms are classified by total term frequency (F) and then distributed equally across the nodes.
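The two schemes can be illustrated with a minimal sketch. The abstract does not spell out the distribution procedure, so the round-robin dealing used here is an assumption, and the names `partition_by_class`, `term_length_partition` and `term_frequency_partition` are hypothetical, chosen only for this example. Terms are first grouped into classes (by length, or by total term frequency F), and each class is then spread across the nodes so that every node receives a near-equal share of every class.

```python
from collections import defaultdict


def partition_by_class(terms, num_nodes, class_of):
    """Group terms by class_of(term), then deal them round-robin to nodes.

    Assumption: round-robin dealing stands in for "distribute equally";
    the abstract does not describe the exact procedure.
    """
    classes = defaultdict(list)
    for term in terms:
        classes[class_of(term)].append(term)

    nodes = [[] for _ in range(num_nodes)]
    next_node = 0
    # Walk the classes in a fixed order so the result is deterministic.
    for key in sorted(classes):
        for term in sorted(classes[key]):
            nodes[next_node].append(term)
            # The pointer carries over between classes, so the total
            # number of terms per node differs by at most one.
            next_node = (next_node + 1) % num_nodes
    return nodes


def term_length_partition(terms, num_nodes):
    """Classify terms by their length in characters."""
    return partition_by_class(terms, num_nodes, len)


def term_frequency_partition(total_freq, num_nodes):
    """Classify terms by total term frequency F.

    total_freq maps each term to its total frequency in the collection.
    """
    return partition_by_class(total_freq, num_nodes, lambda t: total_freq[t])


if __name__ == "__main__":
    vocabulary = ["a", "bb", "cc", "dd", "eee", "fff"]
    print(term_length_partition(vocabulary, 2))

    frequencies = {"x": 5, "y": 5, "z": 1, "w": 9}
    print(term_frequency_partition(frequencies, 2))
```

Balancing the number of terms per class is only a proxy for balancing query load. In practice the load on a node also depends on how often its terms appear in queries and on the length of their posting lists, which this sketch does not model.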


Term-partitioning schemes; Term-frequency partitioning; Term-length partitioning; Node utilization





Copyright information

© Springer-Verlag Berlin Heidelberg 2008

Authors and Affiliations

  • Ahmad Abusukhon (1)
  • Mohammad Talib (2)
  • Michael P. Oakes (1)

  1. School of Computing and Technology, University of Sunderland, Sunderland
  2. Department of Computer Science, University of Botswana, Gaborone
