Regrouping Metric-Space Search Index for Search Engine Size Adaptation

  • Khalil Al Ruqeishi
  • Michal Konečný
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9371)


This work contributes to the development of search engines that self-adapt their size in response to fluctuations in workload. Deploying a search engine in an Infrastructure as a Service (IaaS) cloud makes it easy to allocate computational resources to the engine or deallocate them from it. In this paper, we focus on the problem of regrouping the metric-space search index when the number of virtual machines used to run the search engine is modified to reflect changes in workload. We propose an algorithm for incrementally adjusting the index to fit the varying number of virtual machines. We tested its performance using a custom-built prototype search engine deployed in the Amazon EC2 cloud, while calibrating the results to compensate for the performance fluctuations of the platform. Our experiments show that, when compared with computing the index from scratch, the incremental algorithm speeds up the index computation by a factor of 2–10 while maintaining similar search performance.


Search Engine, Virtual Machine, Query Processing, Range Query, Search Performance
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.


Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  1. School of Engineering and Applied Science, Aston University, Birmingham, UK
