Abstract
This work contributes to the development of search engines that self-adapt their size in response to fluctuations in workload. Deploying a search engine in an Infrastructure as a Service (IaaS) cloud makes it easy to allocate computational resources to the engine or deallocate them from it. In this paper, we focus on the problem of regrouping the metric-space search index when the number of virtual machines used to run the search engine is modified to reflect changes in workload. We propose an algorithm for incrementally adjusting the index to fit the varying number of virtual machines. We tested its performance using a custom-built prototype search engine deployed in the Amazon EC2 cloud, while calibrating the results to compensate for the performance fluctuations of the platform. Our experiments show that, compared with computing the index from scratch, the incremental algorithm speeds up the index computation by a factor of 2–10 while maintaining similar search performance.
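The abstract does not spell out the regrouping algorithm, so the following is only a minimal sketch of the general idea of adjusting an existing grouping instead of rebuilding it. It assumes the index is a set of clusters with known positive sizes, that each group of clusters is served by one virtual machine, and that balancing group sizes is the goal. The function name `regroup`, the `sizes` mapping, and the greedy balancing rule are all hypothetical and are not taken from the paper.

```python
from typing import Dict, List


def regroup(groups: List[List[str]], sizes: Dict[str, int], new_k: int) -> List[List[str]]:
    """Adapt an existing grouping of index clusters to new_k groups.

    Illustrative greedy heuristic (an assumption, not the paper's algorithm):
    keep existing groups where possible and move only as many clusters as
    needed to rebalance. All cluster sizes are assumed to be positive.
    """
    if new_k < 1:
        raise ValueError("new_k must be at least 1")

    def load(group: List[str]) -> int:
        return sum(sizes[c] for c in group)

    # Keep the first new_k groups unchanged as a starting point.
    result = [list(g) for g in groups[:new_k]]
    # Shrinking: clusters from dropped groups need a new home.
    orphans = [c for g in groups[new_k:] for c in g]
    # Growing: new groups start empty.
    result.extend([] for _ in range(new_k - len(result)))

    # Place orphans, largest first, onto the currently lightest group.
    for c in sorted(orphans, key=lambda c: -sizes[c]):
        min(result, key=load).append(c)

    # Move the smallest cluster from the heaviest group to the lightest
    # group for as long as doing so narrows the gap between them.
    while True:
        heavy = max(result, key=load)
        light = min(result, key=load)
        if not heavy:
            break
        c = min(heavy, key=lambda c: sizes[c])
        if load(heavy) - load(light) <= sizes[c]:
            break
        heavy.remove(c)
        light.append(c)
    return result


# Hypothetical example: four clusters on three machines, then scale down and up.
sizes = {"a": 4, "b": 1, "c": 3, "d": 2}
shrunk = regroup([["a", "b"], ["c"], ["d"]], sizes, 2)
grown = regroup(shrunk, sizes, 3)
```

In this sketch, scaling down only re-homes the clusters of the removed machine, and scaling up only moves clusters off the heaviest groups, which is the kind of saving an incremental approach aims for relative to recomputing the grouping from scratch.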
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Al Ruqeishi, K., Konečný, M. (2015). Regrouping Metric-Space Search Index for Search Engine Size Adaptation. In: Amato, G., Connor, R., Falchi, F., Gennaro, C. (eds) Similarity Search and Applications. SISAP 2015. Lecture Notes in Computer Science, vol 9371. Springer, Cham. https://doi.org/10.1007/978-3-319-25087-8_26
DOI: https://doi.org/10.1007/978-3-319-25087-8_26
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-25086-1
Online ISBN: 978-3-319-25087-8
eBook Packages: Computer Science, Computer Science (R0)