Abstract
Quality, in its broadest sense, is central to information retrieval (IR), where a user's information need is to be fulfilled as well as possible. A user searching for cars for sale in Bamberg might be interested in car dealers that are geographically close to Bamberg and have high user ratings. The buyer might already know or trust a person who trusts a particular dealer. Furthermore, the cars sold by the dealer should be of high quality on different levels: the type of car in general as well as the specific car to be bought. If the buyer can only travel to Bamberg on weekends, the availability of the car dealer becomes another important factor. As this example shows, integrating various quality aspects into IR is challenging but essential.
Thus, there is a need for scalable and efficient indexing and retrieval techniques that can cope with such search situations. Here, metric space access methods (MAMs) present a flexible indexing paradigm. We briefly review these techniques and show how they can be applied in the context of quality-aware IR. Furthermore, we present IF4MI, which is purely based on the inverted file concept and thus inherently provides a multi-feature MAM. It can make use of the extensive knowledge in the field of inverted file-based indexing and represents a versatile indexing technique for quality-aware IR.
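To make the idea of a metric space access method more concrete, the following minimal sketch shows generic pivot-based filtering with the triangle inequality, a principle shared by many MAMs. It is not the IF4MI technique described in the chapter. The function names (`build_pivot_table`, `range_search`) and the toy data are assumptions chosen purely for illustration.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")
Distance = Callable[[T, T], float]


def build_pivot_table(objects: Sequence[T], pivots: Sequence[T],
                      dist: Distance) -> List[List[float]]:
    """Precompute the distance from every object to every pivot."""
    return [[dist(obj, p) for p in pivots] for obj in objects]


def range_search(query: T, radius: float, objects: Sequence[T],
                 pivots: Sequence[T], table: List[List[float]],
                 dist: Distance) -> Tuple[List[T], int]:
    """Return all objects within `radius` of `query`, together with the
    number of object distances that actually had to be computed."""
    query_to_pivots = [dist(query, p) for p in pivots]
    hits: List[T] = []
    computed = 0
    for obj, row in zip(objects, table):
        # Triangle inequality: |d(q, p) - d(o, p)| <= d(q, o) for any pivot p,
        # so the largest such difference is a lower bound on d(q, o).
        lower_bound = max(abs(qp - op) for qp, op in zip(query_to_pivots, row))
        if lower_bound > radius:
            continue  # safely pruned without computing d(q, o)
        computed += 1
        if dist(query, obj) <= radius:
            hits.append(obj)
    return hits, computed


# Toy usage with numbers on a line and the absolute difference as metric.
def abs_diff(a: float, b: float) -> float:
    return abs(a - b)


data = [0, 1, 5, 10, 20]
pivots = [0]
table = build_pivot_table(data, pivots, abs_diff)
hits, computed = range_search(6, 1.5, data, pivots, table, abs_diff)
print(hits, computed)  # [5] 1
```

The sketch only requires that `dist` is a metric. No vector representation of the objects is needed, which is what makes this style of indexing applicable to heterogeneous quality criteria.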
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Blank, D., Henrich, A. (2013). Inverted File-Based General Metric Space Indexing for Quality-Aware Similarity Search in Information Retrieval. In: Pasi, G., Bordogna, G., Jain, L. (eds) Quality Issues in the Management of Web Information. Intelligent Systems Reference Library, vol 50. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-37688-7_2
DOI: https://doi.org/10.1007/978-3-642-37688-7_2
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-37687-0
Online ISBN: 978-3-642-37688-7
eBook Packages: Engineering, Engineering (R0)