Inverted File-Based General Metric Space Indexing for Quality-Aware Similarity Search in Information Retrieval

  • Chapter
Quality Issues in the Management of Web Information

Part of the book series: Intelligent Systems Reference Library (ISRL, volume 50)

Abstract

The notion of quality in its broadest sense is central to information retrieval (IR), where a user’s information need is to be fulfilled as well as possible. A user searching for cars on sale in Bamberg might be interested in car dealers that are geographically close to Bamberg and have high user ratings. The buyer might already know or trust a person who trusts the particular dealer. Furthermore, the cars sold by the dealer should offer high quality on different levels: the type of car in general as well as the particular car to be bought. If the buyer can only travel to Bamberg on weekends, the availability of the car dealer becomes another important factor. As this example shows, the integration of various quality aspects in IR is challenging but essential.

Thus, there is a need for scalable and efficient indexing and retrieval techniques that can cope with such search situations. Here, metric space access methods (MAMs) present a flexible indexing paradigm. We will briefly review these techniques and show how they can be applied in the context of quality-aware IR. Furthermore, we will present IF4MI, which is purely based on the inverted file concept and thus inherently provides a multi-feature MAM. It can make use of extensive knowledge in the field of inverted file-based indexing and represents a versatile indexing technique for quality-aware IR.
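
To give a rough idea of how a metric access method can be laid out as an inverted file, the following minimal Python sketch assigns each object to its closest pivot (one posting list per pivot) and uses the triangle inequality to skip distance computations during a range query. It is not the IF4MI algorithm, which the abstract does not specify; the function names, the Manhattan distance, and the toy data are assumptions chosen purely for illustration.

```python
from collections import defaultdict


def build_index(objects, pivots, dist):
    """Hypothetical helper: one posting list per pivot, holding
    (distance to pivot, object) for every object closest to that pivot."""
    postings = defaultdict(list)
    for obj in objects:
        distances = [dist(obj, p) for p in pivots]
        nearest = min(range(len(pivots)), key=distances.__getitem__)
        postings[nearest].append((distances[nearest], obj))
    return postings


def range_query(postings, pivots, dist, query, radius):
    """Return all objects within `radius` of `query`, plus the number of
    query-object distances that actually had to be computed."""
    results, computed = [], 0
    for pivot_id, entries in postings.items():
        d_qp = dist(query, pivots[pivot_id])
        for d_op, obj in entries:
            # Triangle inequality: d(q, o) >= |d(q, p) - d(o, p)|.
            # If this lower bound exceeds the radius, o cannot qualify.
            if abs(d_qp - d_op) > radius:
                continue
            computed += 1
            if dist(query, obj) <= radius:
                results.append(obj)
    return results, computed


def manhattan(a, b):
    """L1 distance, used here only as an example of a metric."""
    return sum(abs(x - y) for x, y in zip(a, b))


# Toy data (assumed for illustration only).
objects = [(0, 0), (1, 1), (5, 5), (9, 9), (10, 10)]
pivots = [(0, 0), (10, 10)]

index = build_index(objects, pivots, manhattan)
hits, computed = range_query(index, pivots, manhattan, (1, 0), 2)
print(hits, computed)
```

Because only the distance function is used, the same structure would work for any metric, such as edit distances on strings or trees. The stored object-pivot distances are what allow candidates to be discarded without touching the objects themselves.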

Author information

Corresponding author

Correspondence to Daniel Blank.

Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Blank, D., Henrich, A. (2013). Inverted File-Based General Metric Space Indexing for Quality-Aware Similarity Search in Information Retrieval. In: Pasi, G., Bordogna, G., Jain, L. (eds) Quality Issues in the Management of Web Information. Intelligent Systems Reference Library, vol 50. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-37688-7_2

  • DOI: https://doi.org/10.1007/978-3-642-37688-7_2

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-37687-0

  • Online ISBN: 978-3-642-37688-7

  • eBook Packages: Engineering, Engineering (R0)
