An Approach to Content-Based Image Retrieval Based on the Lucene Search Engine Library

  • Claudio Gennaro
  • Giuseppe Amato
  • Paolo Bolettieri
  • Pasquale Savino
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6273)


Content-based image retrieval is becoming a popular way of searching digital libraries as the amount of available multimedia data increases. However, the cost of developing a robust and reliable system with content-based image retrieval facilities for large databases from scratch is prohibitive.

In this paper, we propose to exploit an approach to approximate similarity search in metric spaces developed in [3,6]. The idea underlying these techniques is that when two objects are very close to each other, they ‘see’ the world around them in the same way. Accordingly, we can use a measure of dissimilarity between the views of the world from different objects in place of the distance function of the underlying metric space. To employ this idea, the low-level image features (such as colors and textures) are converted into a textual form and indexed in an inverted index by means of the Lucene search engine library. Converting the features into textual form allows us to employ Lucene’s off-the-shelf indexing and searching abilities with little implementation effort. In this way, we are able to set up a robust information retrieval system that combines full-text search with content-based image retrieval capabilities.
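As a rough illustration of the idea, the sketch below orders a small set of reference objects by their distance from each image descriptor (its "view of the world"), compares two views, and turns a view into a string that a text engine could index. Several details are assumptions that the abstract does not state: the Spearman footrule as the dissimilarity between views, the encoding in which nearer reference objects are repeated more often, the 2-D toy vectors, and all names (`view`, `footrule`, `to_text`, `R1` to `R4`). It does not call Lucene; in a real system the resulting string would be passed to the indexer as an ordinary text field.

```python
import math

def view(obj, refs, dist):
    """Return reference-object ids ordered from nearest to farthest from obj."""
    return sorted(refs, key=lambda rid: (dist(obj, refs[rid]), rid))

def footrule(view_a, view_b):
    """Spearman footrule (assumed measure): sum of rank differences."""
    pos_b = {rid: i for i, rid in enumerate(view_b)}
    return sum(abs(i - pos_b[rid]) for i, rid in enumerate(view_a))

def to_text(view_ids, k):
    """Encode the k nearest reference ids as words (assumed encoding):
    the nearest id is repeated k times, the next k-1 times, and so on."""
    words = []
    for rank, rid in enumerate(view_ids[:k]):
        words.extend([rid] * (k - rank))
    return " ".join(words)

# Hypothetical reference objects and toy 2-D feature vectors.
refs = {"R1": (0.0, 0.0), "R2": (10.0, 0.0), "R3": (0.0, 10.0), "R4": (10.0, 10.0)}
a, b, c = (1.0, 2.0), (2.0, 3.0), (9.0, 8.0)

view_a = view(a, refs, math.dist)
view_b = view(b, refs, math.dist)
view_c = view(c, refs, math.dist)

print(view_a)                    # ['R1', 'R3', 'R2', 'R4']
print(footrule(view_a, view_b))  # 0: a and b are close and share the same view
print(footrule(view_a, view_c))  # 8: c lies far away and sees a different order
print(to_text(view_a, 3))        # R1 R1 R1 R3 R3 R2
```

Under this assumed encoding, term frequency carries the rank information, so a standard text-similarity score between two such strings tends to be higher when the underlying views agree.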


Keywords: Approximate Similarity Search, Access Methods, Lucene




  1.
  2. Amato, G., Rabitti, F., Savino, P., Zezula, P.: Region proximity in metric spaces and its use for approximate similarity search. ACM Trans. Inf. Syst. 21(2), 192–227 (2003)
  3. Amato, G., Savino, P.: Approximate similarity search in metric spaces using inverted files. In: Proceedings of the 3rd International Conference on Scalable Information Systems (InfoScale 2008), pp. 1–10. ICST (2008)
  4. Batko, M., Kohoutkova, P., Novak, D.: Cophir image collection under the microscope. In: International Workshop on Similarity Search and Applications, pp. 47–54 (2009)
  5. Bolettieri, P., Esuli, A., Falchi, F., Lucchese, C., Perego, R., Rabitti, F.: Enabling content-based image retrieval in very large digital libraries. In: Second Workshop on Very Large Digital Libraries (VLDL 2009), DELOS, pp. 43–50 (2009)
  6. Chavez, E., Figueroa, K., Navarro, G.: Effective proximity retrieval by ordering permutations. IEEE Transactions on Pattern Analysis and Machine Intelligence 30, 1647–1658 (2007)
  7. Ciaccia, P., Patella, M.: Pac nearest neighbor queries: Approximate and controlled search in high-dimensional and metric spaces. In: ICDE, pp. 244–255 (2000)
  8. Ciaccia, P., Patella, M., Zezula, P.: M-tree: An efficient access method for similarity search in metric spaces. In: Jarke, M., Carey, M.J., Dittrich, K.R., Lochovsky, F.H., Loucopoulos, P., Jeusfeld, M.A. (eds.) VLDB 1997, Proceedings of 23rd International Conference on Very Large Data Bases, Athens, Greece, August 25-29, pp. 426–435. Morgan Kaufmann, San Francisco (1997)
  9. Egecioglu, Ö., Ferhatosmanoglu, H.: Dimensionality reduction and similarity computation by inner product approximations. In: Proceedings of the ACM International Conference on Information and Knowledge Management (CIKM 2000), McLean, Virginia, USA, November 6-11, pp. 219–226. ACM Press, New York (2000)
  10. Esuli, A.: Pp-index: Using permutation prefixes for efficient and scalable approximate similarity search. In: Proceedings of the 7th Workshop on Large-Scale Distributed Systems for Information Retrieval (LSDS-IR 2009), pp. 17–24 (2009)
  11. Fagin, R., Kumar, R., Sivakumar, D.: Comparing top-k lists. SIAM J. of Discrete Math. 17(1), 134–160 (2003)
  12. Faloutsos, C., Lin, K.-I.: FastMap: A fast algorithm for indexing, data-mining and visualization of traditional and multimedia datasets. In: Carey, M.J., Schneider, D.A. (eds.) Proceedings of the 18th ACM International Conference on Management of Data (SIGMOD 1995), San Jose, California, USA, May 22-25, pp. 163–174. ACM Press, New York (1995)
  13. Lux, M., Chatzichristofis, S.A.: Lire: lucene image retrieval: an extensible java cbir library. In: MM 2008: Proceeding of the 16th ACM International Conference on Multimedia, pp. 1085–1088. ACM, New York (2008)
  14. Ogras, Ü.Y., Ferhatosmanoglu, H.: Dimensionality reduction using magnitude and shape approximations. In: Proceedings of the ACM International Conference on Information and Knowledge Management (CIKM 2003), New Orleans, Louisiana, USA, November 3-8, pp. 99–107. ACM Press, New York (2003)
  15. Pramanik, S., Alexander, S., Li, J.: An efficient searching algorithm for approximate nearest neighbor queries in high dimensions. In: Proceedings of the IEEE International Conference on Multimedia Computing and Systems (ICMCS 1999), Florence, Italy, June 7-11, vol. 1. IEEE Computer Society Press, Los Alamitos (1999)
  16. Pramanik, S., Li, J., Ruan, J., Bhattacharjee, S.K.: Efficient search scheme for very large image databases. In: Beretta, G.B., Schettini, R. (eds.) Proceedings of the International Society for Optical Engineering (SPIE) on Internet Imaging, San Jose, California, USA, January 26, vol. 3964, pp. 79–90. The International Society for Optical Engineering (December 1999)
  17. Squire, D.M., Müller, W., Müller, H., Pun, T.: Content-based query of image databases: inspirations from text retrieval. Pattern Recognition Letters 21(13-14), 1193–1198 (2000); Selected Papers from The 11th Scandinavian Conference on Image
  18. Wang, X., Wang, J.T.-L., Lin, K.-I., Shasha, D., Shapiro, B.A., Zhang, K.: An index structure for data mining and clustering. In: Knowledge and Information Systems, vol. 2, pp. 161–184. Springer, Heidelberg (2000)
  19. Weber, R., Böhm, K.: Trading quality for time with nearest neighbor search. In: Zaniolo, C., Grust, T., Scholl, M.H., Lockemann, P.C. (eds.) EDBT 2000. LNCS, vol. 1777, p. 21. Springer, Heidelberg (2000)
  20. Zezula, P., Amato, G., Dohnal, V., Batko, M.: Similarity Search - The Metric Space Approach. In: Advances in Database Systems, vol. 32. Springer, Heidelberg (2006)
  21. Zezula, P., Savino, P., Amato, G., Rabitti, F.: Approximate similarity retrieval with m-trees. VLDB J 7(4), 275–293 (1998)

Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • Claudio Gennaro
    • 1
  • Giuseppe Amato
    • 1
  • Paolo Bolettieri
    • 1
  • Pasquale Savino
    • 1
  1. ISTI - CNR, Pisa, Italy
