
A Novel Architecture to Crawl Images Using OAI-PMH

  • Conference paper
Sensors and Image Processing

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 651)


Abstract

Given the ever-growing size of the World Wide Web (WWW), search engines cannot crawl and index the whole Web. Intelligent crawlers are therefore needed that crawl only those sections of the WWW which contain the desired information. Image crawling is one such technique: it requires crawling and indexing a special type of information, namely the information related to images on the Web. Unlike traditional image crawlers, the method proposed in this chapter crawls images from web pages as well as from PDF documents, and it uses not only the metatags related to an image but also some important image-related features. The proposed work reports very promising results.
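The abstract does not say which metatags or image features the proposed crawler uses. As a minimal sketch, assuming Python's standard-library `html.parser`, the class below collects each `<img>` element on one HTML page together with a few text signals that metadata-based image crawlers commonly index: the `alt` and `title` attributes and the page `<title>`. The class name `ImageMetaParser` and the chosen fields are illustrative assumptions, not the authors' method; PDF extraction and the OAI-PMH harvesting layer named in the title are not shown.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ImageMetaParser(HTMLParser):
    """Hypothetical helper: gathers <img> URLs plus nearby text metadata.

    The fields collected here (alt, title, page title) are assumptions for
    illustration; the chapter's actual feature set is not given.
    """

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url  # used to resolve relative src values
        self.page_title = ""      # raw text accumulated inside <title>
        self._in_title = False
        self.images = []          # one dict per <img> that has a src

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)  # attribute values may be None if valueless
        if tag == "title":
            self._in_title = True
        elif tag == "img" and attrs.get("src"):
            self.images.append({
                "url": urljoin(self.base_url, attrs["src"]),
                "alt": (attrs.get("alt") or "").strip(),
                "title": (attrs.get("title") or "").strip(),
                # page title as seen so far (normally set in <head>)
                "page_title": self.page_title.strip(),
            })

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.page_title += data


if __name__ == "__main__":
    page = ('<html><head><title>Campus Map</title></head><body>'
            '<img src="img/map.png" alt="Main campus map"/>'
            '<img src="logo.gif"></body></html>')
    parser = ImageMetaParser("http://example.org/docs/index.html")
    parser.feed(page)
    for record in parser.images:
        print(record)
```

Self-closing tags such as `<img .../>` are handled because `HTMLParser`'s default `handle_startendtag` forwards to `handle_starttag`. In a full crawler, each record would be indexed alongside whatever image features the system extracts.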



Author information

Corresponding author

Correspondence to Shruti Sharma.


Copyright information

© 2018 Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Sharma, S., Gupta, P., Nagpal, C.K. (2018). A Novel Architecture to Crawl Images Using OAI-PMH. In: Urooj, S., Virmani, J. (eds) Sensors and Image Processing. Advances in Intelligent Systems and Computing, vol 651. Springer, Singapore. https://doi.org/10.1007/978-981-10-6614-6_4


  • DOI: https://doi.org/10.1007/978-981-10-6614-6_4


  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-10-6613-9

  • Online ISBN: 978-981-10-6614-6

  • eBook Packages: Engineering, Engineering (R0)
