Abstract
With the ever-growing size of the WWW, search engines cannot crawl and index the whole Web. Intelligent crawlers are therefore needed that crawl only those sections of the WWW which contain the preferred information. Image crawling is one such technique: it requires crawling and indexing a special type of information, related to images, from the Web. Unlike traditional image crawlers, this chapter proposes a novel method for crawling images from web pages as well as from PDF documents that uses not only the metatags related to an image but also some important features related to the image. The proposed work also reports very promising results.
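As a rough illustration of what collecting tag-level information about images from a web page can look like, the following minimal sketch gathers the src, alt, and title attributes of each img tag using only the Python standard library. It is not the architecture proposed in the chapter: the class name ImageMetadataParser, the function extract_image_metadata, and the example URL are hypothetical, and the choice of attributes is an assumption made for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ImageMetadataParser(HTMLParser):
    """Hypothetical helper: records src, alt, and title of every <img> tag."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url  # used to resolve relative image URLs
        self.images = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attr = dict(attrs)
        src = attr.get("src")
        if not src:
            return  # an image without a source cannot be fetched or indexed
        self.images.append({
            "url": urljoin(self.base_url, src),
            "alt": attr.get("alt") or "",
            "title": attr.get("title") or "",
        })


def extract_image_metadata(html, base_url):
    """Return one metadata record per <img> tag found in the HTML text."""
    parser = ImageMetadataParser(base_url)
    parser.feed(html)
    parser.close()
    return parser.images


if __name__ == "__main__":
    page = '<body><img src="/a/cat.png" alt="A cat" title="Cat"></body>'
    for record in extract_image_metadata(page, "http://example.com/index.html"):
        print(record)
```

A real image crawler would combine records like these with further signals, such as text surrounding the image or features of the image itself, before indexing.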
Copyright information
© 2018 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Sharma, S., Gupta, P., Nagpal, C.K. (2018). A Novel Architecture to Crawl Images Using OAI-PMH. In: Urooj, S., Virmani, J. (eds) Sensors and Image Processing. Advances in Intelligent Systems and Computing, vol 651. Springer, Singapore. https://doi.org/10.1007/978-981-10-6614-6_4
DOI: https://doi.org/10.1007/978-981-10-6614-6_4
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-6613-9
Online ISBN: 978-981-10-6614-6
eBook Packages: Engineering, Engineering (R0)