A Novel Architecture to Crawl Images Using OAI-PMH

Shruti Sharma; Parul Gupta; C. K. Nagpal

doi:10.1007/978-981-10-6614-6_4

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 651)

Abstract

With the growing size of the WWW, it is not possible for search engines to crawl and index the whole Web. Therefore, intelligent crawlers are needed that crawl only those sections of the WWW that contain the preferred information. Image crawling is one such technique: it requires crawling and indexing a special type of information related to images on the Web. Unlike traditional image crawlers, the method proposed in this chapter crawls images from web pages as well as from PDF documents, and it uses not only the metatags related to an image but also some important features related to the images. The proposed work also reports very promising results.

Author information

Authors and Affiliations

YMCA University of Science and Technology, Faridabad, India
Shruti Sharma, Parul Gupta & C. K. Nagpal

Corresponding author

Correspondence to Shruti Sharma.

Editor information

Editors and Affiliations

Gautam Buddha University, Greater Noida, Uttar Pradesh, India
Shabana Urooj
Department of Electrical and Instrumentation Engineering, Thapar University, Patiala, Punjab, India
Jitendra Virmani

About this paper

Cite this paper

Sharma, S., Gupta, P., Nagpal, C.K. (2018). A Novel Architecture to Crawl Images Using OAI-PMH. In: Urooj, S., Virmani, J. (eds) Sensors and Image Processing. Advances in Intelligent Systems and Computing, vol 651. Springer, Singapore. https://doi.org/10.1007/978-981-10-6614-6_4

DOI: https://doi.org/10.1007/978-981-10-6614-6_4
Published: 04 October 2017
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-6613-9
Online ISBN: 978-981-10-6614-6
eBook Packages: Engineering, Engineering (R0)
