Automated Organic Web Harvesting on Web Data for Analytics

Jacob, Lija; Thomas, K. T.

doi:10.1007/978-981-16-4486-3_14

Part of the book series: Lecture Notes in Networks and Systems (LNNS, volume 290)

Abstract

Automated web search and web data extraction have become an indispensable part of research in web mining. Web scraping has an immense influence on e-commerce, market research, web indexing, and much more. Most web information is presented in an unstructured or free format. Web scraping helps users retrieve, analyze, and use that data according to their requirements. Several methodologies for web scraping exist, and the major web scraping tools are rule-based systems. This work proposes and develops an automated method for web information extraction using computer vision. The proposed method comprises automated URL extraction, virtual extraction of the required data, and storage of the data in a structured format that is useful in market research.
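
The three stages named in the abstract (URL extraction, data extraction, structured storage) can be illustrated with a minimal sketch. This is not the authors' implementation: it uses only the Python standard library, assumes the page HTML has already been downloaded, and replaces the computer-vision step with a hypothetical stand-in called `placeholder_extractor`. All function names, field names, and the sample HTML are assumptions made for illustration.

```python
import csv
import io
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collects unique absolute http(s) URLs from <a href="..."> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.urls = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            return
        absolute = urljoin(self.base_url, href)
        # Keep web links only and skip duplicates, preserving page order.
        if absolute.startswith(("http://", "https://")) and absolute not in self.urls:
            self.urls.append(absolute)


def extract_urls(html, base_url):
    """Stage 1: automated URL extraction from already-downloaded HTML."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    return parser.urls


def placeholder_extractor(url):
    """Stage 2 stand-in (hypothetical): a real system would render the page
    and read the fields from the rendered image, for example with OCR.
    Here the item id is simply taken from the URL."""
    return {"item": url.rsplit("/", 1)[-1], "price": ""}


def harvest(urls, extract_fields):
    """Run the extractor on every URL and return one record per URL."""
    return [{"url": url, **extract_fields(url)} for url in urls]


def to_csv(rows, fieldnames):
    """Stage 3: store the records in a structured (CSV) format."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


SAMPLE_HTML = """
<html><body>
<a href="/item/1">One</a>
<a href="https://example.com/item/2">Two</a>
<a href="/item/1">Duplicate link</a>
<a href="mailto:someone@example.com">Mail</a>
</body></html>
"""

urls = extract_urls(SAMPLE_HTML, "https://example.com")
rows = harvest(urls, placeholder_extractor)
csv_text = to_csv(rows, ["url", "item", "price"])
print(csv_text)
```

Passing the extractor as a function keeps the URL and storage stages unchanged if the stand-in is later replaced by an OCR-based or other vision-based extractor.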

Author information

Authors and Affiliations

Christ University, Bangalore, India
Lija Jacob & K. T. Thomas

Corresponding author

Correspondence to Lija Jacob.

Editor information

Editors and Affiliations

Christ University, Bangalore, India
Samiksha Shukla
Digital Monozukuri, Stanford Alumni, Palo Alto, CA, USA
Aynur Unal
Christ University, Bangalore, India
Joseph Varghese Kureethara
Computer Science and Engineering, Sri Aurobindo Institute of Technology, Indore, Madhya Pradesh, India
Durgesh Kumar Mishra
School of Electronics Engineering, Kyungpook National University, Daegu, Korea (Republic of)
Dong Seog Han

Cite this paper

Jacob, L., Thomas, K.T. (2021). Automated Organic Web Harvesting on Web Data for Analytics. In: Shukla, S., Unal, A., Kureethara, J.V., Mishra, D.K., Han, D.S. (eds) Data Science and Security. Lecture Notes in Networks and Systems, vol 290. Springer, Singapore. https://doi.org/10.1007/978-981-16-4486-3_14

DOI: https://doi.org/10.1007/978-981-16-4486-3_14
Published: 27 August 2021
Publisher Name: Springer, Singapore
Print ISBN: 978-981-16-4485-6
Online ISBN: 978-981-16-4486-3
eBook Packages: Intelligent Technologies and Robotics (R0)
