A Novel Web Page Change Detection Technique for Migrating Crawlers

  • Conference paper
Sensors and Image Processing

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 651)

Abstract

The content of web documents changes constantly, and the rate of change differs from page to page. These changes must be reflected in the search engine database; otherwise, users are served a stale image of the web documents. Many change detection methods use tree-based comparisons to decide whether two versions of a web document are the same, but these methods are prone to high complexity and ambiguity. Frequent crawler revisits also increase Internet traffic and bandwidth usage. This paper proposes a network-efficient web page change detection technique for migrating crawlers that detects structural and content changes by comparing a proposed tag code and text code for each of the HTML tags contained in the web page. The proposed method performs well and detects even minute changes while keeping the network load low.
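The abstract does not define how the tag code and text code are computed, so the following is only an illustrative sketch of the general idea, not the authors' method. It assumes the tag code is a short hash of an element's tag path and the text code is a short hash of the element's own text; the names `TagCodeParser`, `page_codes`, and `detect_change` are hypothetical. Comparing such compact codes, rather than full page copies, is one way the comparison could stay cheap in terms of network load.

```python
import hashlib
from html.parser import HTMLParser

# Elements that never have a closing tag (assumed subset, for illustration).
VOID_TAGS = {"br", "img", "hr", "input", "meta", "link"}


class TagCodeParser(HTMLParser):
    """Collects one [tag_path, own_text] record per element, in document order."""

    def __init__(self):
        super().__init__()
        self.stack = []    # indices into self.records of currently open elements
        self.records = []  # [tag_path, own_text] per element

    def handle_starttag(self, tag, attrs):
        parent = self.records[self.stack[-1]][0] if self.stack else ""
        self.records.append([parent + "/" + tag, ""])
        if tag not in VOID_TAGS:
            self.stack.append(len(self.records) - 1)

    def handle_endtag(self, tag):
        open_tags = [self.records[i][0].rsplit("/", 1)[-1] for i in self.stack]
        if tag in open_tags:  # ignore stray closing tags
            while open_tags.pop() != tag:
                self.stack.pop()  # implicitly close unclosed children
            self.stack.pop()

    def handle_data(self, data):
        text = " ".join(data.split())  # normalise whitespace
        if text and self.stack:
            record = self.records[self.stack[-1]]
            record[1] = (record[1] + " " + text).strip()


def short_hash(value):
    """Hypothetical code function: first 8 hex digits of SHA-1."""
    return hashlib.sha1(value.encode("utf-8")).hexdigest()[:8]


def page_codes(html):
    """Return a list of (tag_code, text_code) pairs, one per HTML element."""
    parser = TagCodeParser()
    parser.feed(html)
    parser.close()
    return [(short_hash(path), short_hash(text)) for path, text in parser.records]


def detect_change(old_html, new_html):
    """Classify the difference between two versions of a page."""
    old, new = page_codes(old_html), page_codes(new_html)
    if [tag for tag, _ in old] != [tag for tag, _ in new]:
        return "structural"
    if old != new:
        return "content"
    return "none"


old_page = "<html><body><h1>News</h1><p>Rain today</p></body></html>"
new_text = "<html><body><h1>News</h1><p>Sun today</p></body></html>"
new_tags = "<html><body><h1>News</h1><p>Rain today</p><p>Extra</p></body></html>"
print(detect_change(old_page, old_page))
print(detect_change(old_page, new_text))
print(detect_change(old_page, new_tags))
```

In this sketch a structural change is reported whenever the sequence of tag codes differs, and a content change only when the structure matches but some text code differs. A real migrating crawler would need to decide where the stored codes live and how collisions in the shortened hash are handled; both are left open here.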

Corresponding author

Correspondence to Ashlesha Gupta.

Copyright information

© 2018 Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Gupta, A., Dixit, A., Sharma, A.K. (2018). A Novel Web Page Change Detection Technique for Migrating Crawlers. In: Urooj, S., Virmani, J. (eds) Sensors and Image Processing. Advances in Intelligent Systems and Computing, vol 651. Springer, Singapore. https://doi.org/10.1007/978-981-10-6614-6_5

  • DOI: https://doi.org/10.1007/978-981-10-6614-6_5

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-10-6613-9

  • Online ISBN: 978-981-10-6614-6

  • eBook Packages: Engineering, Engineering (R0)
