Meaningful Change Detection on the Web
In this paper we present a new technique for detecting changes on the Web. We propose a new method for measuring the similarity of two documents, which can be used efficiently to discover changes in selected portions of the original document. The proposed technique has been implemented in the CDWeb system, which provides a change monitoring service on the Web. CDWeb differs from previously proposed systems in that it allows the detection of changes on portions of documents, as well as of specific changes expressed by means of complex conditions; for example, users might want to know whether the value of a given stock has increased by more than 10%. Several tests on stock exchange and auction web pages demonstrated the effectiveness of the proposed approach.
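The abstract does not spell out CDWeb's similarity measure or how a monitored portion is re-located in a new version of a page. The following is a minimal sketch, not the authors' algorithm, assuming pages have already been parsed into labeled trees. The `Node` class and the functions `similarity`, `locate_zone` and `increased_by_more_than` are hypothetical names. In place of the paper's measure, it uses a simple Jaccard similarity over root-to-node label paths, re-locates a user-selected portion (a stock-quote row) in the new version, and then checks a condition of the "increased by more than 10%" kind.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Set, Tuple


@dataclass
class Node:
    """Hypothetical document-tree node: a tag or text label plus ordered children."""
    label: str
    children: List["Node"] = field(default_factory=list)


def label_paths(node: Node, prefix: Tuple[str, ...] = ()) -> Set[Tuple[str, ...]]:
    """All label paths from this subtree's root down to each of its nodes."""
    path = prefix + (node.label,)
    paths = {path}
    for child in node.children:
        paths |= label_paths(child, path)
    return paths


def similarity(a: Node, b: Node) -> float:
    """Jaccard similarity of two subtrees' label-path sets (1.0 means identical sets).

    Stand-in measure chosen for brevity, not CDWeb's: it ignores sibling order
    and collapses repeated paths.
    """
    pa, pb = label_paths(a), label_paths(b)
    return len(pa & pb) / len(pa | pb)


def subtrees(node: Node) -> Iterator[Node]:
    """Yield the node and all of its descendants in pre-order."""
    yield node
    for child in node.children:
        yield from subtrees(child)


def locate_zone(zone: Node, new_doc: Node) -> Node:
    """Return the subtree of the new version most similar to the old selected zone."""
    return max(subtrees(new_doc), key=lambda t: similarity(zone, t))


def leaves(node: Node) -> List[str]:
    """Leaf labels (the text content) of a subtree, left to right."""
    if not node.children:
        return [node.label]
    return [label for child in node.children for label in leaves(child)]


def increased_by_more_than(old: float, new: float, pct: float) -> bool:
    """True if the value grew by more than pct percent; False when old is not positive."""
    return old > 0 and (new - old) / old * 100 > pct


def row(symbol: str, price: str) -> Node:
    """Build a table row <tr><td>symbol</td><td>price</td></tr>."""
    return Node("tr", [Node("td", [Node(symbol)]), Node("td", [Node(price)])])


# Old and new versions of a quote page; the rows are reordered in the new one.
old_doc = Node("html", [Node("table", [row("CSCO", "20.00"), row("IBM", "100.00")])])
new_doc = Node("html", [Node("table", [row("IBM", "101.00"), row("CSCO", "22.50")])])

zone = old_doc.children[0].children[0]  # the user selected the CSCO row
found = locate_zone(zone, new_doc)
old_price, new_price = float(leaves(zone)[-1]), float(leaves(found)[-1])
print(leaves(found), increased_by_more_than(old_price, new_price, 10))
```

Because the zone is matched by content rather than by position, the CSCO row is still found after the rows swap places, and the 20.00 to 22.50 change (12.5%) satisfies the 10% condition. A real monitor would probably also need a minimum similarity below which the zone is treated as deleted, and a tree edit distance would be a more precise, though costlier, alternative to the path-set measure used here.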
Keywords: Target Zone, Document Tree, Edit Mapping, Cisco System, Unordered Tree