On Efficient Matching of Streaming XML Documents and Queries
Applications such as online shopping, e-commerce, and supply-chain management require the ability to manage large sets of specifications of products and/or services as well as of consumer requirements, and call for efficient matching of requirements to specifications.
Requirements are best viewed as “queries” and specifications as data, often represented in XML. We present a framework in which requirements and specifications are both registered with, and maintained by, a registry. Periodically, the registry matches newly arrived specifications, e.g., of products and services, against the registered requirements and notifies the owners of those requirements of any matches found. This problem is dual to the conventional problem of database query processing in that the data (e.g., a document that is streaming by) is quite small compared to the number of registered queries, which can be very large. To perform matches efficiently, we propose the notion of a “requirements index”, which is dual to a traditional index: it indexes the queries rather than the data. We provide efficient matching algorithms that use the proposed indexes. Our prototype MatchMaker system uses these requirements index-based matching algorithms at its core and provides a timely notification service to registered users. We illustrate the effectiveness and scalability of the techniques developed with a detailed set of experiments.
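To make the duality concrete, the following is a minimal sketch of the general idea of indexing queries instead of data. It is not the paper's requirements index or matching algorithm. It assumes, purely for illustration, that each requirement is a set of root-to-node label paths that must all occur in a document; the class `RequirementsIndex`, its methods, and the sample queries are hypothetical names introduced here.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


class RequirementsIndex:
    """Toy index over registered queries, keyed by the label paths they require.

    Assumption: a query is a set of label paths such as "/product/price",
    and it matches a document when every one of its paths occurs in it.
    """

    def __init__(self):
        self.by_path = defaultdict(set)  # label path -> ids of queries requiring it
        self.required = {}               # query id -> number of distinct required paths

    def register(self, query_id, paths):
        paths = set(paths)
        self.required[query_id] = len(paths)
        for p in paths:
            self.by_path[p].add(query_id)

    def match(self, xml_text):
        # Collect every root-to-node label path in the (small) document.
        root = ET.fromstring(xml_text)
        doc_paths = set()
        stack = [(root, "/" + root.tag)]
        while stack:
            node, path = stack.pop()
            doc_paths.add(path)
            for child in node:
                stack.append((child, path + "/" + child.tag))

        # Probe the index once per document path and count hits per query,
        # so queries sharing no path with the document are never touched.
        hits = defaultdict(int)
        for p in doc_paths:
            for qid in self.by_path.get(p, ()):
                hits[qid] += 1
        return sorted(q for q, n in hits.items() if n == self.required[q])


index = RequirementsIndex()
index.register("q1", ["/product/name", "/product/price"])
index.register("q2", ["/product/warranty"])
index.register("q3", ["/service/name"])

doc = "<product><name>Lamp</name><price>20</price></product>"
print(index.match(doc))  # ['q1']
```

The work per document depends on the document's size and on the queries that share at least one path with it, rather than on the total number of registered queries, which is the property the abstract attributes to indexing the requirements.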
Keywords: Tree Label, Continuous Query, Distinguished Node, Label Algorithm, Zipf Distribution