A Framework for Sampling-Based XML Data Pricing

Tang, Ruiming; Amarilli, Antoine; Senellart, Pierre; Bressan, Stéphane

doi:10.1007/978-3-662-49214-7_4

Part of the book series: Lecture Notes in Computer Science (TLDKS, volume 9510)

Abstract

While price and data quality should define the major trade-off for consumers in data markets, prices are usually prescribed by vendors and data quality is not negotiable. In this paper we study a model where data quality can be traded for a discount. We focus on the case of XML documents and consider completeness as the quality dimension.

In our setting, the data provider offers an XML document, sets the price of the document, and assigns a weight to each node of the document, depending on its potential worth. The data consumer proposes a price. If the proposed price is lower than that of the entire document, then the data consumer receives a sample, i.e., a random rooted subtree of the document whose selection depends on the discounted price and the weight of nodes. By requesting several samples, the data consumer can iteratively explore the data in the document.
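To make the sample space concrete, the following minimal Python sketch enumerates every rooted subtree of a small tree by brute force and groups them by total weight. It is only an illustration: the dictionary-based tree encoding, the function name rooted_subtrees, and the example weights are assumptions made here, and the abstract does not say how a proposed price is mapped to a target weight, so that step is left out.

```python
from itertools import product

def rooted_subtrees(children, root):
    """Yield every rooted subtree as a frozenset of nodes.

    A rooted subtree contains the root and, for every node it contains,
    that node's parent. Brute force: exponential in the size of the tree.
    """
    options_per_child = []
    for c in children.get(root, []):
        # Either leave the child out (empty set) or take one of its rooted subtrees.
        options_per_child.append([frozenset()] + list(rooted_subtrees(children, c)))
    for choice in product(*options_per_child):
        yield frozenset([root]).union(*choice)

# Hypothetical four-node document: a has children b and c, b has child d.
children = {"a": ["b", "c"], "b": ["d"]}
weight = {"a": 1, "b": 2, "c": 1, "d": 1}

all_subtrees = list(rooted_subtrees(children, "a"))
weight_four = {s for s in all_subtrees if sum(weight[v] for v in s) == 4}
```

In this toy document there are six rooted subtrees, two of which have total weight 4; a uniform sampler for weight 4 would have to return each of those two with probability 1/2.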

We present a pseudo-polynomial time algorithm to select a rooted subtree with prescribed weight uniformly at random, but show that this problem is unfortunately intractable in general. Yet, we are able to identify several practical cases where our algorithm runs in polynomial time. The first case is uniform random sampling of a rooted subtree with prescribed size rather than weight; the second case restricts weights to binary values.

As a more challenging scenario for the sampling problem, we also study the uniform sampling of a rooted subtree of prescribed weight and prescribed height. We adapt our pseudo-polynomial time algorithm to this setting and identify tractable cases.
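A height constraint can be folded into the same style of counting, as the hedged Python sketch below suggests. It assumes that the height of a rooted subtree is the number of edges on its longest root-to-leaf path, that weights are non-negative integers, and that the tree is given as a hypothetical dictionary of child lists; none of these conventions is taken from the chapter. A rooted subtree has height at most h exactly when it contains no node deeper than h, so the count for an exact height is the difference of two bounded counts.

```python
def count_bounded(children, weight, root, W, max_height):
    """Number of rooted subtrees of total weight exactly W and height <= max_height."""
    if max_height < 0:
        return 0
    def table(v, depth):
        t = [0] * (W + 1)
        if weight[v] <= W:
            t[weight[v]] = 1
        if depth == max_height:
            return t  # any child of v would exceed the height bound
        for c in children.get(v, []):
            child = table(c, depth + 1)
            child[0] += 1  # extra option: leave c out entirely
            new = [0] * (W + 1)
            for a, x in enumerate(t):
                if x:
                    for b in range(W + 1 - a):
                        new[a + b] += x * child[b]
            t = new
        return t
    return table(root, 0)[W]

def count_exact_height(children, weight, root, W, height):
    """Number of rooted subtrees of total weight exactly W and height exactly `height`."""
    return (count_bounded(children, weight, root, W, height)
            - count_bounded(children, weight, root, W, height - 1))

# Hypothetical example document: a has children b and c, b has child d.
children = {"a": ["b", "c"], "b": ["d"]}
weight = {"a": 1, "b": 2, "c": 1, "d": 1}
n_height_two = count_exact_height(children, weight, "a", 4, 2)
```

In the toy document, the two rooted subtrees of weight 4 have heights 1 and 2 respectively, so each exact-height count is 1. This sketch only counts; turning it into a uniform sampler would require drawing choices in proportion to such counts, and the cost still depends on the magnitude of W.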

Acknowledgments

This work is supported by the French Ministry of Foreign Affairs under the STIC-Asia program, CCIPX project.

Author information

Authors and Affiliations

National University of Singapore, Singapore, Singapore
Ruiming Tang, Pierre Senellart & Stéphane Bressan
Institut Mines–Télécom, Télécom ParisTech, CNRS LTCI, Paris, France
Antoine Amarilli & Pierre Senellart

Corresponding author

Correspondence to Pierre Senellart.

Editor information

Editors and Affiliations

IRIT, Paul Sabatier University, Toulouse, France
Abdelkader Hameurlain
FAW, University of Linz, Linz, Austria
Josef Küng
FAW, University of Linz, Linz, Austria
Roland Wagner
Universidad Politécnica de Valencia, Valencia, Spain
Hendrik Decker
Czech Technical University, Prague, Czech Republic
Lenka Lhotska
University of Auckland, Auckland, New Zealand
Sebastian Link

About this chapter

Cite this chapter

Tang, R., Amarilli, A., Senellart, P., Bressan, S. (2016). A Framework for Sampling-Based XML Data Pricing. In: Hameurlain, A., Küng, J., Wagner, R., Decker, H., Lhotska, L., Link, S. (eds) Transactions on Large-Scale Data- and Knowledge-Centered Systems XXIV. Lecture Notes in Computer Science, vol 9510. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-49214-7_4

DOI: https://doi.org/10.1007/978-3-662-49214-7_4
Published: 07 January 2016
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-662-49213-0
Online ISBN: 978-3-662-49214-7
eBook Packages: Computer Science, Computer Science (R0)
