Abstract
In this era of big data, operational and user-related data are generated at huge scale and dimensionality by user platforms, high-end algorithms, and computing devices. This data is an essential asset that organizations and individuals use for diverse analytics. The recent abundance of data has raised immense challenges and opportunities for data-driven organizations. Valuing the potential data objects within a large dataset is one such co-occurring and multifaceted task, owing to the inherent characteristics of data objects and the lack of a global measure or mechanism for evaluating them. A data valuation scheme helps organizations rank, outline, or weight the potential data objects for a computational objective. In this paper, we explore the fundamental aspects of traditional data valuation approaches to investigate the evolution of existing techniques and their implicit aspects. In this process, an automated data-evaluation strategy is proposed. The strategy evaluates the value of data objects based on an assessment of user queries and the ranked attributes of a target database. The key contribution of the work is its capability to evaluate data value at a desired granularity level (e.g., attribute level, tuple level, record level) on a just-in-time basis for the buyer/consumer. Each data object is assigned a rank value, which can be adapted by several consumers/buyers. The paper also discusses the design challenges and issues for developing similar approaches in the future.
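The abstract does not state the scoring rule, so the following is only a minimal Python sketch of how query-driven attribute ranking and multi-granularity valuation could fit together; it is not the authors' method. It assumes, hypothetically, that an attribute's weight is the relative frequency with which it appears in a user query log, that a tuple's value is the sum of the weights of its non-null attributes, and that a table's value is the sum of its tuple values. All function names (`rank_attributes`, `ranked_order`, `tuple_value`, `table_value`) and the toy data are illustrative.

```python
from collections import Counter
from typing import Dict, Iterable, List, Optional

Row = Dict[str, Optional[object]]


def rank_attributes(query_log: Iterable[Iterable[str]]) -> Dict[str, float]:
    """Hypothetical attribute weighting: relative frequency of each attribute in the query log."""
    counts: Counter = Counter()
    for attrs in query_log:
        counts.update(set(attrs))  # count an attribute at most once per query
    total = sum(counts.values())
    if total == 0:
        return {}  # empty log: no attribute has demonstrated value yet
    return {attr: n / total for attr, n in counts.items()}


def ranked_order(weights: Dict[str, float]) -> List[str]:
    """Attributes sorted from highest to lowest weight (ties broken alphabetically)."""
    return sorted(weights, key=lambda a: (-weights[a], a))


def tuple_value(row: Row, weights: Dict[str, float]) -> float:
    """Tuple-level value (assumed rule): sum of weights of the attributes the tuple populates."""
    return sum(weights.get(attr, 0.0) for attr, v in row.items() if v is not None)


def table_value(rows: List[Row], weights: Dict[str, float]) -> float:
    """Table-level value (assumed rule): sum of the tuple-level values."""
    return sum(tuple_value(r, weights) for r in rows)


if __name__ == "__main__":
    # Toy query log: each entry lists the attributes referenced by one user query.
    log = [{"name", "city"}, {"city", "income"}, {"city"}, {"income"}]
    w = rank_attributes(log)
    rows = [
        {"name": "A", "city": "X", "income": None},
        {"name": None, "city": "Y", "income": 50000},
    ]
    print(ranked_order(w))                                 # ['city', 'income', 'name']
    print([round(tuple_value(r, w), 3) for r in rows])     # [0.667, 0.833]
    print(round(table_value(rows, w), 3))                  # 1.5
```

Because valuation is computed from the current query log each time it is called, values can be recomputed on demand for a particular buyer's workload. Replacing the frequency rule with a different weighting only changes `rank_attributes`; the tuple- and table-level aggregation stays the same.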
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Verma, N., Singh, V. (2022). Query-Based Data Valuation Strategy: An Exploratory View. In: Saraswat, M., Roy, S., Chowdhury, C., Gandomi, A.H. (eds) Proceedings of International Conference on Data Science and Applications. Lecture Notes in Networks and Systems, vol 288. Springer, Singapore. https://doi.org/10.1007/978-981-16-5120-5_52
DOI: https://doi.org/10.1007/978-981-16-5120-5_52
Publisher Name: Springer, Singapore
Print ISBN: 978-981-16-5119-9
Online ISBN: 978-981-16-5120-5
eBook Packages: Intelligent Technologies and Robotics, Intelligent Technologies and Robotics (R0)