Abstract
Probabilistic databases contain large datasets with noise and uncertainty in data association rules and queries. Identifying and interpreting data in probabilistic databases requires probabilistic models for data clustering and query processing. The associated probability measures must therefore be heterogeneous as well as computable. This paper proposes a formal model of composite discrete measures in metric spaces, intended for probabilistic databases. The proposed composite measures are computable and cover real as well as complex spaces. The spaces of discrete measures are constructed on continuous smooth functions. The paper presents the construction of the formal model and computational evaluations of discrete measures for different functions of varying linearity and smoothness. Furthermore, a special monotone class of the composite discrete measure is presented using an analytical formulation, and the condensation measure of a uniform contraction map is constructed. The proposed model can be employed to computationally estimate uncertainties in probabilistic databases.
Copyright information
© 2018 Springer Nature Switzerland AG
About this paper
Cite this paper
Bagchi, S. (2018). Formulation of Composite Discrete Measures for Estimating Uncertainties in Probabilistic Databases. In: Kozielski, S., Mrozek, D., Kasprowski, P., Małysiak-Mrozek, B., Kostrzewa, D. (eds) Beyond Databases, Architectures and Structures. Facing the Challenges of Data Proliferation and Growing Variety. BDAS 2018. Communications in Computer and Information Science, vol 928. Springer, Cham. https://doi.org/10.1007/978-3-319-99987-6_11
DOI: https://doi.org/10.1007/978-3-319-99987-6_11
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-99986-9
Online ISBN: 978-3-319-99987-6
eBook Packages: Computer Science, Computer Science (R0)