Formulation of Composite Discrete Measures for Estimating Uncertainties in Probabilistic Databases

  • Conference paper
Beyond Databases, Architectures and Structures. Facing the Challenges of Data Proliferation and Growing Variety (BDAS 2018)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 928)

Abstract

Probabilistic databases contain large datasets embedded with noise and uncertainties in data association rules and queries. Data identification and interpretation in probabilistic databases require probabilistic models for data clustering and query processing. Thus, the associated probability measures are required to be heterogeneous as well as computable. This paper proposes a formal model of composite discrete measures in metric spaces intended for probabilistic databases. The proposed composite measures are computable and cover real as well as complex spaces. The spaces of discrete measures are constructed on continuous smooth functions. This paper presents the construction of the formal model and computational evaluations of discrete measures for different functions of varying linearity and smoothness. Furthermore, a special monotone class of the composite discrete measure is presented using an analytical formulation. The condensation measure of a uniform contraction map is constructed. The proposed model can be employed to computationally estimate uncertainties in probabilistic databases.

Author information

Corresponding author

Correspondence to Susmit Bagchi.

Copyright information

© 2018 Springer Nature Switzerland AG

About this paper

Cite this paper

Bagchi, S. (2018). Formulation of Composite Discrete Measures for Estimating Uncertainties in Probabilistic Databases. In: Kozielski, S., Mrozek, D., Kasprowski, P., Małysiak-Mrozek, B., Kostrzewa, D. (eds) Beyond Databases, Architectures and Structures. Facing the Challenges of Data Proliferation and Growing Variety. BDAS 2018. Communications in Computer and Information Science, vol 928. Springer, Cham. https://doi.org/10.1007/978-3-319-99987-6_11

  • DOI: https://doi.org/10.1007/978-3-319-99987-6_11

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-99986-9

  • Online ISBN: 978-3-319-99987-6

  • eBook Packages: Computer Science, Computer Science (R0)
