Rule-Based Conditioning of Probabilistic Data

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11142)


Data interoperability is a major issue in data management for data science and big data analytics. Probabilistic data integration (PDI) is a specific kind of data integration in which extraction and integration problems, such as inconsistency and uncertainty, are handled by means of a probabilistic data representation. This enables a two-phase data integration process: (1) a quick partial integration in which data quality problems are represented as uncertainty in the resulting integrated data, and (2) use of the uncertain data while its quality is continuously improved as more evidence is gathered. The main contribution of this paper is an iterative approach for incorporating user evidence into the probabilistically integrated data. Evidence can be specified as hard or soft rules (i.e., rules that are themselves uncertain).
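The idea of conditioning integrated data on hard and soft rules can be illustrated over an explicit set of possible worlds. The Python sketch below is not the paper's implementation; all names (`worlds_from_alternatives`, `condition`, `marginal`, `rwth_implies_aachen`) and the example data are hypothetical. A hard rule removes the worlds that violate it and renormalizes the rest. For a soft rule with confidence c we assume, as one possible semantics, that the rule itself holds with probability c independently of the data; conditioning on "if the rule holds, the data satisfies it" then scales each violating world by 1 - c before renormalizing.

```python
from itertools import product
from typing import Callable, Dict, FrozenSet, Tuple

Fact = Tuple[str, str]           # (attribute, value)
World = FrozenSet[Fact]          # one possible world = the set of facts that hold in it
Dist = Dict[World, float]        # possible world -> probability
Rule = Callable[[World], bool]   # True if the world satisfies the rule


def worlds_from_alternatives(alternatives: Dict[str, Dict[str, float]]) -> Dist:
    """Build possible worlds from independent, mutually exclusive alternatives per attribute."""
    attrs = sorted(alternatives)
    dist: Dist = {}
    for choice in product(*(alternatives[a].items() for a in attrs)):
        prob = 1.0
        facts = []
        for attr, (value, p) in zip(attrs, choice):
            facts.append((attr, value))
            prob *= p
        dist[frozenset(facts)] = prob
    return dist


def condition(dist: Dist, rule: Rule, confidence: float = 1.0) -> Dist:
    """Condition on a rule; confidence=1.0 is a hard rule, lower values give a soft rule.

    Assumed soft-rule semantics: an independent Boolean variable r with P(r) = confidence,
    plus the hard evidence (r implies rule). Marginalizing r out keeps satisfying worlds
    unchanged and multiplies violating worlds by (1 - confidence).
    """
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be in [0, 1]")
    weighted = {w: p if rule(w) else p * (1.0 - confidence) for w, p in dist.items()}
    total = sum(weighted.values())
    if total == 0.0:
        raise ValueError("evidence contradicts every possible world")
    # Renormalize and drop worlds that were ruled out entirely.
    return {w: p / total for w, p in weighted.items() if p > 0.0}


def marginal(dist: Dist, fact: Fact) -> float:
    """Probability that a fact holds: the total mass of the worlds containing it."""
    return sum(p for w, p in dist.items() if fact in w)


def rwth_implies_aachen(world: World) -> bool:
    # Hypothetical user rule: a person employed by RWTH Aachen lives in Aachen.
    return ("employer", "RWTH Aachen") not in world or ("city", "Aachen") in world


# Hypothetical partially integrated record: two sources disagree on city and employer.
prior = worlds_from_alternatives({
    "city": {"Enschede": 0.6, "Aachen": 0.4},
    "employer": {"University of Twente": 0.5, "RWTH Aachen": 0.5},
})
hard = condition(prior, rwth_implies_aachen)                  # hard rule
soft = condition(prior, rwth_implies_aachen, confidence=0.8)  # soft rule
print(round(marginal(hard, ("city", "Enschede")), 4))  # 0.4286
print(round(marginal(soft, ("city", "Enschede")), 4))  # 0.4737
```

Because `condition` returns a new distribution, evidence can be incorporated iteratively, one rule at a time, e.g. `condition(condition(prior, r1, 0.8), r2)`. Enumerating worlds is exponential in the number of uncertain attributes, so this only illustrates the semantics; practical systems rely on compact representations instead.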


Keywords: Data cleaning, Data integration, Information extraction, Probabilistic databases, Probabilistic programming


Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  1. University of Twente, Enschede, The Netherlands
  2. RWTH Aachen, Aachen, Germany
