Incremental knowledge base construction using DeepDive

  • Special Issue Paper
The VLDB Journal

Abstract

Populating a database with information from unstructured sources—also known as knowledge base construction (KBC)—is a long-standing problem in industry and research that encompasses problems of extraction, cleaning, and integration. In this work, we describe DeepDive, a system that combines database and machine learning ideas to help develop KBC systems, and we present techniques to make the KBC process more efficient. We observe that the KBC process is iterative, and we develop techniques to incrementally produce inference results for KBC systems. We propose two methods for incremental inference, based, respectively, on sampling and variational techniques. We also study the trade-off space of these methods and develop a simple rule-based optimizer. DeepDive includes all of these contributions, and we evaluate DeepDive on five KBC systems, showing that it can speed up KBC inference tasks by up to two orders of magnitude with negligible impact on quality.

Notes

  1. http://deepdive.stanford.edu.

  2. DeepDive has some technical differences from Markov Logic that we have found useful in building applications. We discuss these differences in Sect. 4.2.

  3. http://www.itl.nist.gov/iad/mig/tests/ace/2000/.

  4. http://www.freebase.com/.

  5. http://macrostrat.org/.

  6. There is a justification for probabilistic reasoning as Cox’s theorem asserts (roughly) that if one uses numbers as degrees of belief, then one must either use probabilistic reasoning or risk contradictions in one’s reasoning system, i.e., a probabilistic framework is the only sound system for reasoning in this manner. We refer the reader to Jaynes [34].

  7. For more information, including examples, please see http://deepdive.stanford.edu. Note that our engine is built on PostgreSQL and Greenplum for all SQL processing and UDFs. There is also a port to MySQL.

  8. Our system Tuffy introduced this feature to MLNs, but its semantics had not been described in the literature.

  9. For example, for the grounding procedure illustrated in Fig. 5, the delta rule for F1 is \(q^{\delta }(x) :- R^{\delta }(x,y)\).

  10. Compared with running inference from scratch, the strawman approach does not materialize any factors. The strawman must therefore enumerate every possible world and save its probability, because we do not know a priori which possible world will be used in the inference phase.

  11. In Fig. 7, the numbers are reported for a factor graph whose factor weights are sampled at random from \([-0.5,0.5]\). We also experimented with different intervals (\([-0.1,0.1]\), \([-1,1]\), \([-10,10]\)), but these had no impact on the trade-off
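
The delta rule in note 9 can be illustrated with a minimal sketch. It assumes set semantics and insert-only updates, and the relation contents and the names `ground_q` and `delta_q` are hypothetical, chosen only for illustration; it is not DeepDive's grounding code. The point is that the rule body is evaluated over the newly added tuples \(R^{\delta }\) alone rather than over all of \(R\).

```python
def ground_q(R):
    """Full evaluation of q(x) :- R(x, y): project R onto its first column."""
    return {x for (x, _y) in R}


def delta_q(R_delta):
    """Delta rule q_delta(x) :- R_delta(x, y): same rule, new tuples only."""
    return {x for (x, _y) in R_delta}


# Hypothetical base relation and its materialized view.
R = {("a", 1), ("b", 2)}
q = ground_q(R)

# Hypothetical batch of newly inserted tuples.
R_delta = {("b", 3), ("c", 4)}

# Incremental update: union the old view with the delta-rule output.
q_incremental = q | delta_q(R_delta)

# For insert-only updates this matches re-grounding from scratch.
assert q_incremental == ground_q(R | R_delta)
```

Handling deletions would need extra bookkeeping, such as derivation counts, which this sketch leaves out.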

References

  1. Acar, U.A., Ihler, A.T., Mettu, R.R., Sümer, Ö.: Adaptive inference on general graphical models. In: UAI 2008, Proceedings of the 24th Conference in Uncertainty in Artificial Intelligence, Helsinki, Finland, July 9–12, 2008, pp. 1–8 (2008)

  2. Andrieu, C., de Freitas, N., Doucet, A., Jordan, M.I.: An introduction to MCMC for machine learning. Mach. Learn. 50(1–2), 5–43 (2003)

  3. Angeli, G., Gupta, S., Jose, M., Manning, C.D., Ré, C., Tibshirani, J., Wu, J.Y., Wu, S., Zhang, C.: Stanford’s 2014 slot filling systems. TAC KBP (2014)

  4. Banerjee, O., Ghaoui, L.E., d’Aspremont, A.: Model selection through sparse maximum likelihood estimation for multivariate Gaussian or binary data. J. Mach. Learn. Res. 9, 485–516 (2008)

  5. Banko, M., Cafarella, M.J., Soderland, S., Broadhead, M., Etzioni, O.: Open information extraction from the web. In: IJCAI 2007, Proceedings of the 20th International Joint Conference on Artificial Intelligence, Hyderabad, India, January 6–12, 2007, pp. 2670–2676 (2007)

  6. Barbosa, D., Wang, H., Yu, C.: Shallow information extraction for the knowledge web. In: 29th IEEE International Conference on Data Engineering, ICDE 2013, Brisbane, Australia, April 8–12, 2013, pp. 1264–1267 (2013)

  7. Betteridge, J., Carlson, A., Hong, S.A., Hruschka Jr., E.R., Law, E.L.M., Mitchell, T.M., Wang, S.H.: Toward never ending language learning. In: Learning by Reading and Learning to Read, Papers from the 2009 AAAI Spring Symposium, Technical Report SS-09-07, Stanford, California, USA, March 23–25, 2009, pp. 1–2 (2009)

  8. Brin, S.: Extracting patterns and relations from the world wide web. In: The World Wide Web and Databases, International Workshop WebDB’98, Valencia, Spain, March 27–28, 1998, Selected Papers, pp. 172–183 (1998)

  9. Brown, E., Epstein, E., Murdock, J.W., Fin, T.H.: Tools and methods for building Watson. IBM Research Report (2013)

  10. Bunescu, R.C., Mooney, R.J.: Learning to extract relations from the web using minimal supervision. In: ACL 2007, Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics, June 23–30, 2007, Prague, Czech Republic (2007)

  11. Carlson, A., Betteridge, J., Kisiel, B., Settles, B., Hruschka Jr., E.R., Mitchell, T.M.: Toward an architecture for never-ending language learning. In: Proceedings of the Twenty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2010, Atlanta, Georgia, USA, July 11–15, 2010 (2010)

  12. Chen, F., Doan, A., Yang, J., Ramakrishnan, R.: Efficient information extraction over evolving text data. In: Proceedings of the 24th International Conference on Data Engineering, ICDE 2008, April 7–12, 2008, Cancún, México, pp. 943–952 (2008)

  13. Chen, F., Feng, X., Ré, C., Wang, M.: Optimizing statistical information extraction programs over evolving text. In: IEEE 28th International Conference on Data Engineering (ICDE 2012), Washington, DC, USA (Arlington, Virginia), 1–5 April, 2012, pp. 870–881 (2012)

  14. Chen, Y., Wang, D.Z.: Knowledge expansion over probabilistic knowledge bases. In: International Conference on Management of Data, SIGMOD 2014, Snowbird, UT, USA, June 22–27, 2014, pp. 649–660 (2014)

  15. Chirkova, R., Yang, J.: Materialized views. Found. Trends Databases 4(4), 295–405 (2012)

  16. Craven, M., Kumlien, J.: Constructing biological knowledge bases by extracting information from text sources. In: Proceedings of the Seventh International Conference on Intelligent Systems for Molecular Biology, August 6–10, 1999, Heidelberg, Germany, pp. 77–86 (1999)

  17. Dalvi, N.N., Suciu, D.: The dichotomy of probabilistic inference for unions of conjunctive queries. J. ACM 59(6), 30 (2012)

  18. Delcher, A.L., Grove, A.J., Kasif, S., Pearl, J.: Logarithmic-time updates and queries in probabilistic networks. J. Artif. Intell. Res. 4, 37–59 (1996)

  19. den Hollander, F.: Probability Theory: The Coupling Method. http://websites.math.leidenuniv.nl/probability/lecturenotes/CouplingLectures.pdf (2012)

  20. Domingos, P.M., Lowd, D.: Markov logic: an interface layer for artificial intelligence. Synth. Lect. Artif. Intell. Mach. Learn. (2009)

  21. Dong, X.L., Gabrilovich, E., Heitz, G., Horn, W., Murphy, K., Sun, S., Zhang, W.: From data fusion to knowledge fusion. PVLDB 7(10), 881–892 (2014)

  22. Etzioni, O., Cafarella, M.J., Downey, D., Kok, S., Popescu, A., Shaked, T., Soderland, S., Weld, D.S., Yates, A.: Web-scale information extraction in KnowItAll: (preliminary results). In: Proceedings of the 13th international conference on World Wide Web, WWW 2004, New York, NY, USA, May 17–20, 2004, pp. 100–110 (2004)

  23. Fan, M., Zhao, D., Zhou, Q., Liu, Z., Zheng, T.F., Chang, E.Y.: Distant supervision for relation extraction with matrix completion. In: Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics, ACL 2014, June 22–27, 2014, Baltimore, MD, USA, Volume 1: Long Papers, pp. 839–849 (2014)

  24. Ferrucci, D.A., Brown, E.W., Chu-Carroll, J., Fan, J., Gondek, D., Kalyanpur, A., Lally, A., Murdock, J.W., Nyberg, E., Prager, J.M., Schlaefer, N., Welty, C.A.: Building Watson: an overview of the DeepQA project. AI Mag. 31(3), 59–79 (2010)

  25. Gama, J., Zliobaite, I., Bifet, A., Pechenizkiy, M., Bouchachia, A.: A survey on concept drift adaptation. ACM Comput. Surv. 46(4), 44:1–44:37 (2014)

  26. Gottlob, G., Koch, C., Baumgartner, R., Herzog, M., Flesca, S.: The Lixto data extraction project—back and forth between theory and practice. In: Proceedings of the Twenty-Third ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, June 14–16, 2004, Paris, France, pp. 1–12 (2004)

  27. Govindaraju, V., et al.: Understanding tables in context using standard NLP toolkits. In: ACL (2013)

  28. Gupta, A., Mumick, I.S. (eds.): Materialized Views: Techniques, Implementations, and Applications. MIT Press, Cambridge (1999)

  29. Gupta, A., Mumick, I.S., Subrahmanian, V.S.: Maintaining views incrementally. In: Proceedings of the 1993 ACM SIGMOD International Conference on Management of Data, Washington, DC, May 26–28, 1993, pp. 157–166 (1993)

  30. Hearst, M.A.: Automatic acquisition of hyponyms from large text corpora. In: 14th International Conference on Computational Linguistics, COLING 1992, Nantes, France, August 23–28, 1992, pp. 539–545 (1992)

  31. Hoffmann, R., Zhang, C., Ling, X., Zettlemoyer, L.S., Weld, D.S.: Knowledge-based weak supervision for information extraction of overlapping relations. In: The 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Proceedings of the Conference, 19–24 June, 2011, Portland, Oregon, USA, pp. 541–550 (2011)

  32. Hoffmann, R., et al.: Learning 5000 relational extractors. In: Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, pp. 286–295. Association for Computational Linguistics (2010)

  33. Jampani, R., Xu, F., Wu, M., Perez, L.L., Jermaine, C.M., Haas, P.J.: MCDB: a Monte Carlo approach to managing uncertain data. In: Proceedings of the ACM SIGMOD International Conference on Management of Data, SIGMOD 2008, Vancouver, BC, Canada, June 10–12, 2008, pp. 687–700 (2008)

  34. Jaynes, E.T.: Probability Theory: The Logic of Science. Cambridge University Press, Cambridge (2003)

  35. Jerrum, M., Sinclair, A.: Polynomial-time approximation algorithms for the Ising model. SIAM J. Comput. 22(5), 1087–1116 (1993)

  36. Jiang, S., Lowd, D., Dou, D.: Learning to refine an automatically extracted knowledge base using Markov logic. In: 12th IEEE International Conference on Data Mining, ICDM 2012, Brussels, Belgium, December 10–13, 2012, pp. 912–917 (2012)

  37. Kalchbrenner, N., Grefenstette, E., Blunsom, P.: A convolutional neural network for modelling sentences. In: Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics, ACL 2014, June 22–27, 2014, Baltimore, MD, USA, Volume 1: Long Papers, pp. 655–665 (2014)

  38. Kasneci, G., Ramanath, M., Suchanek, F.M., Weikum, G.: The YAGO-NAGA approach to knowledge discovery. SIGMOD Rec. 37(4), 41–47 (2008)

  39. Katakis, I., Tsoumakas, G., Banos, E., Bassiliades, N., Vlahavas, I.P.: An adaptive personalized news dissemination system. J. Intell. Inf. Syst. 32(2), 191–212 (2009)

  40. Koc, M.L., Ré, C.: Incrementally maintaining classification using an RDBMS. PVLDB 4(5), 302–313 (2011)

  41. Krause, S., Li, H., Uszkoreit, H., Xu, F.: Large-scale learning of relation-extraction rules with distant supervision from the web. In: The Semantic Web—ISWC 2012—11th International Semantic Web Conference, Boston, MA, USA, November 11–15, 2012, Proceedings, Part I, pp. 263–278 (2012)

  42. Levin, D.A., Peres, Y., Wilmer, E.L.: Markov Chains and Mixing Times. American Mathematical Society, Providence (2006)

  43. Li, J., Ritter, A., Hovy, E.H.: Weakly supervised user profile extraction from Twitter. In: Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics, ACL 2014, June 22–27, 2014, Baltimore, MD, USA, Volume 1: Long Papers, pp. 165–174 (2014)

  44. Li, Y., Reiss, F., Chiticariu, L.: SystemT: a declarative information extraction system. In: The 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Proceedings of the Conference, 19–24 June, 2011, Portland, Oregon, USA—System Demonstrations, pp. 109–114 (2011)

  45. Madhavan, J., Cohen, S., Dong, X.L., Halevy, A.Y., Jeffery, S.R., Ko, D., Yu, C.: Web-scale data integration: You can afford to pay as you go. In: CIDR 2007, Third Biennial Conference on Innovative Data Systems Research, Asilomar, CA, USA, January 7–10, 2007, Online Proceedings, pp. 342–350 (2007)

  46. Marchetti-Bowick, M., Chambers, N.: Learning for microblogs with distant supervision: political forecasting with Twitter. In: EACL 2012, 13th Conference of the European Chapter of the Association for Computational Linguistics, Avignon, France, April 23–27, 2012, pp. 603–612 (2012)

  47. Min, B., Grishman, R., Wan, L., Wang, C., Gondek, D.: Distant supervision for relation extraction with an incomplete knowledge base. In: Human Language Technologies: Conference of the North American Chapter of the Association of Computational Linguistics, Proceedings, June 9–14, 2013, Westin Peachtree Plaza Hotel, Atlanta, Georgia, USA, pp. 777–782 (2013)

  48. Mintz, M., Bills, S., Snow, R., Jurafsky, D.: Distant supervision for relation extraction without labeled data. In: ACL 2009, Proceedings of the 47th Annual Meeting of the Association for Computational Linguistics and the 4th International Joint Conference on Natural Language Processing of the AFNLP, 2–7 August 2009, Singapore, pp. 1003–1011 (2009)

  49. Nakashole, N., Theobald, M., Weikum, G.: Scalable knowledge harvesting with high precision and high recall. In: Proceedings of the Fourth International Conference on Web Search and Web Data Mining, WSDM 2011, Hong Kong, China, February 9–12, 2011, pp. 227–236 (2011)

  50. Nath, A., Domingos, P.M.: Efficient belief propagation for utility maximization and repeated inference. In: Proceedings of the Twenty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2010, Atlanta, Georgia, USA, July 11–15, 2010 (2010)

  51. Nguyen, T.T., Moschitti, A.: End-to-end relation extraction using distant supervision from external semantic repositories. In: The 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Proceedings of the Conference, 19–24 June, 2011, Portland, Oregon, USA—Short Papers, pp. 277–282 (2011)

  52. Nikolic, M., Elseidy, M., Koch, C.: LINVIEW: incremental view maintenance for complex analytical queries. In: International Conference on Management of Data, SIGMOD 2014, Snowbird, UT, USA, June 22–27, 2014, pp. 253–264 (2014)

  53. Niu, F., Ré, C., Doan, A., Shavlik, J.W.: Tuffy: scaling up statistical inference in Markov logic networks using an RDBMS. PVLDB 4(6), 373–384 (2011)

  54. Niu, F., Zhang, C., Ré, C., Shavlik, J.W.: Elementary: large-scale knowledge-base construction via machine learning and statistical inference. Int. J. Semant. Web Inf. Syst. 8(3), 42–73 (2012)

  55. Peters, S.E., Zhang, C., Livny, M., Ré, C.: A machine reading system for assembling synthetic paleontological databases. PLoS ONE (2014)

  56. Poon, H., Domingos, P.M.: Joint inference in information extraction. In: Proceedings of the Twenty-Second AAAI Conference on Artificial Intelligence, July 22–26, 2007, Vancouver, British Columbia, Canada, pp. 913–918 (2007)

  57. Purver, M., Battersby, S.: Experimenting with distant supervision for emotion classification. In: EACL 2012, 13th Conference of the European Chapter of the Association for Computational Linguistics, Avignon, France, April 23–27, 2012, pp. 482–491 (2012)

  58. Ravikumar, P.D., Raskutti, G., Wainwright, M.J., Yu, B.: Model selection in Gaussian graphical models: High-dimensional consistency of \(\ell_1\)-regularized MLE. In: Advances in Neural Information Processing Systems 21, Proceedings of the Twenty-Second Annual Conference on Neural Information Processing Systems, Vancouver, British Columbia, Canada, December 8–11, 2008, pp. 1329–1336 (2008)

  59. Ré, C., Sadeghian, A.A., Shan, Z., Shin, J., Wang, F., Wu, S., Zhang, C.: Feature engineering for knowledge base construction. IEEE Data Eng. Bull. 37(3), 26–40 (2014)

  60. Riedel, S., Yao, L., McCallum, A.: Modeling relations and their mentions without labeled text. In: Machine Learning and Knowledge Discovery in Databases, European Conference, ECML PKDD 2010, Barcelona, Spain, September 20–24, 2010, Proceedings, Part III, pp. 148–163 (2010)

  61. Robert, C.P., Casella, G.: Monte Carlo Statistical Methods. Springer, Secaucus (2005)

  62. De Sa, C., Ratner, A., Ré, C., Shin, J., Wang, F., Wu, S., Zhang, C.: DeepDive: declarative knowledge base construction. SIGMOD Rec. (2015)

  63. Sen, P., Deshpande, A., Getoor, L.: PrDB: managing and exploiting rich correlations in probabilistic databases. VLDB J. 18(5), 1065–1090 (2009)

  64. Shen, W., Doan, A., Naughton, J.F., Ramakrishnan, R.: Declarative information extraction using Datalog with embedded extraction predicates. In: Proceedings of the 33rd International Conference on Very Large Data Bases, University of Vienna, Austria, September 23–27, 2007, pp. 1033–1044 (2007)

  65. Suchanek, F.M., Sozio, M., Weikum, G.: SOFIE: a self-organizing framework for information extraction. In: Proceedings of the 18th International Conference on World Wide Web, WWW 2009, Madrid, Spain, April 20–24, 2009, pp. 631–640 (2009)

  66. Suciu, D., Olteanu, D., Ré, C., Koch, C.: Probabilistic Databases. Synth. Lect. Data Manag. (2011)

  67. Surdeanu, M., Gupta, S., Bauer, J., McClosky, D., Chang, A.X., Spitkovsky, V.I., Manning, C.D.: Stanford’s distantly-supervised slot-filling system. In: Proceedings of the Fourth Text Analysis Conference, TAC 2011, Gaithersburg, Maryland, USA, November 14–15, 2011 (2011)

  68. Surdeanu, M., McClosky, D., Tibshirani, J., Bauer, J., Chang, A.X., Spitkovsky, V.I., Manning, C.D.: A simple distant supervision approach for the TAC-KBP slot filling task. In: Proceedings of the Third Text Analysis Conference, TAC 2010, Gaithersburg, Maryland, USA, November 15–16, 2010 (2010)

  69. Ullman, J.D.: Principles of Database and Knowledge-Base Systems, vol. II. Computer Science Press, New York (1989)

  70. Wainwright, M.J., Jordan, M.I.: Log-determinant relaxation for approximate inference in discrete Markov random fields. IEEE Trans. Signal Process. 54(6–1), 2099–2109 (2006)

  71. Wainwright, M.J., Jordan, M.I.: Graphical models, exponential families, and variational inference. Found. Trends Mach. Learn. 1(1–2), 1–305 (2008)

  72. Weikum, G., Theobald, M.: From information to knowledge: harvesting entities and relationships from web sources. In: Proceedings of the Twenty-Ninth ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, PODS 2010, June 6–11, 2010, Indianapolis, Indiana, USA, pp. 65–76 (2010)

  73. Wick, M.L., McCallum, A.: Query-aware MCMC. In: Advances in Neural Information Processing Systems 24: 25th Annual Conference on Neural Information Processing Systems 2011. Proceedings of a meeting held 12–14 December 2011, Granada, Spain, pp. 2564–2572 (2011)

  74. Wick, M.L., McCallum, A., Miklau, G.: Scalable probabilistic databases with factor graphs and MCMC. PVLDB 3(1), 794–804 (2010)

  75. Yao, L., Riedel, S., McCallum, A.: Collective cross-document relation extraction without labelled data. In: Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing, EMNLP 2010, 9–11 October 2010, MIT Stata Center, Massachusetts, USA, A meeting of SIGDAT, a Special Interest Group of the ACL, pp. 1013–1023 (2010)

  76. Yates, A., Banko, M., Broadhead, M., Cafarella, M.J., Etzioni, O., Soderland, S.: TextRunner: Open information extraction on the web. In: Human Language Technology Conference of the North American Chapter of the Association of Computational Linguistics, Proceedings, April 22–27, 2007, Rochester, New York, USA, pp. 25–26 (2007)

  77. Zhang, C., Niu, F., Ré, C., Shavlik, J.W.: Big data versus the crowd: looking for relationships in all the right places. In: The 50th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference, July 8–14, 2012, Jeju Island, Korea—Volume 1: Long Papers, pp. 825–834 (2012)

  78. Zhang, C., Ré, C.: Towards high-throughput Gibbs sampling at scale: a study across storage managers. In: Proceedings of the ACM SIGMOD International Conference on Management of Data, SIGMOD 2013, New York, NY, USA, June 22–27, 2013, pp. 397–408 (2013)

  79. Zhang, C., Ré, C.: DimmWitted: a study of main-memory statistical analytics. PVLDB 7(12), 1283–1294 (2014)

  80. Zhang, X., Zhang, J., Zeng, J., Yan, J., Chen, Z., Sui, Z.: Towards accurate distant supervision for relational facts extraction. In: Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics, ACL 2013, 4–9 August 2013, Sofia, Bulgaria, Volume 2: Short Papers, pp. 810–815 (2013)

  81. Zhu, J., Nie, Z., Liu, X., Zhang, B., Wen, J.: StatSnowball: a statistical approach to extracting entity relationships. In: Proceedings of the 18th International Conference on World Wide Web, WWW 2009, Madrid, Spain, April 20–24, 2009, pp. 101–110 (2009)

Acknowledgments

We gratefully acknowledge the support of the Defense Advanced Research Projects Agency (DARPA) XDATA program under No. FA8750-12-2-0335 and DEFT program under No. FA8750-13-2-0039, DARPA’s MEMEX program and SIMPLEX program, the National Science Foundation (NSF) CAREER Award under No. IIS-1353606, the Office of Naval Research (ONR) under awards No. N000141210041 and No. N000141310129, the National Institutes of Health Grant U54EB020405 awarded by the National Institute of Biomedical Imaging and Bioengineering (NIBIB) through funds provided by the trans-NIH Big Data to Knowledge (BD2K) initiative, the Sloan Research Fellowship, the Moore Foundation, American Family Insurance, Google, and Toshiba. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of DARPA, AFRL, NSF, ONR, NIH, or the U.S. government.

Author information

Corresponding author

Correspondence to Ce Zhang.

Additional information

The original version of this paper is entitled “Incremental Knowledge Base Construction Using DeepDive” and was published in the Proceedings of the VLDB Endowment, 2015. This is an extended version of the original paper. This paper also contains content from the paper “DeepDive: Declarative Knowledge Base Construction” [62] published in the SIGMOD Record 2015.

About this article

Cite this article

De Sa, C., Ratner, A., Ré, C. et al. Incremental knowledge base construction using DeepDive. The VLDB Journal 26, 81–105 (2017). https://doi.org/10.1007/s00778-016-0437-2

  • DOI: https://doi.org/10.1007/s00778-016-0437-2
