Predicting Invariant Nodes in Large Scale Semantic Knowledge Graphs

  • Damian Barsotti
  • Martin Ariel Dominguez
  • Pablo Ariel Duboue
Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 795)


Understanding and predicting how large-scale knowledge graphs change over time has direct implications for the software and hardware used to maintain and store them. An important subproblem is predicting invariant nodes, that is, nodes within the graph that will not have any edges deleted or changed (add-only nodes) or will not have any edges added or changed (del-only nodes). Predicting add-only nodes correctly has practical importance, as such nodes can then be cached or represented using a more efficient data structure. This paper presents a logistic regression approach, trained using Apache Spark with attribute-values as features, that achieves 90%+ precision on DBpedia yearly changes. The paper concludes by outlining how we plan to use these models for evaluating Natural Language Generation algorithms.
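The add-only and del-only definitions above can be illustrated with a minimal sketch. It assumes that each graph snapshot is a set of (subject, predicate, object) triples and that a node's edges are the triples in which it is the subject; a changed edge then appears as one deletion plus one addition. The function names and the toy data are hypothetical and are not taken from the paper or from DBpedia.

```python
from collections import defaultdict


def edges_by_node(triples):
    """Group (predicate, object) pairs by subject node."""
    grouped = defaultdict(set)
    for subject, predicate, obj in triples:
        grouped[subject].add((predicate, obj))
    return grouped


def invariant_labels(old_triples, new_triples):
    """For each node in the old snapshot, report whether it is
    add-only (no old edge missing in the new snapshot) and/or
    del-only (no edge in the new snapshot that was not in the old one)."""
    old = edges_by_node(old_triples)
    new = edges_by_node(new_triples)
    labels = {}
    for node, old_edges in old.items():
        new_edges = new.get(node, set())
        labels[node] = {
            "add_only": old_edges <= new_edges,  # nothing deleted or changed
            "del_only": new_edges <= old_edges,  # nothing added or changed
        }
    return labels


# Toy snapshots (invented for illustration only).
old_snapshot = {
    ("Alice", "livesIn", "Paris"),
    ("Bob", "worksAt", "Acme"),
    ("Carol", "knows", "Bob"),
    ("Carol", "knows", "Alice"),
}
new_snapshot = {
    ("Alice", "livesIn", "Paris"),
    ("Alice", "knows", "Bob"),       # edge added to Alice
    ("Bob", "worksAt", "Initech"),   # Bob's edge changed
    ("Carol", "knows", "Bob"),       # one of Carol's edges deleted
}

labels = invariant_labels(old_snapshot, new_snapshot)
for node in sorted(labels):
    print(node, labels[node])
```

Under these assumptions a node whose edges are identical in both snapshots satisfies both conditions. Labels of this kind, computed between consecutive yearly snapshots, are one plausible way to obtain training targets for a classifier such as the logistic regression mentioned above.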



The authors would like to thank the Secretaria de Ciencia y Tecnica of Cordoba Province for support and Annie Ying and the anonymous reviewers for helpful comments and suggestions. They would also like to extend their gratitude to the organizers of the SIMBig Symposium.



Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  • Damian Barsotti (1)
  • Martin Ariel Dominguez (1)
  • Pablo Ariel Duboue (1), email author
  1. FaMAF-UNC, Cordoba, Argentina
