Discovering Research Hypotheses in Social Science Using Knowledge Graph Embeddings

de Haan, Rosaline; Tiddi, Ilaria; Beek, Wouter

doi:10.1007/978-3-030-77385-4_28

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 12731)

Included in the following conference series:

European Semantic Web Conference

Abstract

In an era when the number of available scientific publications keeps growing, scientists struggle to keep pace with the literature, interpret research results, and identify new research hypotheses to falsify. This is particularly true in fields such as the social sciences, where automated support for scientific discovery is still largely unavailable and unimplemented. In this work, we introduce an automated system that supports social scientists in identifying new research hypotheses. Building on the idea that knowledge graphs help model domain-specific information, and that machine learning can be used to identify the most relevant facts therein, we frame the problem of hypothesis discovery as a link prediction task, in which the ComplEx model is used to predict new relationships between entities of a knowledge graph representing scientific papers and their experimental details. The final output consists of fully formulated hypotheses, including the newly discovered triples (hypothesis statement) along with supporting statements from the knowledge graph (hypothesis evidence and hypothesis history). A quantitative and qualitative evaluation is carried out with experts in the field. Encouraging results show that a simple combination of machine learning and knowledge graph methods can serve as a basis for automated scientific discovery.
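
To make the link prediction framing concrete, the following is a minimal sketch of how a ComplEx-style model scores a candidate triple and ranks possible objects for a given subject and relation. It is not the authors' implementation: the embeddings are random rather than trained, and the entity and relation names (`punishment`, `cooperation`, `group_size`, `trust`, `hasPositiveEffectOn`) are hypothetical placeholders rather than identifiers from the actual knowledge graph. Only the scoring function, the real part of the trilinear product of the subject embedding, the relation embedding, and the complex conjugate of the object embedding, follows the ComplEx model.

```python
import numpy as np


def complex_score(e_s, w_r, e_o):
    """ComplEx score of one triple: Re(sum_k w_r[k] * e_s[k] * conj(e_o[k]))."""
    return float(np.real(np.sum(w_r * e_s * np.conj(e_o))))


def rank_candidate_objects(subject, relation, entity_emb, relation_emb):
    """Score every other entity as the object of (subject, relation, ?).

    Returns (entity, score) pairs sorted from most to least plausible.
    """
    e_s = entity_emb[subject]
    w_r = relation_emb[relation]
    scores = [
        (name, complex_score(e_s, w_r, e_o))
        for name, e_o in entity_emb.items()
        if name != subject
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Toy setup: random complex embeddings stand in for trained ones (assumption).
rng = np.random.default_rng(0)
dim = 4


def random_embedding():
    return rng.normal(size=dim) + 1j * rng.normal(size=dim)


# Hypothetical entity and relation names, for illustration only.
entity_emb = {
    name: random_embedding()
    for name in ["punishment", "cooperation", "group_size", "trust"]
}
relation_emb = {"hasPositiveEffectOn": random_embedding()}

ranking = rank_candidate_objects(
    "punishment", "hasPositiveEffectOn", entity_emb, relation_emb
)
for name, score in ranking:
    print(f"(punishment, hasPositiveEffectOn, {name}): {score:.3f}")
```

With trained embeddings, highly ranked triples that are absent from the knowledge graph would be the candidates for new hypothesis statements. Because the object embedding is conjugated, the score of (s, r, o) can differ from that of (o, r, s), which is what lets ComplEx represent asymmetric relations.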

Notes

1. http://geneontology.org/.
2. http://www.obofoundry.org/.
3. https://linkeddata.cochrane.org/pico-ontology.
4. https://www.orkg.org/orkg/.
5. http://data.cooperationdatabank.org/.
6. CODA contains two types of effect size measures, i.e. the correlation coefficient \(\rho\) and the standardized mean difference d, which can be easily converted to one another. For simplicity, we will only refer to Cohen’s d values from now on.
7. https://data.cooperationdatabank.org/coda/-/queries/link-prediction-selection-query.
8. Due to the relatively small sets, medium and large effects were grouped together.
9. https://data.cooperationdatabank.org/coda/-/queries/Rosaline-Construct-Link-Prediction.
10. https://github.com/Accenture/AmpliGraph.
11. https://github.com/roosyay/CoDa_Hypotheses.
12. https://coda.triply.cc/.
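
Note 6 states that the correlation coefficient and the standardized mean difference can be converted to one another. The sketch below shows one common textbook conversion, which assumes two groups of equal size; the source does not say which formula CODA actually applies, so the formula choice and the function names `d_from_r` and `r_from_d` are assumptions for illustration.

```python
import math


def d_from_r(r):
    """Convert a correlation coefficient r to Cohen's d.

    Uses d = 2r / sqrt(1 - r^2), which assumes equal group sizes.
    """
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    return 2.0 * r / math.sqrt(1.0 - r * r)


def r_from_d(d):
    """Convert Cohen's d back to a correlation coefficient.

    Uses r = d / sqrt(d^2 + 4), the inverse of the formula above.
    """
    return d / math.sqrt(d * d + 4.0)


# Round trip on an arbitrary example value.
r = 0.3
d = d_from_r(r)
print(f"r = {r} -> d = {d:.4f} -> r = {r_from_d(d):.4f}")
```

The two functions are exact inverses of each other, so converting all effect sizes to Cohen's d loses no information under the equal-group-size assumption.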

Author information

Authors and Affiliations

Triply, Amsterdam, The Netherlands
Rosaline de Haan & Wouter Beek
Vrije Universiteit Amsterdam, Amsterdam, The Netherlands
Ilaria Tiddi

Corresponding author

Correspondence to Ilaria Tiddi.

Editor information

Editors and Affiliations

Ghent University, Ghent, Belgium
Ruben Verborgh
Aalborg University, Aalborg, Denmark
Katja Hose
University of Mannheim, Mannheim, Germany
Heiko Paulheim
ERCIM, Sophia Antipolis, France
Pierre-Antoine Champin
University of Siegen, Siegen, Germany
Maria Maleshkova
Universidad Politécnica de Madrid, Boadilla del Monte, Spain
Oscar Corcho
eBay Inc., San Jose, CA, USA
Petar Ristoski
FIZ Karlsruhe - Leibniz Institute for Information Infrastructure, Eggenstein-Leopoldshafen, Germany
Mehwish Alam

About this paper

Cite this paper

de Haan, R., Tiddi, I., Beek, W. (2021). Discovering Research Hypotheses in Social Science Using Knowledge Graph Embeddings. In: Verborgh, R., et al. The Semantic Web. ESWC 2021. Lecture Notes in Computer Science, vol 12731. Springer, Cham. https://doi.org/10.1007/978-3-030-77385-4_28

DOI: https://doi.org/10.1007/978-3-030-77385-4_28
Published: 31 May 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-77384-7
Online ISBN: 978-3-030-77385-4
eBook Packages: Computer Science, Computer Science (R0)
