Abstract
As the volume of scientific publications keeps growing, scientists struggle to keep pace with the literature, interpret research results, and identify new research hypotheses to falsify. This is particularly true in fields such as the social sciences, where automated support for scientific discovery is still largely unavailable. In this work, we introduce an automated system that supports social scientists in identifying new research hypotheses. Building on the idea that knowledge graphs help model domain-specific information, and that machine learning can be used to identify the most relevant facts therein, we frame hypothesis discovery as a link prediction task, in which the ComplEx model is used to predict new relationships between entities of a knowledge graph representing scientific papers and their experimental details. The final output consists of fully formulated hypotheses including the newly discovered triples (hypothesis statement), along with supporting statements from the knowledge graph (hypothesis evidence and hypothesis history). A quantitative and qualitative evaluation is carried out with experts in the field. Encouraging results show that a simple combination of machine learning and knowledge graph methods can serve as a basis for automated scientific discovery.
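To make the link prediction framing concrete, the following is a minimal sketch of how a ComplEx-style model scores a candidate triple and ranks candidate objects. It is not the authors' implementation: the standard ComplEx score Re(<e_s, w_r, conj(e_o)>) is assumed, the embeddings are hand-picked toy values rather than trained ones, and the entity and function names are hypothetical.

```python
def complex_score(subj, rel, obj):
    """ComplEx score: real part of sum_k subj_k * rel_k * conj(obj_k)."""
    total = sum(s * r * o.conjugate() for s, r, o in zip(subj, rel, obj))
    return total.real


def rank_candidate_objects(subj, rel, entity_embeddings, known_objects=()):
    """Rank entities as objects of (subj, rel, ?), skipping already known links."""
    scored = [
        (name, complex_score(subj, rel, emb))
        for name, emb in entity_embeddings.items()
        if name not in known_objects
    ]
    # Highest score first: the top entries are the candidate new triples.
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy, untrained embeddings (assumption: 2 complex dimensions, made-up values).
entities = {
    "variable_A": [1 + 1j, 0.5 - 1j],
    "variable_B": [-1 + 0.5j, 1 + 0j],
    "variable_C": [0.2 - 0.3j, -1 + 1j],
}
study = [0.5 + 0.5j, 1 - 1j]       # hypothetical subject entity
has_effect = [1 + 0j, 0 + 1j]      # hypothetical relation

for name, score in rank_candidate_objects(
    study, has_effect, entities, known_objects={"variable_A"}
):
    print(name, round(score, 3))
```

In a real setting the embeddings would be learned from the knowledge graph, and the top-ranked unseen triples would be the ones turned into hypothesis statements.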
Notes
- 6. CODA contains two types of effect size measures, i.e. the correlation coefficient \(\rho \) and the standardized mean difference d, which can be easily converted to one another. For simplicity, we will only refer to Cohen’s d values from now on.
- 8. Due to the relatively small sets, medium and large effects were grouped together.
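The conversion between the two effect size measures mentioned in the notes can be sketched as follows. This assumes the common textbook formulas d = 2ρ/√(1 − ρ²) and ρ = d/√(d² + 4), which hold for equally sized groups; the notes do not say which conversion was actually used, and the function names are hypothetical.

```python
import math


def correlation_to_d(rho):
    """Convert a correlation coefficient to Cohen's d (equal group sizes assumed)."""
    if not -1.0 < rho < 1.0:
        raise ValueError("rho must lie strictly between -1 and 1")
    return 2.0 * rho / math.sqrt(1.0 - rho * rho)


def d_to_correlation(d):
    """Convert Cohen's d back to a correlation coefficient (equal group sizes assumed)."""
    return d / math.sqrt(d * d + 4.0)


d = correlation_to_d(0.5)
print(round(d, 4), round(d_to_correlation(d), 4))
```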
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
de Haan, R., Tiddi, I., Beek, W. (2021). Discovering Research Hypotheses in Social Science Using Knowledge Graph Embeddings. In: Verborgh, R., et al. The Semantic Web. ESWC 2021. Lecture Notes in Computer Science, vol 12731. Springer, Cham. https://doi.org/10.1007/978-3-030-77385-4_28
DOI: https://doi.org/10.1007/978-3-030-77385-4_28
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-77384-7
Online ISBN: 978-3-030-77385-4
eBook Packages: Computer Science, Computer Science (R0)