Discovering Research Hypotheses in Social Science Using Knowledge Graph Embeddings

  • Conference paper
The Semantic Web (ESWC 2021)

Part of the book series: Lecture Notes in Computer Science ((LNISA,volume 12731))

Abstract

In an era of an ever-increasing number of scientific publications, scientists struggle to keep pace with the literature, interpret research results, and identify new research hypotheses to falsify. This is particularly true in fields such as the social sciences, where automated support for scientific discovery is still widely unavailable and unimplemented. In this work, we introduce an automated system that supports social scientists in identifying new research hypotheses. Building on the idea that knowledge graphs help model domain-specific information, and that machine learning can be used to identify the most relevant facts therein, we frame the problem of hypothesis discovery as a link prediction task, in which the ComplEx model is used to predict new relationships between entities of a knowledge graph representing scientific papers and their experimental details. The final output consists of fully formulated hypotheses that include the newly discovered triples (hypothesis statement), along with supporting statements from the knowledge graph (hypothesis evidence and hypothesis history). A quantitative and qualitative evaluation is carried out with experts in the field. Encouraging results show that a simple combination of machine learning and knowledge graph methods can serve as a basis for automated scientific discovery.
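
To make the link prediction framing concrete, the following is a minimal sketch of how ComplEx scores a candidate triple and ranks possible objects for a given subject and relation. It is not the authors' implementation: the entity names, the embedding dimension, and the random (untrained) embeddings are assumptions chosen purely for illustration, and only the Python standard library is used.

```python
import random

def complex_score(subj, rel, obj):
    """ComplEx score of a triple: Re(sum_k subj_k * rel_k * conj(obj_k))."""
    return sum(s * r * o.conjugate() for s, r, o in zip(subj, rel, obj)).real

def random_embedding(dim, rng):
    """Random complex vector; a stand-in for a trained embedding (assumption)."""
    return [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(dim)]

def rank_objects(subj, rel, candidates):
    """Rank candidate objects for (subj, rel, ?) by descending score."""
    scored = [(name, complex_score(subj, rel, emb))
              for name, emb in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

rng = random.Random(0)
DIM = 4  # illustrative only; real models use far larger dimensions

# Hypothetical entity names, not taken from the actual knowledge graph.
names = ["study_1", "punishment", "cooperation", "small_effect"]
entities = {name: random_embedding(DIM, rng) for name in names}
relation = random_embedding(DIM, rng)  # hypothetical relation embedding

candidates = {k: v for k, v in entities.items() if k != "study_1"}
ranking = rank_objects(entities["study_1"], relation, candidates)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

In a trained model, the highest-ranked triples that are absent from the graph would be the candidates for new hypothesis statements. Because the score uses the complex conjugate of the object embedding, it can differ when subject and object are swapped, which lets ComplEx represent asymmetric relations.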

Notes

  1. http://geneontology.org/.
  2. http://www.obofoundry.org/.
  3. https://linkeddata.cochrane.org/pico-ontology.
  4. https://www.orkg.org/orkg/.
  5. http://data.cooperationdatabank.org/.
  6. CODA contains two types of effect size measures, i.e. the correlation coefficient \(\rho\) and the standardized mean difference d, which can be easily converted to one another. For simplicity, we will only refer to Cohen’s d values from now on.
  7. https://data.cooperationdatabank.org/coda/-/queries/link-prediction-selection-query.
  8. Due to the relatively small sets, medium and large effects were grouped together.
  9. https://data.cooperationdatabank.org/coda/-/queries/Rosaline-Construct-Link-Prediction.
  10. https://github.com/Accenture/AmpliGraph.
  11. https://github.com/roosyay/CoDa_Hypotheses.
  12. https://coda.triply.cc/.
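
The conversion between the two effect size measures mentioned in note 6 can be sketched as follows. This uses the standard textbook formulas, which assume two groups of equal size; the note does not state which conversion CODA applies, so the formulas and the function names `rho_to_d` and `d_to_rho` are assumptions for illustration.

```python
import math

def rho_to_d(rho):
    """Convert a correlation coefficient to Cohen's d (equal group sizes assumed)."""
    if not -1.0 < rho < 1.0:
        raise ValueError("rho must lie strictly between -1 and 1")
    return 2.0 * rho / math.sqrt(1.0 - rho * rho)

def d_to_rho(d):
    """Convert Cohen's d back to a correlation coefficient (equal group sizes assumed)."""
    return d / math.sqrt(d * d + 4.0)

d = rho_to_d(0.3)
print(f"rho=0.3 -> d={d:.3f} -> rho={d_to_rho(d):.3f}")
```

The two functions are inverses of each other, so a value converted in one direction and back is recovered up to floating-point error.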

Author information

Corresponding author

Correspondence to Ilaria Tiddi.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

de Haan, R., Tiddi, I., Beek, W. (2021). Discovering Research Hypotheses in Social Science Using Knowledge Graph Embeddings. In: Verborgh, R., et al. The Semantic Web. ESWC 2021. Lecture Notes in Computer Science, vol 12731. Springer, Cham. https://doi.org/10.1007/978-3-030-77385-4_28

  • DOI: https://doi.org/10.1007/978-3-030-77385-4_28

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-77384-7

  • Online ISBN: 978-3-030-77385-4

  • eBook Packages: Computer Science, Computer Science (R0)
