
Big Data: Knowledge Discovery and Data Repositories

Mental Health Informatics

Part of the book series: Health Informatics (HI)

Abstract

“Big Data” is a concept that has been used over the last 10–15 years to describe the increasing complexity and volume of data available at scale in organizations and companies, data that often requires novel computational techniques and methods to generate knowledge. Compared to other health domains, mental health is influenced by a greater variety of factors, such as those related to mental, interpersonal, cultural, environmental, and biological phenomena. Thus, knowledge discovery in mental health research can involve a broad variety of data types, and therefore of data resources, including medical, behavioral, administrative, molecular, ‘omics’, environmental, financial, geographic, and social media repositories. Moreover, these varied phenomena interact in more complex ways in mental health and illness than in other domains of health, so knowledge discovery must be open to this complexity. In this chapter, we outline the main concepts underlying the “big data” paradigm and examine examples of different types of data repositories that could be used for mental health research. We also provide an example case study of developing a data repository, outlining the key considerations for designing, building, and using these types of resources.


References

  1. De Mauro A, Greco M, Grimaldi M. A formal definition of big data based on its essential features. Libr Rev. 2016;65:122–35.

    Article  Google Scholar 

  2. Raghupathi W, Raghupathi V. Big data analytics in healthcare: promise and potential. Health Inf Sci Syst. 2014;2:3.

    Article  PubMed  PubMed Central  Google Scholar 

  3. Gruebner O, Sykora M, Lowe SR, Shankardass K, Galea S, Subramanian SV. Big data opportunities for social behavioral and mental health research. Soc Sci Med. 2017;189:167–9.

    Article  PubMed  Google Scholar 

  4. McIntosh AM, Stewart R, John A, Smith DJ, Davis K, Sudlow C, et al. Data science for mental health: a UK perspective on a global challenge. Lancet Psychiatry. 2016;3:993–8.

    Article  PubMed  Google Scholar 

  5. Stewart R, Davis K. ‘big data’ in mental health research: current status and emerging possibilities. Soc Psychiatry Psychiatr Epidemiol. 2016;51:1055–72.

    Article  PubMed  PubMed Central  Google Scholar 

  6. Russ TC, Woelbert E, Davis KAS, Hafferty JD, Ibrahim Z, Inkster B, et al. How data science can advance mental health research. Nat Hum Behav. 2019;3:24–32.

    Article  PubMed  Google Scholar 

  7. Khoury MJ, Ioannidis JPA. Big data meets public health. Science. 2014;346:1054–5.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  8. Passos IC, Mwangi B, Kapczinski F. Big data analytics and machine learning: 2015 and beyond. Lancet Psychiatry. 2016;3:13–5.

    Article  PubMed  Google Scholar 

  9. Passos IC, Mwangi B, Kapczinski F, editors. Personalized psychiatry: big data analytics in mental health [Internet]. Springer International Publishing, Berlin; 2019 [cited 2019 Sep 24]. Available from: https://www.springer.com/gb/book/9783030035525

  10. Hulsen T, Jamuar SS, Moody AR, Karnes JH, Varga O, Hedensted S, et al. From big data to precision medicine. Front Med (Lausanne). 2019;6:34.

    Article  Google Scholar 

  11. Furu K, Wettermark B, Andersen M, Martikainen JE, Almarsdottir AB, Sørensen HT. The Nordic countries as a cohort for Pharmacoepidemiological research. Basic Clin Pharmacol Toxicol. 2010;106:86–94.

    Article  CAS  PubMed  Google Scholar 

  12. Mental Health Research Network [Internet]. [cited 2020 May 15]. Available from: http://hcsrn.org/mhrn/en/

  13. OMOP common data model – OHDSI [Internet]. [cited 2020 Aug 12]. Available from: https://www.ohdsi.org/data-standardization/the-common-data-model/

  14. PCORnet [Internet]. The national patient-centered clinical research network. [cited 2020 Aug 12]. Available from: https://pcornet.org/

  15. PCORnet common data model forum [Internet]. GitHub. [cited 2020 Aug 12]. Available from: https://github.com/CDMFORUM

  16. Standards | CDISC [Internet]. [cited 2020 Sep 11]. Available from: https://www.cdisc.org/standards

  17. Hume S, Aerts J, Sarnikar S, Huser V. Current applications and future directions for the CDISC operational data model standard: a methodological review. J Biomed Inform. 2016;60:352–62.

    Article  PubMed  PubMed Central  Google Scholar 

  18. Apache Hadoop [Internet]. [cited 2020 Aug 11]. Available from: https://hadoop.apache.org/

  19. Apache Spark™ – Unified Analytics Engine for Big Data [Internet]. [cited 2020 Aug 11]. Available from: https://spark.apache.org/

  20. Apache Hive TM [Internet]. [cited 2020 Aug 11]. Available from: https://hive.apache.org/

  21. Apache Flink: Stateful Computations over Data Streams [Internet]. [cited 2020 Aug 11]. Available from: https://flink.apache.org/

  22. Apache Kafka [Internet]. Apache Kafka. [cited 2020 Aug 11]. Available from: https://kafka.apache.org/

  23. Martone ME, Garcia-Castro A, VandenBos GR. Data sharing in psychology. Am Psychol. 2018;73:111–25.

    Article  PubMed  PubMed Central  Google Scholar 

  24. Baker M. 1,500 scientists lift the lid on reproducibility. Nature News. 2016;533:452.

    Article  CAS  Google Scholar 

  25. Wilkinson MD, Dumontier M, IJJ A, Appleton G, Axton M, Baak A, et al. The FAIR Guiding Principles for scientific data management and stewardship. Scientific Data. 2016;3:160018.

    Article  PubMed  PubMed Central  Google Scholar 

  26. Recommended Data Repositories | Scientific Data [Internet]. [cited 2020 May 15]. Available from: https://www.nature.com/sdata/policies/repositories

  27. Hesse BW. Can psychology walk the walk of open science? Am Psychol. 2018;73:126–37.

    Article  PubMed  Google Scholar 

  28. Gremyr A, Malm U, Lundin L, Andersson A-C. A learning health system for people with severe mental illness: a promise for continuous learning, patient coproduction and more effective care. Digital Psychiatry Taylor & Francis. 2019;2:8–13.

    Article  Google Scholar 

  29. UK Biobank [Internet]. [cited 2020 May 15]. Available from: https://www.ukbiobank.ac.uk/

  30. Tenenbaum JD, Bhuvaneshwar K, Gagliardi JP, Fultz Hollis K, Jia P, Ma L, et al. Translational bioinformatics in mental health: open access data sources and computational biomarker discovery. Brief Bioinformatics. 2019;20:842–56.

    Article  PubMed  Google Scholar 

  31. Genetic links to anxiety and depression study – GLAD study [Internet]. [cited 2020 May 15]. Available from: https://gladstudy.org.uk/

  32. Matcham F. Barattieri di san Pietro C, Bulgari V, de Girolamo G, Dobson R, Eriksson H, et al. remote assessment of disease and relapse in major depressive disorder (RADAR-MDD): a multi-Centre prospective cohort study protocol. BMC Psychiatry. 2019;19:72.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  33. What is the PGC? [Internet]. Psychiatric Genomics Consortium. [cited 2020 May 15]. Available from: https://www.med.unc.edu/pgc/

  34. The all of us research program investigators. The “All of Us” Research Program. N Engl J Med. 2019;381:668–76.

    Google Scholar 

  35. pubmeddev. Home – PubMed – NCBI [Internet]. [cited 2020 May 15]. Available from: https://www.ncbi.nlm.nih.gov/pubmed/

  36. PsycInfo – APA Publishing | APA [Internet]. https://www.apa.org. [cited 2020 May 15]. Available from: https://www.apa.org/pubs/databases/psycinfo/index

  37. OMIM – Online Mendelian Inheritance in Man [Internet]. [cited 2020 May 15]. Available from: https://omim.org/

  38. Liao Y, Wang J, Jaehnig EJ, Shi Z, Zhang B. WebGestalt 2019: gene set analysis toolkit with revamped UIs and APIs. Nucleic Acids Res. 2019;47:W199–205.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  39. Wishart DS, Feunang YD, Guo AC, Lo EJ, Marcu A, Grant JR, et al. DrugBank 5.0: a major update to the DrugBank database for 2018. Nucleic Acids Res. 2018;46:D1074–82.

    Article  CAS  PubMed  Google Scholar 

  40. SIDER side effect resource [Internet]. [cited 2020 May 15]. Available from: http://sideeffects.embl.de/

  41. NIF | Welcome... [Internet]. [cited 2020 May 15]. Available from: https://neuinfo.org/

  42. ETS Educational Testing Service’s TestLink database [Internet]. [cited 2020 May 18]. Available from: https://www.ets.org/test_link/about/

  43. HaPI Database [Internet]. Behavioral Measurement Database Services. [cited 2020 May 18]. Available from: https://www.bmdshapi.com/hapidatabase/

  44. Mental Measurements Yearbook with Tests in Print [Internet]. [cited 2020 May 18]. Available from: https://www.ovid.com/product-details.10631.html

  45. Mental Measurements Yearbook | Buros Center for Testing | Nebraska [Internet]. [cited 2020 May 18]. Available from: https://buros.org/mental-measurements-yearbook

  46. PsycTESTS – APA Publishing [Internet]. https://www.apa.org. [cited 2020 May 18]. Available from: https://www.apa.org/pubs/databases/psyctests/index

  47. MEDLINE®: Description of the Database [Internet]. [cited 2019 Oct 25]. Available from: https://www.nlm.nih.gov/bsd/medline.html

  48. Medical Subject Headings – Home Page [Internet]. [cited 2019 Oct 25]. Available from: https://www.nlm.nih.gov/mesh/meshhome.html

  49. Abbe A, Grouin C, Zweigenbaum P, Falissard B. Text mining applications in psychiatry: a systematic literature review. Int J Methods Psychiatr Res. 2016;25:86–100.

    Article  PubMed  Google Scholar 

  50. Smalheiser NR. Informatics and hypothesis-driven research. EMBO Rep. 2002;3:702.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  51. Gonzalez-Mantilla AJ, Moreno-De-Luca A, Ledbetter DH, Martin CL. A cross-disorder method to identify novel candidate genes for developmental brain disorders. JAMA Psychiat. 2016;73:275–83.

    Article  Google Scholar 

  52. PharmGKB [Internet]. PharmGKB. [cited 2020 May 15]. Available from: https://www.pharmgkb.org/

  53. Bean DM, Wu H, Iqbal E, Dzahini O, Ibrahim ZM, Broadbent M, et al. Knowledge graph prediction of unknown adverse drug reactions and validation in electronic health records. Sci Rep [Internet]. 2017 [cited 2019 Oct 29];7. Available from: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5703951/

  54. So H-C, Chau CK-L, Chiu W-T, Ho K-S, Lo C-P, Yim SH-Y, et al. Analysis of genome-wide association data highlights candidates for drug repositioning in psychiatry. Nat Neurosci. 2017;20:1342–9.

    Article  CAS  PubMed  Google Scholar 

  55. Home – SRA – NCBI [Internet]. [cited 2020 May 15]. Available from: https://www.ncbi.nlm.nih.gov/sra

  56. Barrett T, Wilhite SE, Ledoux P, Evangelista C, Kim IF, Tomashevsky M, et al. NCBI GEO: archive for functional genomics data sets--update. Nucleic Acids Res. 2013;41:D991–5.

    Article  CAS  PubMed  Google Scholar 

  57. PRIDE – Proteomics Identification Database [Internet]. [cited 2020 May 15]. Available from: https://www.ebi.ac.uk/pride/archive/

  58. Deutsch EW, Csordas A, Sun Z, Jarnuczak A, Perez-Riverol Y, Ternent T, et al. The ProteomeXchange consortium in 2017: supporting the cultural change in proteomics public data deposition. Nucleic Acids Res. 2017;45:D1100–6.

    Article  CAS  PubMed  Google Scholar 

  59. Metabolomics Workbench: Home [Internet]. [cited 2020 May 15]. Available from: https://www.metabolomicsworkbench.org/

  60. MetaboLights – Metabolomics experiments and derived information [Internet]. [cited 2020 May 15]. Available from: https://www.ebi.ac.uk/metabolights/

  61. PharmVar [Internet]. [cited 2020 May 15]. Available from: https://www.pharmvar.org/

  62. NDA [Internet]. [cited 2020 May 15]. Available from: https://nda.nih.gov/

  63. Alfaro-Almagro F, Jenkinson M, Bangerter NK, Andersson JLR, Griffanti L, Douaud G, et al. Image processing and quality control for the first 10,000 brain imaging datasets from UK biobank. NeuroImage. 2018;166:400–24.

    Article  PubMed  Google Scholar 

  64. Vidaurre D, Abeysuriya R, Becker R, Quinn AJ, Alfaro-Almagro F, Smith SM, et al. Discovering dynamic brain networks from big data in rest and task. NeuroImage. 2018;180:646–56.

    Article  PubMed  Google Scholar 

  65. Kirov G, Kendall K, Rees E, Escott-Price V, Hewitt J, Thomas R, et al. The Uk biobank: a resource for Cnv analysis. Eur Neuropsychopharmacol. 2017;27:S491.

    Article  Google Scholar 

  66. Hariprakash JM, Vellarikkal SK, Verma A, Ranawat AS, Jayarajan R, Ravi R, et al. SAGE: a comprehensive resource of genetic variants integrating South Asian whole genomes and exomes. Database [Internet]. 2018 [cited 2020 May 15];2018. Available from: https://academic.oup.com/database/article/doi/10.1093/database/bay080/5067958

  67. OmicsDI: Home [Internet]. [cited 2020 May 15]. Available from: https://www.omicsdi.org/database

  68. Connectome – Homepage [Internet]. [cited 2020 May 15]. Available from: https://www.humanconnectome.org/

  69. A free and open platform for sharing MRI, MEG, EEG, iEEG, and ECoG data – OpenNeuro [Internet]. [cited 2020 May 15]. Available from: https://openneuro.org/

  70. Imaging data | UK Biobank [Internet]. [cited 2020 May 15]. Available from: https://www.ukbiobank.ac.uk/imaging-data/

  71. Genetic data | UK Biobank [Internet]. [cited 2020 May 15]. Available from: https://www.ukbiobank.ac.uk/scientists-3/genetic-data/

  72. Dahl A, Cai N, Ko A, Laakso M, Pajukanta P, Flint J, et al. Reverse GWAS: using genetics to identify and model phenotypic subtypes. PLoS Genet. 2019;15:e1008009.

    Article  PubMed  PubMed Central  Google Scholar 

  73. Avec 2018 [Internet]. [cited 2020 May 15]. Available from: https://sites.google.com/view/avec2018

  74. Major Depressive Disorder | RADAR-CNS [Internet]. [cited 2020 May 15]. Available from: https://www.radar-cns.org/about/conditions/major-depressive-disorder

  75. Robinson P, Turk D, Jilka S, Cella M. Measuring attitudes towards mental health using social media: investigating stigma and trivialisation. Soc Psychiatry Psychiatr Epidemiol. 2019;54:51–8.

    Article  PubMed  Google Scholar 

  76. Eichstaedt JC, Schwartz HA, Kern ML, Park G, Labarthe DR, Merchant RM, et al. Psychological language on twitter Predicts County-level heart disease mortality. Psychol Sci. 2015;26:159–69.

    Article  PubMed  Google Scholar 

  77. Gkotsis G, Oellrich A, Velupillai S, Liakata M, Hubbard TJP, Dobson RJB, et al. Characterisation of mental health conditions in social media using informed deep learning. Sci Rep. 2017;7:45141.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  78. Choudhury MD, Kiciman E. The language of social support in social media and its effect on suicidal ideation risk. Proceedings of the International Conference on Web and Social Media (ICWSM-17) [Internet]. AAAI; 2017. Available from: https://www.microsoft.com/en-us/research/publication/language-social-support-social-media-effect-suicidal-ideation-risk/

  79. Willetts M, Hollowell S, Aslett L, Holmes C, Doherty A. Statistical machine learning of sleep and physical activity phenotypes from sensor data in 96,220 UK biobank participants. Sci Rep. 2018;8:1–10.

    Article  CAS  Google Scholar 

  80. Lyall LM, Wyse CA, Graham N, Ferguson A, Lyall DM, Cullen B, et al. Association of disrupted circadian rhythmicity with mood disorders, subjective wellbeing, and cognitive function: a cross-sectional study of 91 105 participants from the UK biobank. Lancet Psychiatry. 2018;5:507–14.

    Article  PubMed  Google Scholar 

  81. Tasnim M, Stroulia E. Detecting depression from voice. In: Meurs M-J, Rudzicz F, editors. Advances in artificial intelligence. Springer International Publishing, Berlin; 2019. p. 472–478.

    Chapter  Google Scholar 

  82. Gunn JF, Lester D. Using google searches on the internet to monitor suicidal behavior. J Affect Disord. 2013;148:411–2.

    Article  PubMed  Google Scholar 

  83. Royal S. Machine learning: what do the public think?; the Royal Society’s public dialogue on machine learning. London, UK: Royal Society; 2017. p 92. Available from: https://royalsociety.org/~/media/policy/projects/machine-learning/publications/public-views-of-machine-learning-ipsos-mori.pdf

  84. Conway M, O’Connor D. Social media, big data, and mental health: current advances and ethical implications. Curr Opin Psychol. 2016;9:77–82.

    Article  PubMed  PubMed Central  Google Scholar 

  85. Cheng Q, Li TM, Kwok C-L, Zhu T, Yip PS. Assessing suicide risk and emotional distress in Chinese social media: a text mining and machine learning study. J Med Internet Res. 2017;19:e243.

    Article  PubMed  PubMed Central  Google Scholar 

  86. ICES Data [Internet]. [cited 2020 May 15]. Available from: https://www.ices.on.ca/Data-and-Privacy/ICES-data

  87. Data Linkage WA [Internet]. Data Linkage WA. [cited 2020 May 15]. Available from: https://www.datalinkage-wa.org.au/

  88. VigiAccess [Internet]. [cited 2020 May 15]. Available from: http://www.vigiaccess.org/

  89. Data – Clalit Research Institute [Internet]. [cited 2020 May 15]. Available from: http://clalitresearch.org/about-us/our-data/

  90. Longitudinal Health Insurance Database of Taiwan [Internet]. [cited 2020 May 15]. Available from: https://nhird.nhri.org.tw/en/

  91. Research, Statistics, Data & Systems | CMS [Internet]. [cited 2020 May 15]. Available from: https://www.cms.gov/Research-Statistics-Data-and-Systems/Research-Statistics-Data-and-Systems

  92. Welcome to IQVIA – A New Path to Your Success Via Human Data Science [Internet]. [cited 2020 May 15]. Available from: https://www.iqvia.com/

  93. IBM MarketScan Research Databases – Overview [Internet]. 2020 [cited 2020 May 15]. Available from: https://www.ibm.com/products/marketscan-research-databases

  94. Thesmar D, Sraer D, Pinheiro L, Dadson N, Veliche R, Greenberg P. Combining the power of artificial intelligence with the richness of healthcare claims data: opportunities and challenges. PharmacoEconomics. 2019;37:745–52.

    Article  PubMed  Google Scholar 

  95. Miller M, Swanson SA, Azrael D, Pate V, Stürmer T. Antidepressant dose, age, and the risk of deliberate self-harm. JAMA Intern Med. 2014;174:899–909.

    Article  CAS  PubMed  Google Scholar 

  96. Goldberg PD, Goldberg D, Huxley DP, Huxley P. Mental illness in the community: the pathway to psychiatric care. London: Routledge; 1980.

    Google Scholar 

  97. John A, McGregor J, Fone D, Dunstan F, Cornish R, Lyons RA, et al. Case-finding for common mental disorders of anxiety and depression in primary care: an external validation of routinely collected data. BMC Med Inform Decis Mak. 2016;16:35.

    Article  PubMed  PubMed Central  Google Scholar 

  98. Spiers N, Qassem T, Bebbington P, McManus S, King M, Jenkins R, et al. Prevalence and treatment of common mental disorders in the English national population, 1993–2007. Br J Psychiatry. 2016;209:150–6.

    Article  PubMed  Google Scholar 

  99. Bate A, Lindquist M, Edwards IR. The application of knowledge discovery in databases to post-marketing drug safety: example of the WHO database. Fundam Clin Pharmacol. 2008;22:127–40.

    Article  CAS  PubMed  Google Scholar 

  100. Hersh WR, Weiner MG, Embi PJ, Logan JR, Payne PRO, Bernstam EV, et al. Caveats for the use of operational electronic health record data in comparative effectiveness research. Med Care. 2013;51:S30–7.

    Article  PubMed  PubMed Central  Google Scholar 

  101. Zeltzer D, Balicer RD, Shir T, Flaks-Manov N, Einav L, Shadmi E. Prediction accuracy with electronic medical records versus administrative claims. Med Care. 2019;57:551–9.

    Article  PubMed  Google Scholar 

  102. Richard M, Aimé X, Krebs M-O, Charlet J. Enrich classifications in psychiatry with textual data: an ontology for psychiatry including social concepts. Stud Health Technol Inform. 2015;210:221–3.

    PubMed  Google Scholar 

  103. Velupillai S, Suominen H, Liakata M, Roberts A, Shah AD, Morley K, et al. Using clinical natural language processing for health outcomes research: overview and actionable suggestions for future advances. J Biomed Inform. 2018;88:11–9.

    Article  PubMed  PubMed Central  Google Scholar 

  104. Jackson R, Patel R, Velupillai S, Gkotsis G, Hoyle D, Stewart R. Knowledge discovery for Deep Phenotyping serious mental illness from Electronic Mental Health records [version 2; referees: 2 approved with reservations]. F1000Research. 2018;7:210.

    Article  PubMed  PubMed Central  Google Scholar 

  105. Weissman MM, Pathak J, Talati A. Personal life events-a promising dimension for psychiatry in electronic health records. JAMA Psychiatry. 2019;77(2):115–6.

    Article  Google Scholar 

  106. Lyalina S, Percha B, LePendu P, Iyer SV, Altman RB, Shah NH. Identifying phenotypic signatures of neuropsychiatric disorders from electronic medical records. J Am Med Inform Assoc. 2013;20:e297–305.

    Article  PubMed  PubMed Central  Google Scholar 

  107. Coleman KJ, Stewart C, Waitzfelder BE, Zeber JE, Morales LS, Ahmed AT, et al. Racial/ethnic differences in diagnoses and treatment of mental health conditions across healthcare systems participating in the mental Health Research network. Psychiatr Serv. 2016;67:749–57.

    Article  PubMed  PubMed Central  Google Scholar 

  108. Huang SH, LePendu P, Iyer SV, Tai-Seale M, Carrell D, Shah NH. Toward personalizing treatment for depression: predicting diagnosis and severity. J Am Med Inform Assoc. 2014;21:1069–75.

    Article  PubMed  PubMed Central  Google Scholar 

  109. Eriksson R, Werge T, Jensen LJ, Brunak S. Dose-specific adverse drug reaction identification in electronic patient records: temporal data mining in an inpatient psychiatric population. Drug Saf. 2014;37:237–47.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  110. Roque FS, Jensen PB, Schmock H, Dalgaard M, Andreatta M, Hansen T, et al. Using electronic patient records to discover disease correlations and stratify patient cohorts. PLoS Comput Biol. 2011;7:e1002141.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  111. i2b2: Informatics for integrating biology & the bedside [Internet]. [cited 2020 May 15]. Available from: https://www.i2b2.org/

  112. SHRINE – Open. Catalyst [Internet]. [cited 2020 Aug 11]. Available from: https://open.catalyst.harvard.edu/products/shrine/

  113. Brown J. Popmednet (Pmn) [Internet]. Zenodo; 2018 [cited 2020 Aug 11]. Available from: https://zenodo.org/record/1400722

  114. Deans KJ, Sabihi S, Forrest CB. Learning health systems. Semin Pediatr Surg. 2018;27:375–8.

    Article  PubMed  Google Scholar 

  115. Horwitz LI, Kuznetsova M, Jones SA. Creating a learning health system through rapid-cycle, Randomized testing. N Engl J Med. 2019;381:1175–9.

    Article  PubMed  Google Scholar 

  116. Network THI. Home | THIN Data [Internet]. [cited 2020 May 15]. Available from: https://www.the-health-improvement-network.com

  117. Clinical Practice Research Datalink | CPRD [Internet]. [cited 2020 May 15]. Available from: https://www.cprd.com/

  118. Welcome to Data QUEST | dataquest.iths.org [Internet]. [cited 2020 May 15]. Available from: https://dataquest.iths.org/

  119. Canadian Primary Care Sentinel Surveillance Network [Internet]. [cited 2020 May 15]. Available from: https://cpcssn.ca/

  120. VA Informatics and Computing Infrastructure (VINCI) [Internet]. [cited 2020 May 15]. Available from: https://www.hsrd.research.va.gov/for_researchers/vinci/

  121. Medical Informatics – Department of Health Sciences Research – Medical Informatics [Internet]. Mayo Clinic. [cited 2020 May 15]. Available from: https://www.mayo.edu/research/departments-divisions/department-health-sciences-research/medical-informatics

  122. A proof of concept for assessing emergency room use with primary care data and natural language processing. Abstract – Europe PMC [Internet]. [cited 2020 May 15]. Available from: https://europepmc.org/article/med/23223678

  123. Braam AW, van Ommeren OWHR, van Buuren ML, Laan W, Smeets HM, Engelhard IM. Local geographical distribution of acute involuntary psychiatric admissions in subdistricts in and around Utrecht, the Netherlands. J Emerg Med Elsevier. 2016;50:449–57.

    Article  Google Scholar 

  124. Clinical Record Interactive Search (CRIS) [Internet]. [cited 2020 May 15]. Available from: https://www.maudsleybrc.nihr.ac.uk/facilities/clinical-record-interactive-search-cris/

  125. Home – Adolescent Mental Health Data Platform [Internet]. [cited 2020 May 15]. Available from: https://www.adolescentmentalhealth.uk/

  126. SAIL Databank – The Secure Anonymised Information Linkage Databank [Internet]. [cited 2020 May 15]. Available from: https://saildatabank.com/

  127. Home | Mental Health Data Science Scotland [Internet]. [cited 2020 May 15]. Available from: https://mhdss.ac.uk/

  128. SHINE – Schools Health and Wellbeing Improvement Research Network [Internet]. [cited 2020 May 15]. Available from: https://shine.sphsu.gla.ac.uk/

  129. Million Veteran Program (MVP) [Internet]. [cited 2020 May 15]. Available from: https://www.research.va.gov/mvp/

  130. Welcome to eMerge > Collaborate [Internet]. [cited 2020 May 15]. Available from: https://emerge-network.org/

  131. Researchers | Register4Share [Internet]. [cited 2020 May 15]. Available from: http://www.registerforshare.org/researchers

  132. EU-AIMS – European Autism Interventions – A Multicentre Study for Deve [Internet]. [cited 2020 May 15]. Available from: https://www.eu-aims.eu/

  133. SFARI | Simons Foundation Autism Research Initiative [Internet]. SFARI. [cited 2020 May 15]. Available from: https://www.sfari.org/

  134. CommonMind Consortium Knowledge Portal – syn2759792 [Internet]. [cited 2020 May 15]. Available from: https://www.synapse.org/#!Synapse:syn2759792/wiki/69613

  135. Home | NRGR [Internet]. [cited 2020 May 15]. Available from: https://www.nimhgenetics.org/

  136. Dentler K, ten Teije A, de Keizer N, Cornet R. Barriers to the reuse of routinely recorded clinical data: a field report. Stud Health Technol Inform. 2013;192:313–7.

    PubMed  Google Scholar 

  137. Huser V, Cimino JJ. Desiderata for healthcare integrated data repositories based on architectural comparison of three public repositories. AMIA Annu Symp Proc. 2013;2013:648–56.

    PubMed  PubMed Central  Google Scholar 

  138. Weber GM, Mandl KD, Kohane IS. Finding the missing link for big biomedical data. JAMA. 2014;311:2479–80.

    CAS  PubMed  Google Scholar 

  139. Tung JY, Do CB, Hinds DA, Kiefer AK, Macpherson JM, Chowdry AB, et al. Efficient replication of over 180 genetic associations with self-reported medical data. PLoS One. 2011;6:e23473.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  140. Hayes JF, Marston L, Walters K, Geddes JR, King M, Osborn DPJ. Lithium vs. valproate vs. olanzapine vs. quetiapine as maintenance monotherapy for bipolar disorder: a population-based UK cohort study using electronic health records. World Psychiatry. 2016;15:53–8.

    Article  PubMed  PubMed Central  Google Scholar 

  141. Ouchi K, Lindvall C, Chai PR, Boyer EW. Machine learning to predict, detect, and intervene older adults vulnerable for adverse drug events in the emergency department. J Med Toxicol. 2018;14:248–52.

    Article  PubMed  PubMed Central  Google Scholar 

  142. Lee Y, Ragguett R-M, Mansur RB, Boutilier JJ, Rosenblat JD, Trevizol A, et al. Applications of machine learning algorithms to predict therapeutic outcomes in depression: a meta-analysis and systematic review. J Affect Disord. 2018;241:519–32.

    Article  PubMed  Google Scholar 

  143. Kessler RC, Warner CH, Ivany C, Petukhova MV, Rose S, Bromet EJ, et al. Predicting suicides after psychiatric hospitalization in US Army soldiers: the Army study to assess risk and resilience in Servicemembers (Army STARRS). JAMA Psychiat. 2015;72:49–57.

    Article  Google Scholar 

  144. Gaspar HA, Baskin II, Marcou G, Horvath D, Varnek A. Stargate GTM: bridging descriptor and activity spaces. J Chem Inf Model. 2015;55:2403–10.

    Article  CAS  PubMed  Google Scholar 

  145. Downs JM, Ford T, Stewart R, Epstein S, Shetty H, Little R, et al. An approach to linking education, social care and electronic health records for children and young people in South London: a linkage study of child and adolescent mental health service data. BMJ Open [Internet]. 2019 [cited 2019 Oct 31];9:e024355. Available from: https://bmjopen.bmj.com/content/9/1/e024355

  146. Iniesta R, Stahl D, McGuffin P. Machine learning, statistical learning and the future of biological research in psychiatry. Psychol Med. 2016;46:2455–65.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  147. Brailean A, Curtis J, Davis K, Dregan A, Hotopf M. Characteristics, comorbidities, and correlates of atypical depression: evidence from the UK biobank mental health survey. Psychol Med. 2019:1–10.

    Google Scholar 

  148. Zhou Y, Zhao L, Zhou N, Zhao Y, Marino S, Wang T, et al. Predictive big data analytics using the UK biobank data. Sci Rep [Internet]. 2019 [cited 2019 Oct 21];9:6012. Available from: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6461626/

  149. National Institutes of Health (NIH). All of us [Internet]. [cited 2020 May 15]. Available from: https://allofus.nih.gov/

  150. Hofmann-Apitius M, Alarcón-Riquelme ME, Chamberlain C, McHale D. Towards the taxonomy of human disease. Nat Rev Drug Discov. 2015;14:75–6.

    Article  CAS  PubMed  Google Scholar 

  151. Thompson PM, Stein JL, Medland SE, Hibar DP, Vasquez AA, Renteria ME, et al. The ENIGMA consortium: large-scale collaborative analyses of neuroimaging and genetic data. Brain Imaging Behav. 2014;8:153–82.

    Article  PubMed  PubMed Central  Google Scholar 

  152. Shaw RJ, Cullen B, Graham N, Lyall DM, Mackay D, Okolie C, et al. Living alone, loneliness and lack of emotional support as predictors of suicide and self-harm: seven-year follow up of the UK Biobank cohort. medRxiv. 2019;19008458.

    Google Scholar 

  153. Kyaga S, Landén M, Boman M, Hultman CM, Långström N, Lichtenstein P. Mental illness, suicide and creativity: 40-year prospective total population study. J Psychiatr Res. 2013;47:83–90.

    Article  PubMed  Google Scholar 

  154. Kohane IS. An autism case history to review the systematic analysis of large-scale data to refine the diagnosis and treatment of neuropsychiatric disorders. Biol Psychiatry. 2015;77:59–65.

    Article  PubMed  Google Scholar 

  155. McCoy TH, Castro VM, Hart KL, Pellegrini AM, Yu S, Cai T, et al. Genome-wide association study of dimensional psychopathology using electronic health records. Biol Psychiatry. 2018;83:1005–11.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  156. Fry A, Littlejohns TJ, Sudlow C, Doherty N, Adamska L, Sprosen T, et al. Comparison of sociodemographic and health-related characteristics of UK biobank participants with those of the general population. Am J Epidemiol. 2017;186:1026–34.

    Article  PubMed  PubMed Central  Google Scholar 

  157. Davis KAS, Cullen B, Adams M, Brailean A, Breen G, Coleman JRI, et al. Indicators of mental disorders in UK biobank—a comparison of approaches. Int J Methods Psychiatr Res. 2019;28:e1796.

  158. Larvin H, Peckham E, Prady SL. Case-finding for common mental disorders in primary care using routinely collected data: a systematic review. Soc Psychiatry Psychiatr Epidemiol. 2019;54:1161–75.

  159. Davis KAS, Sudlow CLM, Hotopf M. Can mental health diagnoses in administrative data be used for research? A systematic review of the accuracy of routinely collected diagnoses. BMC Psychiatry. 2016;16:263.

  160. Davis K, Bashford O, Jewell A, Shetty H, Stewart R, Sudlow C, et al. The validity of selected mental health diagnoses in English hospital episode statistics using data linkage to clinical records interactive search at South London and Maudsley. 2019.

  161. Davis KAS, Bashford O, Jewell A, Shetty H, Stewart RJ, Sudlow CLM, et al. Using data linkage to electronic patient records to assess the validity of selected mental health diagnoses in English hospital episode statistics (HES). PLoS One. 2018;13:e0195002.

  162. Cai N, Revez JA, Adams MJ, Andlauer TFM, Breen G, Byrne EM, et al. Minimal phenotyping yields genome-wide association signals of low specificity for major depression. Nat Genet. 2020;52:437–47.

  163. Summerfield D. How scientifically valid is the knowledge base of global mental health? BMJ. 2008;336:992–4.

  164. Kohrt BA, Rasmussen A, Kaiser BN, Haroz EE, Maharjan SM, Mutamba BB, et al. Cultural concepts of distress and psychiatric disorders: literature review and research recommendations for global mental health epidemiology. Int J Epidemiol. 2014;43:365–406.

  165. Sign Up to Help: Patient Registries | Anxiety and Depression Association of America, ADAA [Internet]. [cited 2020 May 15]. Available from: https://adaa.org/sign-help-patient-registries

  166. Ahuja S, Mirzoev T, Lund C, Ofori-Atta A, Skeen S, Kufuor A. Key influences in the design and implementation of mental health information systems in Ghana and South Africa. Global Mental Health [Internet]. 2016 [cited 2019 Oct 31];3:e11. Available from: https://www.cambridge.org/core/journals/global-mental-health/article/key-influences-in-the-design-and-implementation-of-mental-health-information-systems-in-ghana-and-south-africa/DD11E388FB2FFE1E2E7C9D9DF2885E99

  167. Buehler B, Ruggiero R, Mehta K. Empowering community health workers with technology solutions. IEEE Technol Soc Mag. 2013;32:44–52.

  168. McIntyre D, Muirhead D, Gilson L. Geographic patterns of deprivation in South Africa: informing health equity analyses and public resource allocation strategies. Health Policy Plan. 2002;17:30–9.

  169. Nugent R, Bertram MY, Jan S, Niessen LW, Sassi F, Jamison DT, et al. Investing in non-communicable disease prevention and management to advance the sustainable development goals. Lancet. 2018;391:2029–35.

  170. Semrau M, Evans-Lacko S, Alem A, Ayuso-Mateos JL, Chisholm D, Gureje O, et al. Strengthening mental health systems in low- and middle-income countries: the Emerald programme. BMC Med. 2015;13:79.

Corresponding author

Correspondence to Sumithra Velupillai.

Copyright information

© 2021 Springer Nature Switzerland AG

About this chapter

Cite this chapter

Velupillai, S., Davis, K.A.S., Rozenblit, L. (2021). Big Data: Knowledge Discovery and Data Repositories. In: Tenenbaum, J.D., Ranallo, P.A. (eds) Mental Health Informatics. Health Informatics. Springer, Cham. https://doi.org/10.1007/978-3-030-70558-9_15

  • DOI: https://doi.org/10.1007/978-3-030-70558-9_15

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-70557-2

  • Online ISBN: 978-3-030-70558-9

  • eBook Packages: Medicine, Medicine (R0)
