An automated framework for hypotheses generation using literature
In biomedicine, exploratory studies and hypothesis generation often begin with a review of the existing literature to identify a set of factors and their associations with diseases, phenotypes, or biological processes. Many scientists are overwhelmed by the sheer volume of literature on a disease when they set out to generate a new hypothesis or study a biological phenomenon. The situation is even worse for junior investigators, who often find it difficult to formulate new hypotheses or, more importantly, to check whether their hypothesis is consistent with the existing literature. Keeping abreast of everything that is published, and remembering all combinations of direct and indirect associations, is a daunting task. Fortunately, literature mining and knowledge discovery tools are increasingly used in biomedical research. However, there is still a large gap between the huge effort and resources invested in disease research and the little effort spent on harvesting the published knowledge. The proposed hypothesis generation framework (HGF) finds “crisp semantic associations” among entities of interest, which is a step towards bridging this gap.
The proposed HGF shares its end goals with SWAN but is more holistic in nature, and it was designed and implemented using scalable and efficient computational models of disease-disease interaction. Integrating mapping ontologies with latent semantic analysis is critical for capturing domain-specific direct and indirect “crisp” associations and for making assertions about entities (for example, that disease X is associated with a set of factors Z).
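To make the idea of an indirect association concrete, the following is a minimal sketch of latent semantic analysis on a toy term-by-abstract count matrix. It is not the HGF implementation: the terms, the counts, the chosen rank, and the helper names (`lsa_term_vectors`, `association`) are all assumptions made for illustration, and the terms are assumed to have been mapped to ontology concepts beforehand.

```python
import numpy as np

# Hypothetical toy data: rows are ontology-mapped terms, columns are abstracts.
TERMS = ["parkinson", "olfaction", "autoimmunity", "thyroid"]
COUNTS = np.array([
    [1.0, 0.0, 0.0],  # parkinson: appears in abstract 0 only
    [1.0, 1.0, 0.0],  # olfaction: appears in abstracts 0 and 1
    [0.0, 1.0, 0.0],  # autoimmunity: appears in abstract 1 only
    [0.0, 0.0, 2.0],  # thyroid: appears in abstract 2 only
])


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0


def lsa_term_vectors(counts, rank):
    """Project terms into a rank-limited latent space via truncated SVD."""
    u, s, _ = np.linalg.svd(counts, full_matrices=False)
    # Keep the `rank` largest singular directions, scaled by their weights.
    return u[:, :rank] * s[:rank]


def association(vectors, term_a, term_b):
    """Cosine similarity between the row vectors of two named terms."""
    i, j = TERMS.index(term_a), TERMS.index(term_b)
    return cosine(vectors[i], vectors[j])


# Direct association: the two terms never share an abstract.
raw = association(COUNTS, "parkinson", "autoimmunity")

# Indirect association: both terms co-occur with "olfaction".
latent = lsa_term_vectors(COUNTS, rank=2)
indirect = association(latent, "parkinson", "autoimmunity")
unrelated = association(latent, "parkinson", "thyroid")

print(f"raw={raw:.2f} indirect={indirect:.2f} unrelated={unrelated:.2f}")
```

In this toy matrix the raw count vectors for "parkinson" and "autoimmunity" are orthogonal, yet in the reduced space they become strongly similar because both co-occur with "olfaction", while "thyroid" stays unrelated. The rank is a tuning parameter; its value here was picked only to suit the toy data.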
Pilot studies were performed on two diseases. To validate the results, the computed “associations” and “assertions” were compared with curated expert knowledge. The HGF was able to capture “crisp” direct and indirect associations and to provide knowledge discovery on demand.
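One simple way to compare computed associations with a curated list is set-based precision and recall. The sketch below is only an assumption about how such a comparison could be scored; the abstract does not state which measure was used, and the factor names and the function `precision_recall` are hypothetical.

```python
def precision_recall(computed, curated):
    """Score computed factors against a curated reference set."""
    computed, curated = set(computed), set(curated)
    hits = computed & curated  # factors found by both sources
    precision = len(hits) / len(computed) if computed else 0.0
    recall = len(hits) / len(curated) if curated else 0.0
    return precision, recall


# Hypothetical factor lists for a single disease (illustrative only).
computed = {"olfaction", "thyroid dysfunction", "skin cancer", "water hardness"}
curated = {"olfaction", "thyroid dysfunction", "skin cancer",
           "autoimmunity", "depression"}

p, r = precision_recall(computed, curated)
print(f"precision={p:.2f} recall={r:.2f}")
```

Precision indicates how many computed factors the curated source confirms, and recall indicates how much of the curated knowledge the computation recovers. A computed factor missing from the curated list is not necessarily wrong; it may be a candidate hypothesis.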
The proposed framework is fast, efficient, and robust in generating new hypotheses to identify factors associated with a disease. A fully integrated Web service application is being developed for wide dissemination of the HGF. A large-scale study by domain experts and associated researchers is underway to validate the associations and assertions computed by the HGF.
- Gao Y, Kinoshita J, Wu E, Miller E, Lee R, Seaborne A, Cayzer S, Clark T: SWAN: A Distributed Knowledge Infrastructure for Alzheimer Disease Research. Journal of Web Semantics 2006, 4(3):222–228.
- Goh KI, Cusick ME, Valle D, Childs B, Vidal M, Barabási AL: The human disease network. Proc Natl Acad Sci USA 2007, 104(21):8685–8690.
- Ideker T, Thorsson V, Ranish JA, Christmas R, Buhler J, Eng JK, Bumgarner R, Goodlett DR, Aebersold R, Hood L: Integrated genomic and proteomic analyses of a systematically perturbed metabolic network. Science 2001, 292(5518):929–934.
- Zhang X, Zhang R, Jiang Y, Sun P, Tang G, Wang X, Lv H, Li X: The expanded human disease network combining protein–protein interaction information. Eur J Hum Genet 2011, 19(7):783–788.
- Rzhetsky A, Seringhaus M, Gerstein M: Seeking a new biology through text mining. Cell 2008, 134(1):9–13.
- Hirschman L, Morgan AA, Yeh AS: Rutabaga by any other name: extracting biological names. J Biomed Inform 2002, 35(4):247–259.
- Wilbur WJ, Hazard GF, Divita G, Mork JG, Aronson AR, Browne AC: Analysis of biomedical text for chemical names: a comparison of three methods. Proc AMIA Symp 1999:176–180. http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2232672/
- Landauer TK, Dumais ST: A solution to Plato’s problem: the latent semantic analysis theory of the acquisition, induction, and representation of knowledge. Psychol Rev 1997, 104:211–240.
- Lee DD, Seung HS: Learning the parts of objects by non-negative matrix factorization. Nature 1999, 401:788–791.
- Paatero P, Tapper U: Positive matrix factorization: a non-negative factor model with optimal utilization of error estimates of data values. Environmetrics 1994, 5:111–126.
- Berry MW, Browne M: Understanding Search Engines: Mathematical Modeling and Text Retrieval. Philadelphia, USA: SIAM; 1990.
- Swanson D, Smalheiser N: Assessing a gap in the biomedical literature: magnesium deficiency and neurologic disease. Neurosci Res Commun 1994, 15:1–9.
- Srinivasan P, Libbus B: Mining MEDLINE for implicit links between dietary substances and diseases. Bioinformatics 2004, 20(Suppl 1):i290–i296.
- Yeasin M, Malempati H, Homayouni R, Sorower MS: A systematic study on latent semantic analysis model parameters for mining biomedical literature. Conference Proceedings: BMC Bioinformatics 2009, 10(Suppl 7):A6.
- Medlink Neurology. [http://www.medlink.com/medlinkcontent.asp]
- Catling LA, Abubakar I, Lake IR, Swift L, Hunter PR: A systematic review of analytical observational studies investigating the association between cardiovascular disease and drinking water hardness. J Water Health 2008, 6(4):433–442.
- Menown IA, Shand JA: Recent advances in cardiology. Future Cardiol 2010, 6(1):11–17.
- Tafet GE, Idoyaga-Vargas VP, Abulafia DP, Calandria JM, Roffman SS, Chiovetta A, Shinitzky M: Correlation between cortisol level and serotonin uptake in patients with chronic stress and depression. Cogn Affect Behav Neurosci 2001, 1(4):388–393.
- Williams GP: The role of oestrogen in the pathogenesis of obesity, type 2 diabetes, breast cancer and prostate disease. Eur J Cancer Prev 2010, 19(4):256–271.
- Schürks M, Glynn RJ, Rist PM, Tzourio C, Kurth T: Effects of vitamin E on stroke subtypes: meta-analysis of randomised controlled trials. BMJ 2010, 341:c5702.
- Benkler M, Agmon-Levin N, Shoenfeld Y: Parkinson’s disease, autoimmunity, and olfaction. Int J Neurosci 2009, 119(12):2133–2143.
- Moscavitch SD, Szyper-Kravitz M, Shoenfeld Y: Autoimmune pathology accounts for common manifestations in a wide range of neuro-psychiatric disorders: the olfactory and immune system interrelationship. Clin Immunol 2009, 130(3):235–243.
- Faria AM, Weiner HL: Oral tolerance. Immunol Rev 2005, 206:232–259.
- Teixeira G, Paschoal PO, de Oliveira VL, Pedruzzi MM, Campos SM, Andrade L, Nobrega A: Diet selection in immunologically manipulated mice. Immunobiology 2008, 213(1):1–12.
- Schiffman SS, Sattely-Miller EA, Taylor EL, Graham BG, Landerman LR, Zervakis J, Campagna LK, Cohen HJ, Blackwell S, Garst JL: Combination of flavor enhancement and chemosensory education improves nutritional status in older cancer patients. J Nutr Health Aging 2007, 11(5):439–454.
- Murphy C, Davidson TM, Jellison W, Austin S, Mathews WC, Ellison DW, Schlotfeldt C: Sinonasal disease and olfactory impairment in HIV disease: endoscopic sinus surgery and outcome measures. Laryngoscope 2000, 110(10 Pt 1):1707–1710.
- Zucco GM, Ingegneri G: Olfactory deficits in HIV-infected patients with and without AIDS dementia complex. Physiol Behav 2004, 80(5):669–674.
- Tandeter H, Levy A, Gutman G, Shvartzman P: Subclinical thyroid disease in patients with Parkinson’s disease. Arch Gerontol Geriatr 2001, 33(3):295–300.
- Chinnakkaruppan A, Das S, Sarkar PK: Age related and hypothyroidism related changes on the stoichiometry of neurofilament subunits in the developing rat brain. Int J Dev Neurosci 2009, 27(3):257–261.
- García-Moreno JM, Chacón-Peña J: Hypothyroidism and Parkinson’s disease and the issue of diagnostic confusion. Mov Disord 2003, 18(9):1058–1059.
- Munhoz RP, Teive HA, Troiano AR, Hauck PR, Herdoiza Leiva MH, Graff H, Werneck LC: Parkinson’s disease and thyroid dysfunction. Parkinsonism Relat Disord 2004, 10(6):381–383.
- Ferreira JJ, Neutel D, Mestre T, Coelho M, Rosa MM, Rascol O, Sampaio C: Skin cancer and Parkinson’s disease. Mov Disord 2010, 25(2):139–148.
- Open Access: this content is freely available online to anyone, anywhere at any time.
- Online Date: August 2012
- Online ISSN
- BioMed Central
- Keywords
- Disease network
- Disease model
- Biological literature-mining
- Hypothesis generation
- Knowledge discovery
- MeSH ontology
- Author Affiliations
- 1. Department of Electrical and Computer Engineering, Memphis University, Memphis, TN, 38152, USA
- 2. College of Arts and Sciences, Bioinformatics Program, Memphis University, Memphis, TN, 38152, USA
- 3. Department of Neurology, University of Tennessee Health Science Center, Memphis, TN, 38163, USA