Abstract
Transdisciplinary research is a rapidly expanding part of science and engineering, demanding new methods for connecting results across fields. In biomedicine, for example, modeling complex biological systems requires linking knowledge across multiple levels of science, from genes to disease. The move to multilevel research requires new strategies; in this discussion we present path knowledge discovery, a novel methodology for linking published research findings.
The development of path knowledge discovery was motivated by problems in neuropsychiatry, where researchers need to discover interrelationships extending across brain biology that link genotype (such as dopamine gene mutations) to phenotype (observable characteristics of organisms such as cognitive performance measures). To advance an understanding of the complex bases of neuropsychiatric diseases, researchers need to search and discover relations among the many manifestations of these diseases across multiple biological and behavioral levels (i.e., genotypes and phenotypes at levels from molecular expression through complex syndromes). Phenomics – the study of phenotypes on a genome-wide scale – requires close collaboration among specialists in multiple fields. We developed a computer-aided path knowledge discovery methodology to accomplish this goal.
Path knowledge discovery consists of two integral tasks: 1) association path mining among concepts in multipart phenotypes that cross disciplines, and 2) fine-granularity knowledge-based content retrieval along the path(s) to permit deeper analysis. Implementing this methodology with our PhenoMining tools has required the development of innovative measures of association strength, both for pairwise associations and for sequences of associations, as well as powerful lexicon-based association expansion to increase the scope of matching. We describe the validation of the methodology against a published heritability study from cognition research, for which we obtain comparable results. We show how PhenoMining tools can greatly reduce a domain expert’s time (by several orders of magnitude) when searching and gathering knowledge from the published literature, and can facilitate derivation of interpretable results.
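To make the two tasks concrete, the following is a minimal sketch and not the PhenoMining implementation. The abstract does not define the strength measures, so the sketch assumes a Jaccard overlap of document sets for pairwise strength and a product of pairwise strengths along the path. The corpus, the lexicon entries, and the function names `docs_for`, `pair_strength`, `path_strength`, and `evidence` are all hypothetical.

```python
from math import prod

# Hypothetical toy corpus: document id -> lowercased text.
DOCS = {
    1: "dopamine d1 receptor modulates prefrontal cortex neurons",
    2: "prefrontal cortex activity supports working memory",
    3: "working memory performance is heritable in twins",
    4: "drd1 expression in the pfc",
    5: "pfc lesions impair working memory",
}

# Hypothetical lexicon: concept -> surface forms used to widen matching.
LEXICON = {
    "dopamine receptor": ["dopamine d1 receptor", "drd1"],
    "prefrontal cortex": ["prefrontal cortex", "pfc"],
    "working memory": ["working memory"],
    "heritability": ["heritable", "heritability"],
}


def docs_for(concept):
    """Ids of documents mentioning the concept or any of its lexicon variants.

    Plain substring matching is used for brevity; a real system would tokenize.
    """
    terms = LEXICON.get(concept, [concept])
    return {doc_id for doc_id, text in DOCS.items()
            if any(term in text for term in terms)}


def pair_strength(a, b):
    """Assumed pairwise measure: Jaccard overlap of the two document sets."""
    docs_a, docs_b = docs_for(a), docs_for(b)
    union = docs_a | docs_b
    return len(docs_a & docs_b) / len(union) if union else 0.0


def path_strength(path):
    """Assumed sequence measure: product of consecutive pairwise strengths."""
    return prod(pair_strength(a, b) for a, b in zip(path, path[1:]))


def evidence(a, b):
    """Content retrieval along one link: documents mentioning both concepts."""
    return sorted(docs_for(a) & docs_for(b))


path = ["dopamine receptor", "prefrontal cortex", "working memory", "heritability"]
score = path_strength(path)
links = [evidence(a, b) for a, b in zip(path, path[1:])]
```

Without the lexicon, document 4 would not match either of its concepts, which is the effect the expansion step is meant to address. A product is only one way to combine link strengths; a minimum over links would instead score the path by its weakest association.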
We built these PhenoMining tools on an existing knowledge base (PhenoWiki.org), now called PhenoWiki+, which can greatly speed up the knowledge acquisition process. Further, using the Resource Description Framework (RDF) data model in the PhenoWiki knowledge repository allows us to connect with different knowledge sources to enlarge the knowledge scope. The knowledge base also supports annotation, an important capability for collaborative knowledge discovery.
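As a rough illustration of why a triple-based data model eases linking across sources, the sketch below represents statements as (subject, predicate, object) tuples and merges two sets that share an identifier. It uses plain Python rather than an RDF library, and every identifier (`ex:WorkingMemory`, `ex:measuredBy`, and so on) is a hypothetical placeholder, not PhenoWiki's actual vocabulary.

```python
# Each statement is a (subject, predicate, object) triple, as in RDF.
# All identifiers are hypothetical placeholders for illustration only.
local_kb = {
    ("ex:WorkingMemory", "ex:measuredBy", "ex:SternbergTask"),
    ("ex:WorkingMemory", "ex:annotatedBy", "ex:reviewer1"),
}
external_kb = {
    ("ex:WorkingMemory", "ex:associatedWith", "ex:PrefrontalCortex"),
}


def match(triples, s=None, p=None, o=None):
    """Return the triples matching a pattern; None acts as a wildcard."""
    return sorted(
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


# Because both sources name the concept identically, merging is a set union.
merged = local_kb | external_kb
about_wm = match(merged, s="ex:WorkingMemory")
```

In this toy form an annotation is just another triple about the same subject, so it is returned by the same query as the imported facts.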
Copyright information
© 2014 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Liu, C., Chu, W.W., Sabb, F., Parker, D.S., Bilder, R. (2014). Path Knowledge Discovery: Multilevel Text Mining as a Methodology for Phenomics. In: Chu, W. (eds) Data Mining and Knowledge Discovery for Big Data. Studies in Big Data, vol 1. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-40837-3_5
DOI: https://doi.org/10.1007/978-3-642-40837-3_5
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-40836-6
Online ISBN: 978-3-642-40837-3
eBook Packages: Engineering, Engineering (R0)