Path Knowledge Discovery: Multilevel Text Mining as a Methodology for Phenomics

Liu, Chen; Chu, Wesley W.; Sabb, Fred; Parker, D. Stott; Bilder, Robert

doi:10.1007/978-3-642-40837-3_5

Part of the book series: Studies in Big Data (SBD, volume 1)

Abstract

Transdisciplinary research is a rapidly expanding part of science and engineering, demanding new methods for connecting results across fields. In biomedicine, for example, modeling complex biological systems requires linking knowledge across multiple levels of science, from genes to disease. The move to multilevel research requires new strategies; in this discussion we present path knowledge discovery, a novel methodology for linking published research findings.

The development of path knowledge discovery was motivated by problems in neuropsychiatry, where researchers need to discover interrelationships extending across brain biology that link genotype (such as dopamine gene mutations) to phenotype (observable characteristics of organisms such as cognitive performance measures). To advance an understanding of the complex bases of neuropsychiatric diseases, researchers need to search and discover relations among the many manifestations of these diseases across multiple biological and behavioral levels (i.e., genotypes and phenotypes at levels from molecular expression through complex syndromes). Phenomics – the study of phenotypes on a genome-wide scale – requires close collaboration among specialists in multiple fields. We developed a computer-aided path knowledge discovery methodology to accomplish this goal.

Path knowledge discovery consists of two integral tasks: 1) association path mining among concepts in multipart phenotypes that cross disciplines, and 2) fine-granularity knowledge-based content retrieval along the path(s) to permit deeper analysis. Implementing this methodology with our PhenoMining tools has required the development of innovative measures of association strength, both for pairwise associations and for sequences of associations, as well as powerful lexicon-based association expansion to increase the scope of matching. We describe the validation of the methodology against a published heritability study from cognition research, for which we obtain comparable results. We show how PhenoMining tools can greatly reduce a domain expert’s time (by several orders of magnitude) when searching and gathering knowledge from the published literature, and can facilitate derivation of interpretable results.
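
As a rough illustration of the ideas in this paragraph, the sketch below scores pairwise associations by document co-occurrence and scores a path by multiplying the pairwise scores. The abstract does not define the PhenoMining measures, so the Jaccard overlap, the product rule, the toy corpus, the lexicon entry, and all function names here are assumptions chosen for illustration, not the authors' method.

```python
# Hypothetical toy corpus: each document is the set of concept terms it mentions.
DOCS = [
    {"DRD1", "dopamine", "prefrontal cortex"},
    {"dopamine", "prefrontal cortex", "working memory"},
    {"prefrontal cortex", "working memory"},
    {"WM", "heritability"},
    {"DRD1", "dopamine"},
]

# Hypothetical lexicon: maps a concept to the surface terms that should match it.
LEXICON = {"working memory": {"working memory", "WM"}}


def expand(concept):
    """Return the concept's lexicon terms, or just the concept itself."""
    return LEXICON.get(concept, {concept})


def docs_with(concept):
    """Indices of documents that mention the concept or one of its lexicon terms."""
    terms = expand(concept)
    return {i for i, doc in enumerate(DOCS) if doc & terms}


def pair_strength(a, b):
    """Assumed pairwise measure: Jaccard overlap of the two document sets."""
    da, db = docs_with(a), docs_with(b)
    union = da | db
    return len(da & db) / len(union) if union else 0.0


def path_strength(path):
    """Assumed path measure: product of strengths of consecutive pairs."""
    strength = 1.0
    for a, b in zip(path, path[1:]):
        strength *= pair_strength(a, b)
    return strength


path = ["DRD1", "dopamine", "prefrontal cortex", "working memory"]
score = path_strength(path)
```

With lexicon expansion, the document that mentions only "WM" still counts toward "working memory", which is the sense in which expansion increases the scope of matching. A product rule penalizes long paths; other aggregation rules (for example, the minimum pairwise strength) would be equally plausible under these assumptions.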

We built these PhenoMining tools on an existing knowledge base (PhenoWiki.org), now called PhenoWiki+, which can greatly speed up the knowledge acquisition process. Further, using the Resource Description Framework (RDF) data model in the PhenoWiki knowledge repository allows us to connect with different knowledge sources to enlarge the knowledge scope. The knowledge base also supports annotation, an important capability for collaborative knowledge discovery.
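
To make the role of the RDF data model concrete, here is a minimal sketch of how subject-predicate-object triples from two sources can be merged and queried by pattern. It uses plain Python tuples rather than an RDF library, and every prefix, resource name, and predicate below is hypothetical; none is taken from PhenoWiki.

```python
# Hypothetical triples standing in for a local knowledge base.
LOCAL = {
    ("pw:WorkingMemory", "pw:measuredBy", "pw:NBackTask"),
    ("pw:NBackTask", "pw:reportedIn", "pw:Study1"),
}

# Hypothetical triples standing in for an external knowledge source.
EXTERNAL = {
    ("pw:WorkingMemory", "ex:relatedRegion", "ex:PrefrontalCortex"),
}


def match(graph, s=None, p=None, o=None):
    """Return sorted triples matching the pattern; None acts as a wildcard."""
    return sorted(
        t for t in graph
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


# Because both sources share the triple shape, merging is a set union.
merged = LOCAL | EXTERNAL
about_wm = match(merged, s="pw:WorkingMemory")
```

Because every statement has the same three-part shape, facts about the same subject from different sources can be combined without schema changes, which is the property the paragraph above relies on.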

Author information

Authors and Affiliations

Computer Science Department, University of California, Los Angeles, USA
Chen Liu, Wesley W. Chu & D. Stott Parker
Semel Institute for Neuroscience and Human Behavior, University of California, Los Angeles, USA
Fred Sabb & Robert Bilder

Corresponding author

Correspondence to Chen Liu.

Editor information

Editors and Affiliations

Department of Computer Science, University of California, Los Angeles, USA
Wesley W. Chu

About this chapter

Cite this chapter

Liu, C., Chu, W.W., Sabb, F., Parker, D.S., Bilder, R. (2014). Path Knowledge Discovery: Multilevel Text Mining as a Methodology for Phenomics. In: Chu, W. (eds) Data Mining and Knowledge Discovery for Big Data. Studies in Big Data, vol 1. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-40837-3_5

DOI: https://doi.org/10.1007/978-3-642-40837-3_5
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-40836-6
Online ISBN: 978-3-642-40837-3
eBook Packages: Engineering, Engineering (R0)
