Path Knowledge Discovery: Multilevel Text Mining as a Methodology for Phenomics

  • Chapter
Data Mining and Knowledge Discovery for Big Data

Part of the book series: Studies in Big Data ((SBD,volume 1))

Abstract

Transdisciplinary research is a rapidly expanding part of science and engineering, demanding new methods for connecting results across fields. In biomedicine, for example, modeling complex biological systems requires linking knowledge across multiple levels of science, from genes to disease. The move to multilevel research requires new strategies; in this discussion we present path knowledge discovery, a novel methodology for linking published research findings.

The development of path knowledge discovery was motivated by problems in neuropsychiatry, where researchers need to discover interrelationships extending across brain biology that link genotype (such as dopamine gene mutations) to phenotype (observable characteristics of organisms such as cognitive performance measures). To advance an understanding of the complex bases of neuropsychiatric diseases, researchers need to search and discover relations among the many manifestations of these diseases across multiple biological and behavioral levels (i.e., genotypes and phenotypes at levels from molecular expression through complex syndromes). Phenomics – the study of phenotypes on a genome-wide scale – requires close collaboration among specialists in multiple fields. We developed a computer-aided path knowledge discovery methodology to accomplish this goal.

Path knowledge discovery consists of two integral tasks: 1) association path mining among concepts in multipart phenotypes that cross disciplines, and 2) fine-granularity knowledge-based content retrieval along the path(s) to permit deeper analysis. Implementing this methodology with our PhenoMining tools has required developing innovative measures of association strength, both for pairwise associations and for sequences of associations, as well as powerful lexicon-based association expansion to increase the scope of matching. We describe the validation of the methodology against a published heritability study from cognition research, for which we obtain comparable results. We show how PhenoMining tools can greatly reduce a domain expert’s time (by several orders of magnitude) when searching and gathering knowledge from the published literature, and can facilitate derivation of interpretable results.
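The abstract does not define the association-strength measures, so the following is only a minimal sketch of the general idea under stated assumptions. It stands in a Jaccard co-occurrence score for pairwise strength, the product of link scores for path strength, and a synonym table for lexicon-based expansion. The corpus, the lexicon entries, and the function names `normalize`, `pair_strength`, and `path_strength` are all hypothetical and are not taken from the PhenoMining tools.

```python
# Hypothetical toy corpus: each document is the set of terms it mentions.
RAW_DOCS = [
    {"DA", "PFC"},
    {"dopamine", "prefrontal cortex", "working memory"},
    {"prefrontal cortex", "working memory"},
    {"working memory", "schizophrenia"},
    {"dopamine", "working memory"},
]

# Hypothetical lexicon mapping synonyms to one canonical concept, so that
# documents using different wording still match the same concept.
LEXICON = {"DA": "dopamine", "PFC": "prefrontal cortex"}


def normalize(doc):
    """Replace each term by its canonical concept when the lexicon has one."""
    return {LEXICON.get(term, term) for term in doc}


CORPUS = [normalize(doc) for doc in RAW_DOCS]


def pair_strength(docs, a, b):
    """Assumed pairwise measure: Jaccard overlap of the documents
    mentioning concept a and the documents mentioning concept b."""
    both = sum(1 for d in docs if a in d and b in d)
    either = sum(1 for d in docs if a in d or b in d)
    return both / either if either else 0.0


def path_strength(docs, path):
    """Assumed sequence measure: product of the pairwise strengths of
    consecutive concepts, so one weak link weakens the whole path."""
    strength = 1.0
    for a, b in zip(path, path[1:]):
        strength *= pair_strength(docs, a, b)
    return strength


if __name__ == "__main__":
    path = ["dopamine", "prefrontal cortex", "working memory"]
    print(path, path_strength(CORPUS, path))
```

Multiplying link scores is one simple way to rank candidate paths from a genotype-level concept to a phenotype-level concept; other aggregations (for example, the minimum link score) would be equally plausible choices for a sketch like this.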

We built these PhenoMining tools on an existing knowledge base (PhenoWiki.org), now called PhenoWiki+, which can greatly speed up the knowledge acquisition process. Further, using the Resource Description Framework (RDF) data model in the PhenoWiki knowledge repository allows us to connect with different knowledge sources to enlarge the knowledge scope. The knowledge base also supports annotation, an important capability for collaborative knowledge discovery.

Author information

Corresponding author

Correspondence to Chen Liu.

Copyright information

© 2014 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Liu, C., Chu, W.W., Sabb, F., Parker, D.S., Bilder, R. (2014). Path Knowledge Discovery: Multilevel Text Mining as a Methodology for Phenomics. In: Chu, W. (eds) Data Mining and Knowledge Discovery for Big Data. Studies in Big Data, vol 1. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-40837-3_5

  • DOI: https://doi.org/10.1007/978-3-642-40837-3_5

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-40836-6

  • Online ISBN: 978-3-642-40837-3

  • eBook Packages: Engineering, Engineering (R0)
