Biomedical Informatics

Page, C. David; Natarajan, Sriraam

doi:10.1007/978-0-387-30164-8_81

Introduction

Recent years have witnessed a tremendous increase in the use of machine learning for biomedical applications. This surge in interest has several causes. One is the successful application of machine learning technologies in other fields, such as web search, speech and handwriting recognition, agent design, and spatial modeling. Another is the development of technologies that can produce large amounts of data in the time it once took to generate a single data point (that is, to run a single experiment). A third, more recent development is the advent of Electronic Medical/Health Records (EMRs/EHRs). The drastic increase in the amount of data generated has led biologists and clinical researchers to adopt algorithms that can construct predictive models from large amounts of data. Naturally, machine learning is emerging as a tool of choice.

In this article, we will present a few data types and tasks involving such large-scale biological data, where machine learning...


Editor information

Editors and Affiliations

School of Computer Science and Engineering, University of New South Wales, Sydney, Australia, 2052
Claude Sammut
Faculty of Information Technology, Clayton School of Information Technology, Monash University, P.O. Box 63, Victoria, Australia, 3800
Geoffrey I. Webb

Cite this entry

Page, C.D., Natarajan, S. (2011). Biomedical Informatics. In: Sammut, C., Webb, G.I. (eds) Encyclopedia of Machine Learning. Springer, Boston, MA. https://doi.org/10.1007/978-0-387-30164-8_81

DOI: https://doi.org/10.1007/978-0-387-30164-8_81
Publisher Name: Springer, Boston, MA
Print ISBN: 978-0-387-30768-8
Online ISBN: 978-0-387-30164-8
eBook Packages: Computer Science; Reference Module Computer Science and Engineering
