Supervised segmentation of phenotype descriptions for the human skeletal phenome using hybrid methods
Over the last few years, a significant amount of research has addressed the ontology-based formalization of phenotype descriptions. To fully capture the intrinsic value and knowledge expressed within these descriptions, we need to take advantage of their inner structure, which implicitly combines qualities and anatomical entities. The first step in this process is the segmentation of phenotype descriptions into their atomic elements.
We present a two-phase hybrid segmentation method that combines a series of individual classifiers using different aggregation schemes (set operations and simple majority voting). The approach is tested on a corpus of skeletal phenotype descriptions drawn from the Human Phenotype Ontology. Experimental results show that the best hybrid method achieves an F-Score of 97.05% in the first phase and F-Scores of 97.16% / 94.50% in the second phase.
The performance of the initial segmentation of anatomical entities and qualities (phase I) is not affected by the presence or absence of external resources, such as domain dictionaries. More generally, hybrid methods may not always improve segmentation accuracy, because they depend heavily on the goal and on the characteristics of the data.
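The aggregation schemes named in the abstract (set operations and simple majority voting) can be illustrated with a minimal sketch. This is not the authors' implementation: the label set ("E" for anatomical entity, "Q" for quality, "O" for other), the three classifier outputs, the gold annotation, and all function names are hypothetical, and the F-score is computed per token as a simplification.

```python
from collections import Counter

# Hypothetical token-level labels from three classifiers for one description.
# "E" = anatomical entity, "Q" = quality, "O" = other (assumed label set).
tokens = ["short", "distal", "phalanx", "of", "finger"]
predictions = [
    ["Q", "E", "E", "O", "E"],  # classifier A (made-up output)
    ["Q", "Q", "E", "O", "E"],  # classifier B (made-up output)
    ["Q", "E", "E", "E", "E"],  # classifier C (made-up output)
]


def majority_vote(predictions, fallback="O"):
    """Per token, keep the label chosen by a strict majority of classifiers."""
    voted = []
    for labels in zip(*predictions):
        label, count = Counter(labels).most_common(1)[0]
        voted.append(label if count > len(labels) / 2 else fallback)
    return voted


def positions(labels, target):
    """Token indices that one classifier tagged with the target label."""
    return {i for i, label in enumerate(labels) if label == target}


def union_positions(predictions, target):
    """Set union: a token counts if any classifier tagged it (favours recall)."""
    return set().union(*(positions(p, target) for p in predictions))


def intersection_positions(predictions, target):
    """Set intersection: a token counts only if all agree (favours precision)."""
    return set.intersection(*(positions(p, target) for p in predictions))


def f_score(predicted, gold):
    """Balanced F-score over sets of token indices."""
    true_positives = len(predicted & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


# Hypothetical gold annotation: tokens 1, 2 and 4 are anatomical entities.
gold_entities = {1, 2, 4}

voted = majority_vote(predictions)
print(list(zip(tokens, voted)))
print(f_score(positions(voted, "E"), gold_entities))
print(f_score(union_positions(predictions, "E"), gold_entities))
print(f_score(intersection_positions(predictions, "E"), gold_entities))
```

In this toy case the union over-generates and the intersection under-generates relative to the vote, which matches the abstract's caution that the benefit of a hybrid scheme depends on the goal and the data.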
- Open Access
- Online Date: October 2012
- Publisher: BioMed Central
- Author Affiliations
- 1. School of ITEE, The University of Queensland, Brisbane, Australia
- 2. Bone Dysplasia Research Group, UQ Centre for Clinical Research (UQCCR), University of Queensland, Brisbane, Australia
- 3. Genetic Health Queensland, Royal Brisbane and Women’s Hospital, Herston, Brisbane, Australia