Supervised segmentation of phenotype descriptions for the human skeletal phenome using hybrid methods
In recent years, considerable research has addressed the ontology-based formalization of phenotype descriptions. To fully capture the knowledge expressed within these descriptions, we need to exploit their inner structure, which implicitly combines qualities and anatomical entities. The first step in this process is the segmentation of phenotype descriptions into their atomic elements.
We present a two-phase hybrid segmentation method that combines a series of individual classifiers using different aggregation schemes (set operations and simple majority voting). The approach is tested on a corpus of skeletal phenotype descriptions drawn from the Human Phenotype Ontology. Experimental results show that the best hybrid method achieves an F-Score of 97.05% in the first phase and F-Scores of 97.16% / 94.50% in the second phase.
The performance of the initial segmentation of anatomical entities and qualities (phase I) is not affected by the presence or absence of external resources, such as domain dictionaries. More generally, hybrid methods may not always improve segmentation accuracy, as their benefit depends heavily on the goal and the characteristics of the data.
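The aggregation schemes named above (set operations and simple majority voting) can be illustrated with a minimal sketch. This is not the authors' implementation: the representation of a segment as a `(start, end)` token span, the three toy classifier outputs, and all function names are assumptions made purely for illustration.

```python
from collections import Counter
from typing import List, Set, Tuple

# Assumption: a predicted segment is a (start_token, end_token_exclusive) span.
Span = Tuple[int, int]


def union_spans(predictions: List[Set[Span]]) -> Set[Span]:
    """Set-operation scheme: keep a span if ANY classifier predicted it."""
    return set().union(*predictions)


def intersect_spans(predictions: List[Set[Span]]) -> Set[Span]:
    """Set-operation scheme: keep a span only if ALL classifiers predicted it."""
    if not predictions:
        return set()
    return set.intersection(*predictions)


def majority_vote(predictions: List[Set[Span]]) -> Set[Span]:
    """Simple majority voting: keep a span predicted by more than half of the classifiers."""
    counts = Counter(span for pred in predictions for span in pred)
    threshold = len(predictions) / 2
    return {span for span, n in counts.items() if n > threshold}


def f_score(predicted: Set[Span], gold: Set[Span]) -> float:
    """Balanced F-score over exactly matching spans."""
    true_positives = len(predicted & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


# Hypothetical outputs of three individual classifiers for one description.
classifier_a = {(0, 2), (3, 5)}
classifier_b = {(0, 2), (3, 4)}
classifier_c = {(0, 2), (3, 5), (6, 7)}
gold = {(0, 2), (3, 5)}

outputs = [classifier_a, classifier_b, classifier_c]
for name, combine in [("union", union_spans),
                      ("intersection", intersect_spans),
                      ("majority", majority_vote)]:
    combined = combine(outputs)
    print(f"{name:12s} spans={sorted(combined)} F={f_score(combined, gold):.4f}")
```

On this toy input, the union favours recall, the intersection favours precision, and majority voting sits between the two. Which scheme scores best depends on how the individual classifiers err, which is consistent with the observation that hybrid methods do not always improve accuracy.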
- Open Access
- Online Date: October 2012
- Online ISSN
- BioMed Central
- Author Affiliations
- 1. School of ITEE, The University of Queensland, Brisbane, Australia
- 2. Bone Dysplasia Research Group, UQ Centre for Clinical Research (UQCCR), University of Queensland, Brisbane, Australia
- 3. Genetic Health Queensland, Royal Brisbane and Women’s Hospital, Herston, Brisbane, Australia