AncesTrees: ancestry estimation with randomized decision trees

Abstract

In forensic anthropology, ancestry estimation is essential to establishing an individual's biological profile. The aim of this study is to present a new program, AncesTrees, developed for assessing ancestry based on metric analysis. AncesTrees relies on a machine learning ensemble algorithm, random forest, to classify the human skull. In the ensemble learning paradigm, several models are generated and used jointly to arrive at the final decision. The random forest algorithm creates ensembles of decision tree classifiers, a non-linear and non-parametric classification technique. The database used in AncesTrees is composed of 23 craniometric variables from 1,734 individuals, representative of six major ancestral groups and selected from Howells’ craniometric series. The program was tested on 128 adult crania from the following collections: the African slaves’ skeletal collection of Valle da Gafaria, and the Medical School Skull Collection and the Identified Skeletal Collection of the 21st Century, both curated at the University of Coimbra. In the first stage of the test analysis, ancestry was estimated using all the ancestral groups of the database. In the second stage, ancestry was estimated using only the European and the African ancestral groups. In the first stage, 75 % of the individuals of African ancestry and 79.2 % of the individuals of European ancestry were correctly identified. The model involving only the African and European ancestral groups performed better: 93.8 % of all individuals were correctly classified. These results show that AncesTrees can be a valuable tool in forensic anthropology.
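
The ensemble idea described in the abstract (many randomized trees, each trained on a resampled dataset, voting on the final class) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the AncesTrees implementation: the data are synthetic, the two measurements and the labels `group_a` and `group_b` are hypothetical, and each "tree" is reduced to a single-split decision stump to keep the example short.

```python
import random
from collections import Counter

def fit_stump(rows, labels, feature):
    """Fit a one-split tree: pick the threshold on `feature` with fewest errors."""
    best = None
    for threshold in sorted({r[feature] for r in rows}):
        left = [l for r, l in zip(rows, labels) if r[feature] <= threshold]
        right = [l for r, l in zip(rows, labels) if r[feature] > threshold]
        if not left or not right:
            continue
        left_label = Counter(left).most_common(1)[0][0]
        right_label = Counter(right).most_common(1)[0][0]
        errors = (sum(l != left_label for l in left)
                  + sum(l != right_label for l in right))
        if best is None or errors < best[0]:
            best = (errors, threshold, left_label, right_label)
    if best is None:  # feature is constant in this sample: predict the majority
        majority = Counter(labels).most_common(1)[0][0]
        return (feature, float("inf"), majority, majority)
    return (feature, best[1], best[2], best[3])

def fit_forest(rows, labels, n_trees, rng):
    """Train each stump on a bootstrap sample and one randomly chosen feature."""
    n, n_features = len(rows), len(rows[0])
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        feature = rng.randrange(n_features)         # random feature choice
        forest.append(fit_stump([rows[i] for i in idx],
                                [labels[i] for i in idx], feature))
    return forest

def predict(forest, row):
    """Return the class that receives the most votes across the ensemble."""
    votes = Counter()
    for feature, threshold, left_label, right_label in forest:
        votes[left_label if row[feature] <= threshold else right_label] += 1
    return votes.most_common(1)[0][0]

# Synthetic, well-separated data: two hypothetical measurements per individual.
rng = random.Random(0)
rows, labels = [], []
for _ in range(30):
    rows.append([rng.gauss(10.0, 0.5), rng.gauss(20.0, 0.5)])
    labels.append("group_a")
    rows.append([rng.gauss(14.0, 0.5), rng.gauss(26.0, 0.5)])
    labels.append("group_b")

forest = fit_forest(rows, labels, n_trees=25, rng=rng)
print(predict(forest, [8.0, 16.0]))
print(predict(forest, [16.0, 30.0]))
```

A full random forest grows deep trees and draws a fresh random subset of variables at every split; the stump version above keeps only the two sources of randomization (bootstrap resampling and random feature choice) and the majority vote.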

Acknowledgments

The authors thank Centro de Ciências Forenses and Centro de Investigação em Antropologia e Saúde. The authors also thank the anonymous reviewers for their comments and suggestions. The authors declare that they have no conflict of interest.

Author information

Corresponding author

Correspondence to David Navega.

About this article

Cite this article

Navega, D., Coelho, C., Vicente, R. et al. AncesTrees: ancestry estimation with randomized decision trees. Int J Legal Med 129, 1145–1153 (2015). https://doi.org/10.1007/s00414-014-1050-9

  • DOI: https://doi.org/10.1007/s00414-014-1050-9

Keywords

  • Forensic anthropology
  • Ancestry estimation
  • Howells’ craniometric series
  • Random forest
  • AncesTrees