International Journal of Legal Medicine, Volume 129, Issue 5, pp 1145–1153

AncesTrees: ancestry estimation with randomized decision trees

  • David Navega
  • Catarina Coelho
  • Ricardo Vicente
  • Maria Teresa Ferreira
  • Sofia Wasterlain
  • Eugénia Cunha
Original Article


In forensic anthropology, ancestry estimation is essential for establishing an individual’s biological profile. The aim of this study is to present a new program, AncesTrees, developed for assessing ancestry based on metric analysis. AncesTrees relies on a machine learning ensemble algorithm, random forest, to classify the human skull. In the ensemble learning paradigm, several models are generated and used jointly to arrive at the final decision. The random forest algorithm creates ensembles of decision tree classifiers, a non-linear and non-parametric classification technique. The database used in AncesTrees is composed of 23 craniometric variables from 1,734 individuals, representative of six major ancestral groups and selected from Howells’ craniometric series. The program was tested on 128 adult crania from the following collections: the African slaves’ skeletal collection of Valle da Gafaria, and the Medical School Skull Collection and the Identified Skeletal Collection of the 21st Century, both curated at the University of Coimbra. In the first stage of the test analysis, ancestry was estimated using all the ancestral groups in the database. In the second stage, ancestry was estimated using only the European and African ancestral groups. In the first stage, 75 % of the individuals of African ancestry and 79.2 % of the individuals of European ancestry were correctly identified. The model involving only African and European ancestral groups performed better: 93.8 % of all individuals were correctly classified. These results show that AncesTrees can be a valuable tool in forensic anthropology.
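The ensemble idea described in the abstract (many randomized models, one joint decision) can be illustrated with a minimal sketch. This is not the AncesTrees implementation: it swaps full decision trees for one-split decision stumps, each fitted on a bootstrap sample and a randomly chosen variable, and combines them by majority vote. The measurements, the group labels "A" and "B", and all function names are hypothetical and serve only as an illustration.

```python
import random
from collections import Counter


def fit_stump(rows, labels, feature):
    """Fit a one-split classifier on a single feature (a simplified tree)."""
    best = None
    for threshold in sorted({row[feature] for row in rows}):
        left = [lab for row, lab in zip(rows, labels) if row[feature] <= threshold]
        right = [lab for row, lab in zip(rows, labels) if row[feature] > threshold]
        if not left or not right:
            continue  # a split must leave cases on both sides
        left_label = Counter(left).most_common(1)[0][0]
        right_label = Counter(right).most_common(1)[0][0]
        errors = sum(lab != left_label for lab in left)
        errors += sum(lab != right_label for lab in right)
        if best is None or errors < best[0]:
            best = (errors, threshold, left_label, right_label)
    if best is None:
        # The feature is constant in this sample: always predict the majority.
        majority = Counter(labels).most_common(1)[0][0]
        return (feature, float("inf"), majority, majority)
    _, threshold, left_label, right_label = best
    return (feature, threshold, left_label, right_label)


def fit_ensemble(rows, labels, n_models=25, seed=0):
    """Build many stumps, each on a bootstrap sample and a random feature."""
    rng = random.Random(seed)
    n_features = len(rows[0])
    models = []
    for _ in range(n_models):
        # Bootstrap: draw as many cases as the data holds, with replacement.
        idx = [rng.randrange(len(rows)) for _ in rows]
        feature = rng.randrange(n_features)
        sample_rows = [rows[i] for i in idx]
        sample_labels = [labels[i] for i in idx]
        models.append(fit_stump(sample_rows, sample_labels, feature))
    return models


def predict(models, row):
    """Combine the individual votes into one decision by majority."""
    votes = Counter(
        left if row[feature] <= threshold else right
        for feature, threshold, left, right in models
    )
    return votes.most_common(1)[0][0]


# Hypothetical toy data: three invented measurements (mm) per cranium.
rows = [
    (180, 130, 95), (182, 132, 97), (178, 129, 94), (181, 131, 96),
    (190, 140, 105), (192, 142, 107), (188, 139, 104), (191, 141, 106),
]
labels = ["A"] * 4 + ["B"] * 4

models = fit_ensemble(rows, labels)
print(predict(models, (177, 128, 93)))
print(predict(models, (191, 141, 106)))
```

Any single stump may vote wrongly, for example when its bootstrap sample happens to contain only one group, but the majority vote over the ensemble is far more stable. A random forest applies the same principle with full decision trees and random variable selection at each split.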


Keywords: Forensic anthropology; Ancestry estimation; Howells’ craniometric series; Random forest; AncesTrees



The authors thank Centro de Ciências Forenses and Centro de Investigação em Antropologia e Saúde. The authors also thank the anonymous reviewers for their comments and suggestions. The authors declare that they have no conflict of interest.



Copyright information

© Springer-Verlag Berlin Heidelberg 2014

Authors and Affiliations

  • David Navega (1, 2)
  • Catarina Coelho (2)
  • Ricardo Vicente (1, 2)
  • Maria Teresa Ferreira (1, 2, 3)
  • Sofia Wasterlain (1, 2, 3)
  • Eugénia Cunha (1, 2, 3)

  1. Forensic Sciences Centre (CENCIFOR), Coimbra, Portugal
  2. Department of Life Sciences, Faculty of Sciences and Technology, University of Coimbra, Coimbra, Portugal
  3. Research Centre for Anthropology and Health (CIAS), Department of Life Sciences, Faculty of Sciences and Technology, University of Coimbra, Coimbra, Portugal
