Empirical Software Engineering, Volume 18, Issue 6, pp 1195–1237

Software Bertillonage

Determining the provenance of software development artifacts
  • Julius Davies
  • Daniel M. German
  • Michael W. Godfrey
  • Abram Hindle

DOI: 10.1007/s10664-012-9199-7

Cite this article as:
Davies, J., German, D.M., Godfrey, M.W. et al. Empir Software Eng (2013) 18: 1195. doi:10.1007/s10664-012-9199-7


Deployed software systems are typically composed of many pieces, not all of which may have been created by the main development team. Often, the provenance of included components (such as external libraries or cloned source code) is not clearly stated, and this uncertainty can introduce technical and ethical concerns that make it difficult for system owners and other stakeholders to manage their software assets. In this work, we motivate the need to recover the provenance of software entities using a broad set of techniques that could include signature matching, source code fact extraction, software clone detection, call flow graph matching, string matching, and historical analyses. We liken our provenance goals to those of Bertillonage, a simple and approximate forensic analysis technique based on biometrics that was developed in 19th-century France before the advent of fingerprinting. As an example, we have developed a fast, simple, and approximate technique called anchored signature matching for identifying the source origin of binary libraries within a given Java application. This technique involves a type of structured signature matching performed against a database of candidates drawn from the Maven2 repository, a 275 GB collection of open source Java libraries. To show that the approach is both valid and effective, we conducted an empirical study on 945 jars from the Debian GNU/Linux distribution, as well as an industrial case study on 81 jars from an e-commerce application.


Keywords: Reuse, Provenance, Code evolution, Code fingerprints

Copyright information

© Springer Science+Business Media, LLC 2012

Authors and Affiliations

  • Julius Davies (1)
  • Daniel M. German (2)
  • Michael W. Godfrey (3)
  • Abram Hindle (4)
  1. Department of Computer Science, University of British Columbia, Vancouver, Canada
  2. Department of Computer Science, University of Victoria, Victoria, Canada
  3. David R. Cheriton School of Computer Science, University of Waterloo, Waterloo, Canada
  4. Department of Computing Sciences, University of Alberta, Edmonton, Canada
