Oscar — File Type Identification of Binary Data in Disk Clusters and RAM Pages

  • Martin Karresand
  • Nahid Shahmehri
Part of the IFIP International Federation for Information Processing book series (IFIPAICT, volume 201)


This paper proposes a method, called Oscar, for determining the probable file type of binary data fragments. The Oscar method is based on building models, called centroids, of the mean and standard deviation of the byte frequency distribution of different file types. A weighted quadratic distance metric is then used to measure the distance between a centroid and a sample data fragment. If the distance falls below a threshold, the sample is categorized as probably belonging to the modelled file type. Oscar is tested using JPEG pictures and is shown to give a high categorization accuracy, i.e., a high detection rate and a low false positive rate. Using a practical example, we demonstrate how the Oscar method can be used to prove the existence of known pictures based on fragments of them found in RAM and in the swap partition of a computer.
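The centroid-and-threshold idea in the abstract can be illustrated with a short Python sketch. It is not the authors' implementation. It assumes 4096-byte fragments, a common disk-cluster and RAM-page size. It also assumes a diagonal weighting in which each squared difference is divided by (σ_i² + α), where α is a small smoothing constant that avoids division by zero. The paper's exact weighting is not reproduced here. All function names, `FRAGMENT_SIZE`, `alpha` and the threshold values are hypothetical.

```python
import math
from typing import Iterable, Iterator, List, Tuple

# Assumption: fragments are 4096 bytes (a typical cluster / page size).
FRAGMENT_SIZE = 4096

Centroid = Tuple[List[float], List[float]]  # (per-byte mean, per-byte std)


def byte_frequencies(fragment: bytes) -> List[float]:
    """Relative frequency of each of the 256 byte values in the fragment."""
    counts = [0] * 256
    for b in fragment:
        counts[b] += 1
    n = len(fragment) or 1  # avoid division by zero on empty input
    return [c / n for c in counts]


def build_centroid(fragments: Iterable[bytes]) -> Centroid:
    """Mean and population standard deviation of each byte value's frequency."""
    freqs = [byte_frequencies(f) for f in fragments]
    if not freqs:
        raise ValueError("need at least one training fragment")
    k = len(freqs)
    mean = [sum(v[i] for v in freqs) / k for i in range(256)]
    std = [math.sqrt(sum((v[i] - mean[i]) ** 2 for v in freqs) / k)
           for i in range(256)]
    return mean, std


def weighted_distance(sample: bytes, centroid: Centroid,
                      alpha: float = 1e-6) -> float:
    """Quadratic distance, each term weighted by 1 / (std^2 + alpha) (assumed)."""
    mean, std = centroid
    x = byte_frequencies(sample)
    return sum((x[i] - mean[i]) ** 2 / (std[i] ** 2 + alpha)
               for i in range(256))


def is_probable_type(sample: bytes, centroid: Centroid, threshold: float,
                     alpha: float = 1e-6) -> bool:
    """True if the sample falls below the distance threshold for this centroid."""
    return weighted_distance(sample, centroid, alpha) < threshold


def scan_image(image: bytes, centroid: Centroid, threshold: float,
               size: int = FRAGMENT_SIZE) -> Iterator[int]:
    """Yield offsets of non-overlapping fragments (e.g. of a RAM or swap dump)
    that are probably of the modelled type."""
    for off in range(0, len(image) - size + 1, size):
        if is_probable_type(image[off:off + size], centroid, threshold):
            yield off
```

In practice the centroid would be built from fragments of known files of the target type, such as JPEG. The threshold would be tuned on held-out data to trade detection rate against false positives. The non-overlapping scan mirrors how clusters and pages are laid out, but the paper's own search procedure may differ.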





Copyright information

© International Federation for Information Processing 2006

Authors and Affiliations

  • Martin Karresand (1, 2)
  • Nahid Shahmehri (1)

  1. Department of Computer and Information Science, Linköpings universitet, Linköping, Sweden
  2. Department of Systems Development and IT-security, Swedish Defence Research Agency, Linköping, Sweden
