Data Mining Methods Applied to a Digital Forensics Task for Supervised Machine Learning

  • Antonio J. Tallón-Ballesteros
  • José C. Riquelme
Part of the Studies in Computational Intelligence book series (SCI, volume 555)


Digital forensics research includes several stages. Once the data have been collected, the final goal is to obtain a model that predicts the output for unseen data. We focus on supervised machine learning techniques. This chapter presents an experimental study on a forensic data task for multi-class classification, covering several types of methods: decision trees, Bayesian classifiers, rule-based classifiers, artificial neural networks and nearest-neighbour methods. The classifiers are evaluated with two performance measures: accuracy and Cohen’s kappa. The experimental design is a 4-fold cross-validation, with thirty repetitions for non-deterministic algorithms in order to obtain reliable results, so that the reported figures are averages over 120 runs. A statistical analysis compares each pair of algorithms by means of t-tests on both the accuracy and Cohen’s kappa metrics.
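The evaluation protocol summarised above (4 folds, thirty repetitions, 120 runs, scored by accuracy and Cohen's kappa) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the chapter's actual setup: the helper names (`accuracy`, `cohen_kappa`, `repeated_kfold`, `majority_class`), the toy labels, the unstratified shuffling and the majority-class baseline are all hypothetical stand-ins for the real data and classifiers.

```python
import random
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def cohen_kappa(y_true, y_pred):
    """Cohen's kappa: (p_o - p_e) / (1 - p_e).

    p_o is the observed agreement (the accuracy) and p_e is the agreement
    expected by chance from the marginal class frequencies.
    """
    n = len(y_true)
    p_o = accuracy(y_true, y_pred)
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    p_e = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    if p_e == 1.0:
        # Kappa is undefined when only one class occurs on both sides;
        # returning 0.0 is a convention chosen for this sketch.
        return 0.0
    return (p_o - p_e) / (1.0 - p_e)


def repeated_kfold(n_samples, k=4, repetitions=30, seed=0):
    """Yield (train_idx, test_idx) for every fold of every repetition.

    Assumption: plain shuffled folds, no stratification.
    """
    rng = random.Random(seed)
    for _ in range(repetitions):
        indices = list(range(n_samples))
        rng.shuffle(indices)
        folds = [indices[i::k] for i in range(k)]
        for i in range(k):
            train_idx = [j for f in folds[:i] + folds[i + 1:] for j in f]
            yield train_idx, folds[i]


def majority_class(train_labels):
    """Hypothetical baseline 'classifier': always predict the commonest class."""
    return Counter(train_labels).most_common(1)[0][0]


# Toy three-class labels standing in for the real forensic data set.
labels = [0] * 8 + [1] * 8 + [2] * 4

accs, kappas = [], []
for train_idx, test_idx in repeated_kfold(len(labels)):
    guess = majority_class([labels[j] for j in train_idx])
    y_true = [labels[j] for j in test_idx]
    y_pred = [guess] * len(y_true)
    accs.append(accuracy(y_true, y_pred))
    kappas.append(cohen_kappa(y_true, y_pred))

print("runs:", len(accs))
print("mean accuracy:", sum(accs) / len(accs))
print("mean kappa:", sum(kappas) / len(kappas))
```

A constant majority-class guesser can reach a non-trivial accuracy on imbalanced data while its kappa stays at zero, which illustrates why reporting both measures is informative. The per-run scores collected this way are also the kind of paired samples on which a t-test between two algorithms could be computed.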


Digital forensics; Glass evidence; Data mining; Supervised machine learning; Classification model





Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Antonio J. Tallón-Ballesteros (1)
  • José C. Riquelme (1)
  1. Department of Languages and Computer Systems, University of Seville, Seville, Spain
