Incorporating Language Identification in Digital Forensics Investigation Framework

  • Nicholas Akosu
  • Ali Selamat
Part of the Studies in Computational Intelligence book series (SCI, volume 555)


In current business practice, the majority of organizations rely heavily on digital devices such as computers, generic media, cell phones, network systems, and the internet to operate and improve their business. Thus, a large amount of information is produced, accumulated, and distributed via electronic means. Consequently, government and company interests in cyberspace and private networks become vulnerable to cyberspace threats. The investigation of crimes involving digital devices falls under digital forensics, which involves adopting practical frameworks and methods to recover data for analysis that can serve as evidence in court. However, cybercrime has advanced to the stage where criminals try to cover their tracks through anti-forensic strategies such as data overwriting and data hiding. Research into anti-forensics has given rise to the concept of ‘live’ forensics, which comprises proactive forensic approaches capable of digitally investigating an incident as it occurs. At the same time, information exchange using ICT facilities has turned the world into a global village without eliminating the linguistic diversity of the planet. Existing digital forensics frameworks, moreover, assume the language of stored information. If that assumption turns out to be wrong, the semantic interpretation of extracted text will also be wrong, leading to wrong conclusions. We propose incorporating language identification (LID) into digital forensics investigation (DFI) models in order to help law enforcement stay a step ahead of criminals. In this chapter, we outline issues of language identification in DFI frameworks and propose a new framework with a language identification component. The LID component carries out digital surveillance by scrutinizing emails, SMS, and text file transfers into and out of the system of interest. The collected text is then subjected to language identification. Determining the language of the text helps to decide whether the communication is regular and safe, or suspicious and in need of further forensic analysis. Finally, we discuss results from a simple language identification scheme that can be easily and quickly integrated into a DFI model, yielding very high accuracy without compromising speed.
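
The triage step described above can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the scheme, so the lexicon-lookup approach here is an assumption prompted by the "spelling checker" keyword. The toy wordlists, the function names `identify_language` and `triage`, the 0.5 score threshold, and the set of expected languages are all hypothetical.

```python
import re

# Hypothetical toy lexicons. A real deployment would load full
# spell-checker wordlists for every language of interest.
LEXICONS = {
    "english": {"the", "meeting", "is", "at", "noon", "please", "send", "report"},
    "malay": {"saya", "akan", "hantar", "laporan", "itu", "esok", "pagi", "dan"},
}


def tokenize(text):
    """Lowercase the text and return its runs of letters (digits and underscores excluded)."""
    return re.findall(r"[^\W\d_]+", text.lower())


def identify_language(text, lexicons=LEXICONS, min_score=0.5):
    """Return (language, score), where score is the fraction of tokens
    found in the best-matching lexicon. The language is None when the
    text has no tokens or no lexicon reaches min_score (assumed threshold)."""
    tokens = tokenize(text)
    if not tokens:
        return None, 0.0
    scores = {
        lang: sum(token in words for token in tokens) / len(tokens)
        for lang, words in lexicons.items()
    }
    best = max(scores, key=scores.get)
    if scores[best] < min_score:
        return None, scores[best]
    return best, scores[best]


def triage(text, expected_languages=frozenset({"english"})):
    """Label a message 'regular' if its language is one the monitored
    system is expected to use, otherwise 'suspicious'."""
    language, _ = identify_language(text)
    return "regular" if language in expected_languages else "suspicious"
```

In this sketch, text whose language cannot be identified is also labelled suspicious, on the assumption that unrecognized content deserves a closer look; a real framework might treat that case separately.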


Keywords: Digital forensic framework; Anti-forensics; Language identification; Under-resourced languages; Spelling checker





Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  1. Software Engineering Department, Faculty of Computing, Universiti Teknologi Malaysia, Skudai, Malaysia
