Towards Automated Classification of Firmware Images and Identification of Embedded Devices

  • Andrei Costin
  • Apostolis Zarras
  • Aurélien Francillon
Conference paper
Part of the IFIP Advances in Information and Communication Technology book series (IFIPAICT, volume 502)


Embedded systems, unlike traditional computers, are incredibly diverse. The number of devices manufactured is constantly increasing, and each device has dedicated software, commonly known as firmware. Full firmware images are often delivered as multiple releases that correct bugs and vulnerabilities or add new features. Unfortunately, there is no centralized or standardized firmware distribution mechanism. It is therefore difficult to track which vendor or device a firmware package belongs to, or to identify which firmware version is used in deployed embedded devices. At the same time, discovering devices that run vulnerable firmware packages on public and private networks is crucial to the security of those networks. In this paper, we address these problems with two different, yet complementary approaches: firmware classification and embedded web interface fingerprinting. We use supervised Machine Learning on a database subset of real-world firmware files. For this, we first tell firmware images apart from other kinds of files, and then we classify firmware images per vendor or device type. Next, we fingerprint the embedded web interfaces of both physical and emulated devices. This allows recognition of web-enabled devices connected to the network. In some cases, this complementary approach makes it possible to logically link web-enabled online devices with the corresponding firmware package running on them. Finally, we test the firmware classification approach on 215 images with an accuracy of 93.5%, and the device fingerprinting approach on 31 web interfaces with 89.4% accuracy.
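The two-stage classification described above (first firmware versus other files, then vendor) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the features (file size and byte entropy), the function names, and the use of a 1-nearest-neighbour rule as a stand-in for whatever supervised learner the paper actually trains. It is not the authors' implementation.

```python
import math
from collections import Counter


def byte_entropy(data):
    """Shannon entropy of the byte distribution, in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def features(data):
    """Hypothetical feature vector: (log2 of file size, byte entropy)."""
    return (math.log2(len(data) + 1), byte_entropy(data))


def nearest_label(sample, training):
    """1-nearest-neighbour stand-in: label of the closest training vector."""
    return min(training, key=lambda item: math.dist(sample, item[0]))[1]


def classify(data, firmware_training, vendor_training):
    """Stage 1: firmware vs. other file. Stage 2: vendor, only for firmware."""
    vec = features(data)
    if nearest_label(vec, firmware_training) != "firmware":
        return None  # not recognised as firmware, so no vendor is assigned
    return nearest_label(vec, vendor_training)
```

Each training set is a list of `(feature_vector, label)` pairs. A real system would replace the toy features and the nearest-neighbour rule with a richer feature set and a trained classifier; the sketch only shows how the two decisions are chained.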





The research was partially supported by the German Federal Ministry of Education and Research under grant 16KIS0327 (IUNO).



Copyright information

© IFIP International Federation for Information Processing 2017

Authors and Affiliations

  • Andrei Costin (1)
  • Apostolis Zarras (2)
  • Aurélien Francillon (3)
  1. University of Jyväskylä, Jyväskylä, Finland
  2. Technical University of Munich, Munich, Germany
  3. EURECOM, Biot, France
