Mining Actionable Information from Security Forums: The Case of Malicious IP Addresses

  • Joobin GharibshahEmail author
  • Tai Ching Li
  • Andre Castro
  • Konstantinos Pelechrinis
  • Evangelos E. Papalexakis
  • Michalis Faloutsos
Part of the Lecture Notes in Social Networks book series (LNSN)


The goal of this work is to systematically extract information from hacker forums, whose information would be in general described as unstructured: the text of a post is not necessarily following any writing rules. By contrast, many security initiatives and commercial entities are harnessing the readily public information, but they seem to focus on structured sources of information. Here, we focus on the problem of identifying malicious IP addresses, among the IP addresses which are reported in the forums. We develop a method to automate the identification of malicious IP addresses with the design goal of being independent of external sources. A key novelty is that we use a matrix decomposition method to extract latent features of the behavioral information of the users, which we combine with textual information from the related posts. A key design feature of our technique is that it can be readily applied to different language forums, since it does not require a sophisticated natural language processing approach. In particular, our solution only needs a small number of keywords in the new language plus the user’s behavior captured by specific features. We also develop a tool to automate the data collection from security forums. Using our tool, we collect approximately 600K posts from three different forums. Our method exhibits high classification accuracy, while the precision of identifying malicious IP in post is greater than 88% in all three forums. We argue that our method can provide significantly more information: we find up to three times more potentially malicious IP address compared to the reference blacklist VirusTotal. As the cyber-wars are becoming more intense, having early accesses to useful information becomes more imperative to remove the hackers first-move advantage, and our work is a solid step towards this direction.


Security Online communities mining Forums 



This material is based upon work supported by an Adobe Data Science Research Faculty Award, and DHS ST Cyber Security (DDoSD) HSHQDC-14-R-B00017 grant. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the funding institutions.


  1. 1.
    Abbasi, A., Li, W., Benjamin, V., Hu, S., Chen, H.: Descriptive analytics: examining expert hackers in web forums. In: 2014 IEEE Joint Intelligence and Security Informatics Conference, pp. 56–63. IEEE, Piscataway (2014)Google Scholar
  2. 2.
    Althoff, T., Jindal, P., Leskovec, J.: Online actions with offline impact: how online social networks influence online and offline user behavior. In: Proceedings of the Tenth ACM International Conference on Web Search and Data Mining (WSDM’17), pp. 537–546. ACM, New York (2017)Google Scholar
  3. 3.
  4. 4.
    Blanco, C., Lasheras, J., Valencia-García, R., Fernández-Medina, E., Toval, A., Piattini, M.: A systematic review and comparison of security ontologies. In: 2008 Third International Conference on Availability, Reliability and Security, pp. 813–820. IEEE, Piscataway (2008)Google Scholar
  5. 5.
    Bridges, R.A., Jones, C.L., Iannacone, M.D., Testa, K.M., Goodall, J.R.: Automatic labeling for entity extraction in cyber security. arXiv preprint arXiv:1308.4941 (2013)Google Scholar
  6. 6.
    Cheng, J., Bernstein, M., Danescu-Niculescu-Mizil, C., Leskovec, J.: Anyone can become a troll: causes of trolling behavior in online discussions. In: Proceedings of the Conference on Computer-Supported Cooperative Work. Conference on Computer-Supported Cooperative Work, p. 1217. NIH Public Access (2017)Google Scholar
  7. 7.
    Devineni, P., Koutra, D., Faloutsos, M., Faloutsos, C.: If walls could talk: patterns and anomalies in Facebook wallposts. In: Proceedings of the 2015 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining 2015 (ASONAM’15), pp. 367–374. ACM, New York (2015)Google Scholar
  8. 8.
    Frank, R., Macdonald, M., Monk, B.: Location, location, location: mapping potential Canadian targets in online hacker discussion forums. In: 2016 European Intelligence and Security Informatics Conference (EISIC), pp. 16–23. IEEE, Piscataway, (2016)Google Scholar
  9. 9.
  10. 10.
    Hang, H., Bashir, A., Faloutsos, M., Faloutsos, C. and Dumitras, T.: “Infect-me-not”: a user-centric and site-centric study of web-based malware. In: IFIP Networking Conference (IFIP Networking) and Workshops, pp. 234–242. IEEE, Piscataway (2016)Google Scholar
  11. 11.
    Iannacone, M., Bohn, S., Nakamura, G., Gerth, J., Huffer, K., Bridges, R., Ferragut, E., Goodall, J. Developing an ontology for cyber security knowledge graphs. In: Proceedings of the 10th Annual Cyber and Information Security Research Conference (CISR’15), pp. 12:1–12:4. ACM, New York (2015)Google Scholar
  12. 12.
    Li, T.C., Gharibshah, J., Papalexakis, E.E., Faloutsos, M.: Trollspot: detecting misbehavior in commenting platforms. In: Proceedings of the 2017 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM ’17), pp. 171–175. ACM, New York (2017)Google Scholar
  13. 13.
    Motoyama, M., McCoy, D., Levchenko, K., Savage, S., Voelker, G.M.: An analysis of underground forums. In: Proceedings of the 2011 ACM SIGCOMM Conference on Internet Measurement Conference (IMC’11), pp. 71–80. ACM, New York (2011)Google Scholar
  14. 14.
  15. 15.
    Offensive Community.
  16. 16.
    Papalexakis, E.E., Sidiropoulos, N.D., Bro, R.: From k-means to higher-way co-clustering: multilinear decomposition with sparse latent factors. IEEE Trans. Signal Process. 61(2), 493–506 (2013)CrossRefGoogle Scholar
  17. 17.
    Portnoff, R.S., Afroz, S., Durrett, G., Kummerfeld, J.K., Berg-Kirkpatrick, T., McCoy, D., Levchenko, K., Paxson, V.: Tools for automated analysis of cybercriminal markets. In: Proceedings of the 26th International Conference on World Wide Web, pp. 657–666. International World Wide Web Conferences Steering CommitteeGoogle Scholar
  18. 18.
    Ramos, J.: Using TF-IDF to determine word relevance in document queries. In: Proceedings of the First Instructional Conference on Machine Learning, vol. 242, pp. 133–142 (2003)Google Scholar
  19. 19.
    Samtani, S., Chinn, R., Chen, H.: Exploring hacker assets in underground forums. In: IEEE International Conference on Intelligence and Security Informatics (ISI), pp. 31–36. IEEE, Piscataway (2015)Google Scholar
  20. 20.
    Ugander, J., Karrer, B., Backstrom, L., Marlow, C.: The anatomy of the Facebook social graph. arXiv preprint arXiv:1111.4503 (2011)Google Scholar
  21. 21.
  22. 22.
  23. 23.
    Yang, Y., Pedersen, J.O.: A comparative study on feature selection in text categorization. In: Proceedings of the Fourteenth International Conference on Machine Learning (ICML’97), pp. 412–420. Morgan Kaufmann Publishers, San Francisco (1997)Google Scholar
  24. 24.
    Zhang, X., Tsang, A., Yue, W.T., Chau, M.: The classification of hackers by knowledge exchange behaviors. Inf. Syst. Front. 17(6), 1239–1251 (2015)CrossRefGoogle Scholar

Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  • Joobin Gharibshah
    • 1
    Email author
  • Tai Ching Li
    • 1
  • Andre Castro
    • 1
  • Konstantinos Pelechrinis
    • 2
  • Evangelos E. Papalexakis
    • 1
  • Michalis Faloutsos
    • 1
  1. 1.University of California RiversideRiversideUSA
  2. 2.School of Information SciencesUniversity of PittsburghPittsburghUSA

Personalised recommendations