
RIPEx: Extracting Malicious IP Addresses from Security Forums Using Cross-Forum Learning

  • Joobin Gharibshah
  • Evangelos E. Papalexakis
  • Michalis Faloutsos
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10939)

Abstract

Is it possible to automatically extract malicious IP addresses reported in security forums? This question is at the heart of our work. We focus on security forums, where security professionals and hackers share knowledge and information and often report misbehaving IP addresses. So far, there have been only a few efforts to extract information from such forums. We propose RIPEx, a systematic approach to identify and label IP addresses in security forums using a cross-forum learning method. The challenge is twofold: (a) distinguishing IP addresses from other numerical entities, such as software version numbers, and (b) classifying each IP address as benign or malicious. We propose an integrated solution that tackles both problems. A novelty of our approach is that it does not require training data for each new forum. Instead, it transfers knowledge across forums: we use a classifier trained on our source forums to identify seed information for training a classifier on the target forum. We evaluate our method using data collected from five security forums with a total of 31K users and 542K posts. First, RIPEx distinguishes IP addresses from other numeric expressions with 95% precision and above 93% recall on average. Second, using cross-forum learning, RIPEx identifies malicious IP addresses with an average precision of 88% and over 78% recall. Our work is a first step towards harnessing the wealth of useful information in security forums.
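The abstract describes two subproblems, (a) telling IP addresses apart from other dotted numbers and (b) labelling them benign or malicious, plus a cross-forum step in which a source-forum classifier picks seed examples for a target-forum classifier. The Python sketch below illustrates that flow under stated assumptions; it is not the RIPEx implementation, and the abstract does not say which features or classifiers RIPEx uses. The regular expression, the `VERSION_CUES` heuristic, the multinomial Naive Bayes stand-in, the "top-k most confident posts per class" seeding rule, and the names `ip_candidates`, `NaiveBayes` and `cross_forum_train` are all hypothetical. It also assumes that a post's label carries over to the IP addresses it mentions.

```python
import ipaddress
import math
import re
from collections import Counter

# Hypothetical sketch, not the RIPEx implementation.
# Stage (a): candidate IPv4 extraction. Four dot-separated groups of 1-3 digits,
# not embedded in a longer dotted number (so "1.2.3.4.5" and "2.4.1" are rejected);
# a trailing sentence period is allowed.
CANDIDATE = re.compile(r"(?<![\d.])\d{1,3}(?:\.\d{1,3}){3}(?!\.?\d)")
# Assumed context cues suggesting a software version rather than an address.
VERSION_CUES = {"version", "ver", "v", "release", "build", "update"}


def ip_candidates(text):
    """Return dotted quads that are valid IPv4 and not preceded by a version cue."""
    hits = []
    for m in CANDIDATE.finditer(text):
        try:
            ipaddress.IPv4Address(m.group())
        except ValueError:  # e.g. an octet above 255
            continue
        previous = re.findall(r"[a-z]+", text[: m.start()].lower())[-1:]
        if previous and previous[0] in VERSION_CUES:
            continue
        hits.append(m.group())
    return hits


# Stage (b): a stand-in text classifier (multinomial Naive Bayes, add-one smoothing).
def tokens(text):
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayes:
    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.prior = {c: math.log(labels.count(c) / len(labels)) for c in self.classes}
        self.counts = {c: Counter() for c in self.classes}
        for doc, label in zip(docs, labels):
            self.counts[label].update(tokens(doc))
        self.vocab = set().union(*self.counts.values())
        self.totals = {c: sum(self.counts[c].values()) for c in self.classes}
        return self

    def predict_proba(self, doc):
        size = len(self.vocab)
        scores = {
            c: self.prior[c]
            + sum(math.log((self.counts[c][w] + 1) / (self.totals[c] + size))
                  for w in tokens(doc) if w in self.vocab)  # unseen words are ignored
            for c in self.classes
        }
        top = max(scores.values())  # log-sum-exp normalisation
        z = sum(math.exp(s - top) for s in scores.values())
        return {c: math.exp(s - top) / z for c, s in scores.items()}


# Cross-forum seeding: label target posts with the source model, keep only the
# most confident posts per predicted class as seeds, then train a fresh
# target-forum model on those seeds alone (no target-forum ground truth used).
def cross_forum_train(src_docs, src_labels, tgt_docs, seeds_per_class=2):
    source = NaiveBayes().fit(src_docs, src_labels)
    scored = []
    for doc in tgt_docs:
        proba = source.predict_proba(doc)
        label = max(proba, key=proba.get)
        scored.append((proba[label], label, doc))
    scored.sort(key=lambda t: t[0], reverse=True)  # most confident first
    seeds, taken = [], Counter()
    for _, label, doc in scored:
        if taken[label] < seeds_per_class:
            seeds.append((doc, label))
            taken[label] += 1
    target = NaiveBayes().fit([d for d, _ in seeds], [lab for _, lab in seeds])
    return target, seeds
```

For example, `ip_candidates("Block 185.220.101.4 now; running version 1.2.3.4 and 2.4.1.")` returns `['185.220.101.4']`: the three-part "2.4.1" never matches the pattern, and "1.2.3.4" is dropped because it follows the cue word "version". Capping seeds per class keeps the target model from being trained mostly on whichever class the source model happens to favour; a real system would also need a fallback when a class receives no seeds.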

Keywords

Security, Online communities mining

Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  • Joobin Gharibshah (1)
  • Evangelos E. Papalexakis (1)
  • Michalis Faloutsos (1)
  1. Department of Computer Science, University of California, Riverside, Riverside, USA