Skip to main content

RIPEx: Extracting Malicious IP Addresses from Security Forums Using Cross-Forum Learning

  • Conference paper
  • First Online:
Advances in Knowledge Discovery and Data Mining (PAKDD 2018)

Part of the book series: Lecture Notes in Computer Science ((LNAI,volume 10939))

Included in the following conference series:

Abstract

Is it possible to extract malicious IP addresses reported in security forums in an automatic way? This is the question at the heart of our work. We focus on security forums, where security professionals and hackers share knowledge and information, and often report misbehaving IP addresses. So far, there have only been a few efforts to extract information from such security forums. We propose RIPEx, a systematic approach to identify and label IP addresses in security forums by utilizing a cross-forum learning method. In more detail, the challenge is twofold: (a) identifying IP addresses from other numerical entities, such as software version numbers, and (b) classifying the IP address as benign or malicious. We propose an integrated solution that tackles both these problems. A novelty of our approach is that it does not require training data for each new forum. Our approach does knowledge transfer across forums: we use a classifier from our source forums to identify seed information for training a classifier on the target forum. We evaluate our method using data collected from five security forums with a total of 31 K users and 542 K posts. First, RIPEx can distinguish IP address from other numeric expressions with 95% precision and above 93% recall on average. Second, RIPEx identifies malicious IP addresses with an average precision of 88% and over 78% recall, using our cross-forum learning. Our work is a first step towards harnessing the wealth of useful information that can be found in security forums.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 84.99
Price excludes VAT (USA)
  • Available as EPUB and PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 109.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Notes

  1. 1.

    RIPEx stands for Riverside’s IP Extractor.

References

  1. Abbasi, A., Li, W., Benjamin, V., Hu, S., Chen, H.: Descriptive analytics: examining expert hackers in web forums. In: ISI 2014, pp. 56–63 (2014)

    Google Scholar 

  2. Bridges, R.A., Jones, C.L., Iannacone, M.D., Goodall, J.R.: Automatic labeling for entity extraction in cyber security (2013) CoRR abs/1308.4941

    Google Scholar 

  3. Dai, W., Xue, G.R., Yang, Q., Yu, Y.: Co-clustering based classification for out-of-domain documents. In: KDD 2007, pp. 210–219, USA (2007)

    Google Scholar 

  4. Dai, W., Yang, Q., Xue, G.R., Yu, Y.: Boosting for transfer learning. In: ICML 2007, pp. 193–200, New York, NY, USA (2007)

    Google Scholar 

  5. Daume III, H.: Frustratingly easy domain adaptation. In: ACL 2007 (2007)

    Google Scholar 

  6. Finkel, J.R., Grenager, T., Manning, C.: Incorporating non-local information into information extraction systems by Gibbs sampling. In: ACL 2005 (2005)

    Google Scholar 

  7. Frank, R., Macdonald, M., Monk, B.: Location, location, location: mapping potential Canadian targets in online hacker discussion forums. In: EISIC 2016 (2016)

    Google Scholar 

  8. Gharibshah, J., Li, T.C., Vanrell, M.S., Castro, A., Pelechrinis, K., Papalexakis, E., Faloutsos, M.: InferIp: Extracting actionable information from security discussion forums. In: ASONAM 2017 (2017)

    Google Scholar 

  9. Holt, T.J., Strumsky, D., Smirnova, O., Kilger, M.: Examining the social networks of malware writers and hackers 6(1), 891–903 (2012)

    Google Scholar 

  10. Jones, C.L., Bridges, R.A., Huffer, K.M.T., Goodall, J.R.: Towards a relation extraction framework for cyber-security concepts (2015) CoRR abs/1504.04317

    Google Scholar 

  11. Motoyama, M., McCoy, D., Levchenko, K., Savage, S., Voelker, G.M.: An analysis of underground forums. In: IMC 2011, pp. 71–80, New York, NY, USA (2011)

    Google Scholar 

  12. Pan, S.J., Yang, Q.: A survey on transfer learning. IEEE Trans. Knowl. Data Eng. 22(10), 1345–1359 (2010)

    Article  Google Scholar 

  13. Portnoff, R.S., Afroz, S., Durrett, G., Kummerfeld, J.K., Berg-Kirkpatrick, T., McCoy, D., Levchenko, K., Paxson, V.: Tools for automated analysis of cybercriminal markets. In: WWW 2017 (2017)

    Google Scholar 

  14. Ramos, J.: Using TF-IDF to determine word relevance in document queries. In: ICML 2003 (2003)

    Google Scholar 

  15. Samtani, S., Chinn, R., Chen, H.: Exploring hacker assets in underground forums. In: ISI 2015, pp. 31–36 (2015)

    Google Scholar 

  16. Shakarian, J., Gunn, A.T., Shakarian, P.: Exploring malicious hacker forums. In: Jajodia, S., Subrahmanian, V.S.S., Swarup, V., Wang, C. (eds.) Cyber Deception, pp. 261–284. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-32699-3_11

    Chapter  Google Scholar 

  17. Weiss, K., Khoshgoftaar, T.M., Wang, D.: A survey of transfer learning. J. Big Data 3(1), 9 (2016)

    Article  Google Scholar 

  18. Zhang, X., Tsang, A., Yue, W.T., Chau, M.: The classification of hackers by knowledge exchange behaviors. Info. Syst. Front. 17(6), 1239–1251 (2015)

    Article  Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Joobin Gharibshah .

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2018 Springer International Publishing AG, part of Springer Nature

About this paper

Check for updates. Verify currency and authenticity via CrossMark

Cite this paper

Gharibshah, J., Papalexakis, E.E., Faloutsos, M. (2018). RIPEx: Extracting Malicious IP Addresses from Security Forums Using Cross-Forum Learning. In: Phung, D., Tseng, V., Webb, G., Ho, B., Ganji, M., Rashidi, L. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2018. Lecture Notes in Computer Science(), vol 10939. Springer, Cham. https://doi.org/10.1007/978-3-319-93040-4_41

Download citation

  • DOI: https://doi.org/10.1007/978-3-319-93040-4_41

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-93039-8

  • Online ISBN: 978-3-319-93040-4

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics