RIPEx: Extracting Malicious IP Addresses from Security Forums Using Cross-Forum Learning

Gharibshah, Joobin; Papalexakis, Evangelos E.; Faloutsos, Michalis

doi:10.1007/978-3-319-93040-4_41


Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10939)

Included in the following conference series:

Pacific-Asia Conference on Knowledge Discovery and Data Mining


Abstract

Is it possible to automatically extract malicious IP addresses reported in security forums? This is the question at the heart of our work. We focus on security forums, where security professionals and hackers share knowledge and information, and often report misbehaving IP addresses. So far, there have been only a few efforts to extract information from such security forums. We propose RIPEx, a systematic approach to identify and label IP addresses in security forums by utilizing a cross-forum learning method. In more detail, the challenge is twofold: (a) distinguishing IP addresses from other numerical entities, such as software version numbers, and (b) classifying each IP address as benign or malicious. We propose an integrated solution that tackles both of these problems. A novelty of our approach is that it does not require training data for each new forum. Our approach transfers knowledge across forums: we use a classifier from our source forums to identify seed information for training a classifier on the target forum. We evaluate our method using data collected from five security forums with a total of 31K users and 542K posts. First, RIPEx can distinguish IP addresses from other numeric expressions with 95% precision and above 93% recall on average. Second, RIPEx identifies malicious IP addresses with an average precision of 88% and over 78% recall, using our cross-forum learning. Our work is a first step towards harnessing the wealth of useful information that can be found in security forums.
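The two steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify features, classifier, or thresholds, so the dotted-quad regex, the bag-of-words context, the `TinyNB` naive Bayes stand-in, and the `confidence` cutoff are all assumptions made for illustration. The sketch first finds IPv4-shaped candidates, then uses a classifier trained on a labeled source forum to pick confidently labeled "seed" posts in an unlabeled target forum, and trains a target classifier on those seeds.

```python
import math
import re
from collections import Counter

# Assumption: candidates are dotted quads with octets in 0..255. Short version
# strings such as "2.0.1.5" still pass this check, which is why a context-based
# classifier is needed on top of it.
DOTTED_QUAD = re.compile(
    r"(?<![\d.])(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})(?!\d|\.\d)"
)

def candidates(post):
    """Yield (address, context_words) for each IPv4-shaped token in a post."""
    words = re.findall(r"[a-z]+", post.lower())
    for m in DOTTED_QUAD.finditer(post):
        if all(int(octet) <= 255 for octet in m.groups()):
            yield m.group(0), words

class TinyNB:
    """Hypothetical stand-in classifier: naive Bayes over context words.
    Labels are 0 and 1; both classes must appear in the training labels."""

    def fit(self, docs, labels):
        self.counts = {0: Counter(), 1: Counter()}
        self.prior = Counter(labels)
        for words, y in zip(docs, labels):
            self.counts[y].update(words)
        self.vocab = set(self.counts[0]) | set(self.counts[1])
        return self

    def prob_positive(self, words):
        """Return P(label = 1 | words) with add-one smoothing."""
        scores = {}
        for y in (0, 1):
            total = sum(self.counts[y].values()) + len(self.vocab)
            s = math.log(self.prior[y] / sum(self.prior.values()))
            for w in words:
                if w in self.vocab:  # words never seen in training are skipped
                    s += math.log((self.counts[y][w] + 1) / total)
            scores[y] = s
        top = max(scores.values())
        e0, e1 = math.exp(scores[0] - top), math.exp(scores[1] - top)
        return e1 / (e0 + e1)

def cross_forum_train(source_docs, source_labels, target_docs, confidence=0.9):
    """Train on the source forum, keep confident target predictions as seeds,
    then train a new classifier on the target forum's own seeds."""
    source_clf = TinyNB().fit(source_docs, source_labels)
    seeds, seed_labels = [], []
    for words in target_docs:
        p = source_clf.prob_positive(words)
        if p >= confidence or p <= 1 - confidence:
            seeds.append(words)
            seed_labels.append(int(p >= 0.5))
    if len(set(seed_labels)) < 2:
        return source_clf  # too few seeds: fall back to the source classifier
    return TinyNB().fit(seeds, seed_labels)
```

In this sketch the target classifier can pick up vocabulary that appears only in the target forum, as long as it co-occurs with words the source classifier already scores confidently. Whether RIPEx selects or uses its seeds this way is not stated in the abstract.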


Notes

1. RIPEx stands for Riverside’s IP Extractor.


Author information

Authors and Affiliations

Department of Computer Science, University of California - Riverside, 900 University Ave, Riverside, CA, 92521, USA
Joobin Gharibshah, Evangelos E. Papalexakis & Michalis Faloutsos


Corresponding author

Correspondence to Joobin Gharibshah.

Editor information

Editors and Affiliations

Deakin University, Geelong, Victoria, Australia
Dinh Phung
National Chiao Tung University, Hsinchu City, Taiwan
Vincent S. Tseng
Monash University, Clayton, Victoria, Australia
Geoffrey I. Webb
Japan Advanced Institute of Science and Technology, Nomi, Ishikawa, Japan
Bao Ho
University of Melbourne, Melbourne, Victoria, Australia
Mohadeseh Ganji
University of Melbourne, Melbourne, Victoria, Australia
Lida Rashidi


About this paper

Cite this paper

Gharibshah, J., Papalexakis, E.E., Faloutsos, M. (2018). RIPEx: Extracting Malicious IP Addresses from Security Forums Using Cross-Forum Learning. In: Phung, D., Tseng, V., Webb, G., Ho, B., Ganji, M., Rashidi, L. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2018. Lecture Notes in Computer Science, vol 10939. Springer, Cham. https://doi.org/10.1007/978-3-319-93040-4_41


DOI: https://doi.org/10.1007/978-3-319-93040-4_41
Published: 17 June 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-93039-8
Online ISBN: 978-3-319-93040-4
eBook Packages: Computer Science, Computer Science (R0)
