
Detecting Cyber Security Threats in Weblogs Using Probabilistic Models

  • Flora S. Tsai
  • Kap Luk Chan
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4430)

Abstract

Organizations and governments are increasingly vulnerable to a wide variety of security breaches against their information infrastructure. The magnitude of this threat is evident from the rising rate of cyber attacks against computers and critical infrastructure. Over the past decade, the number of weblogs, or blogs, has also grown rapidly. Weblogs may provide up-to-date information on the prevalence and distribution of various cyber security threats as well as terrorism events. In this paper, we analyze weblog posts for various categories of cyber security threats related to the detection of cyber attacks, cyber crime, and terrorism. Existing studies on intelligence analysis have focused on analyzing news or forums for cyber security incidents, but few have looked at weblogs. We use probabilistic latent semantic analysis to detect keywords from cyber security weblogs with respect to certain topics. We then demonstrate how this method can present the blogosphere in terms of topics with measurable keywords, and hence track popular conversations and topics in the blogosphere. By applying a probabilistic approach, we can improve information retrieval in weblog search and keyword detection, and provide an analytical foundation for future security intelligence analysis of weblogs.
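The abstract names probabilistic latent semantic analysis (PLSA) as the keyword-detection method without giving details. The following is a minimal Python sketch of PLSA fitted by expectation-maximization (EM), under stated assumptions: the asymmetric formulation P(w|d) = Σ_z P(z|d) P(w|z), raw term counts as input, random initialization, and a fixed number of iterations. The function names `plsa`, `log_likelihood`, and `top_keywords` and the toy posts are hypothetical illustrations, not the authors' code or data. Ranking each topic's terms by P(w|z) is one plausible reading of "detect keywords with respect to certain topics".

```python
import math
import random


def _normalize(values):
    """Scale non-negative numbers to sum to 1 (uniform if they are all zero)."""
    total = sum(values)
    if total <= 0:
        return [1.0 / len(values)] * len(values)
    return [v / total for v in values]


def plsa(docs, num_topics, iterations=100, seed=0):
    """Fit P(w|d) = sum_z P(z|d) P(w|z) to term counts with EM (assumed formulation).

    docs: one {term: count} dict per weblog post (a sparse term-document matrix).
    Returns (vocab, p_w_z, p_z_d) with p_w_z[z][i] = P(vocab[i] | z)
    and p_z_d[d][z] = P(z | d).
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    index = {w: i for i, w in enumerate(vocab)}
    # Random initialization breaks the symmetry between topics.
    p_w_z = [_normalize([rng.random() for _ in vocab]) for _ in range(num_topics)]
    p_z_d = [_normalize([rng.random() for _ in range(num_topics)]) for _ in docs]

    for _ in range(iterations):
        exp_w_z = [[0.0] * len(vocab) for _ in range(num_topics)]
        exp_z_d = [[0.0] * num_topics for _ in docs]
        for d, doc in enumerate(docs):
            for w, n in doc.items():
                i = index[w]
                # E-step: posterior P(z | d, w), proportional to P(z | d) P(w | z).
                post = _normalize([p_z_d[d][z] * p_w_z[z][i] for z in range(num_topics)])
                # Accumulate expected counts n(d, w) * P(z | d, w).
                for z in range(num_topics):
                    exp_w_z[z][i] += n * post[z]
                    exp_z_d[d][z] += n * post[z]
        # M-step: renormalize the expected counts into new distributions.
        p_w_z = [_normalize(row) for row in exp_w_z]
        p_z_d = [_normalize(row) for row in exp_z_d]
    return vocab, p_w_z, p_z_d


def log_likelihood(docs, vocab, p_w_z, p_z_d):
    """Sum over (d, w) of n(d, w) * log P(w | d); EM never decreases this value."""
    index = {w: i for i, w in enumerate(vocab)}
    return sum(
        n * math.log(sum(p_z_d[d][z] * p_w_z[z][index[w]] for z in range(len(p_w_z))))
        for d, doc in enumerate(docs)
        for w, n in doc.items()
    )


def top_keywords(vocab, p_w_z, k=3):
    """Return the k terms with the highest P(w | z) for each topic z."""
    return [[vocab[i] for i in sorted(range(len(vocab)), key=lambda j: -row[j])[:k]]
            for row in p_w_z]


if __name__ == "__main__":
    # Hypothetical toy posts: two about malware, two about phishing.
    posts = [
        {"malware": 4, "virus": 3, "patch": 2},
        {"malware": 3, "virus": 2, "exploit": 2},
        {"phishing": 4, "fraud": 3, "bank": 2},
        {"phishing": 3, "fraud": 2, "identity": 2},
    ]
    vocab, p_w_z, p_z_d = plsa(posts, num_topics=2)
    print("log-likelihood:", log_likelihood(posts, vocab, p_w_z, p_z_d))
    for z, words in enumerate(top_keywords(vocab, p_w_z)):
        print(f"topic {z}:", words)
```

Because EM converges only to a local maximum, the topics found depend on `seed`; a common practice is to run several restarts and keep the fit with the highest `log_likelihood`.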

Keywords

cyber security, weblog, blog, probabilistic latent semantic analysis, cyber crime, cyber terrorism, data mining

Copyright information

© Springer Berlin Heidelberg 2007

Authors and Affiliations

  • Flora S. Tsai (1)
  • Kap Luk Chan (1)
  1. School of Electrical & Electronic Engineering, Nanyang Technological University, Singapore 639798
