Detecting Cyber Security Threats in Weblogs Using Probabilistic Models

  • Conference paper
Intelligence and Security Informatics (PAISI 2007)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 4430)

Abstract

Organizations and governments are becoming vulnerable to a wide variety of security breaches against their information infrastructure. The magnitude of this threat is evident from the increasing rate of cyber attacks against computers and critical infrastructure. Weblogs, or blogs, have also grown rapidly in number over the past decade. Weblogs may provide up-to-date information on the prevalence and distribution of various cyber security threats as well as terrorism events. In this paper, we analyze weblog posts for various categories of cyber security threats related to the detection of cyber attacks, cyber crime, and terrorism. Existing studies on intelligence analysis have focused on analyzing news or forums for cyber security incidents, but few have looked at weblogs. We use probabilistic latent semantic analysis to detect keywords from cyber security weblogs with respect to certain topics. We then demonstrate how this method can present the blogosphere in terms of topics with measurable keywords, thereby tracking popular conversations and topics in the blogosphere. By applying a probabilistic approach, we can improve information retrieval in weblog search and keyword detection, and provide an analytical foundation for future security intelligence analysis of weblogs.
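The abstract names probabilistic latent semantic analysis (PLSA) as the method for extracting topic keywords. The following is a minimal sketch of the standard PLSA expectation-maximization updates, not the authors' implementation. The function names `plsa` and `top_keywords`, the toy vocabulary, and the word counts are all hypothetical and chosen only for illustration.

```python
import random

def normalize(row):
    """Scale a list of non-negative numbers so it sums to 1."""
    total = sum(row)
    if total <= 0:
        return [1.0 / len(row)] * len(row)
    return [x / total for x in row]

def plsa(counts, n_topics, n_iter=50, seed=0):
    """Fit PLSA by EM on a document-term count matrix.

    counts[d][w] is the number of times word w occurs in document d.
    Returns (p_w_z, p_z_d): P(word | topic) and P(topic | document).
    """
    rng = random.Random(seed)
    n_docs, n_words = len(counts), len(counts[0])
    # Random positive initialization of both distributions.
    p_w_z = [normalize([rng.random() + 1e-3 for _ in range(n_words)])
             for _ in range(n_topics)]
    p_z_d = [normalize([rng.random() + 1e-3 for _ in range(n_topics)])
             for _ in range(n_docs)]
    for _ in range(n_iter):
        new_w_z = [[0.0] * n_words for _ in range(n_topics)]
        new_z_d = [[0.0] * n_topics for _ in range(n_docs)]
        for d in range(n_docs):
            for w in range(n_words):
                n = counts[d][w]
                if n == 0:
                    continue
                # E-step: P(z | d, w) is proportional to P(w | z) * P(z | d).
                post = normalize([p_w_z[z][w] * p_z_d[d][z]
                                  for z in range(n_topics)])
                # M-step: accumulate expected counts per topic.
                for z in range(n_topics):
                    new_w_z[z][w] += n * post[z]
                    new_z_d[d][z] += n * post[z]
        p_w_z = [normalize(row) for row in new_w_z]
        p_z_d = [normalize(row) for row in new_z_d]
    return p_w_z, p_z_d

def top_keywords(p_w_z, vocab, k=3):
    """Return the k highest-probability words for each topic."""
    result = []
    for row in p_w_z:
        order = sorted(range(len(vocab)), key=lambda i: -row[i])
        result.append([vocab[i] for i in order[:k]])
    return result

# Hypothetical toy corpus: four posts over a six-word vocabulary.
vocab = ["phishing", "malware", "virus", "bomb", "terror", "hostage"]
counts = [
    [4, 3, 2, 0, 0, 0],
    [2, 5, 3, 0, 0, 0],
    [0, 0, 0, 3, 4, 2],
    [0, 0, 0, 2, 3, 5],
]
p_w_z, p_z_d = plsa(counts, n_topics=2)
for z, words in enumerate(top_keywords(p_w_z, vocab)):
    print("topic", z, words)
```

Each row of `p_w_z` is a distribution over the vocabulary, so ranking its entries gives the "measurable keywords" for that topic, while each row of `p_z_d` shows how strongly a post is associated with each topic. A real pipeline would also need tokenization, stop-word removal, and a much larger corpus, none of which are shown here.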

Editor information

Christopher C. Yang, Daniel Zeng, Michael Chau, Kuiyu Chang, Qing Yang, Xueqi Cheng, Jue Wang, Fei-Yue Wang, Hsinchun Chen

Copyright information

© 2007 Springer Berlin Heidelberg

About this paper

Cite this paper

Tsai, F.S., Chan, K.L. (2007). Detecting Cyber Security Threats in Weblogs Using Probabilistic Models. In: Yang, C.C., et al. Intelligence and Security Informatics. PAISI 2007. Lecture Notes in Computer Science, vol 4430. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-71549-8_4

  • DOI: https://doi.org/10.1007/978-3-540-71549-8_4

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-71548-1

  • Online ISBN: 978-3-540-71549-8

  • eBook Packages: Computer Science, Computer Science (R0)
