Detecting Cyber Security Threats in Weblogs Using Probabilistic Models

Tsai, Flora S.; Chan, Kap Luk

doi:10.1007/978-3-540-71549-8_4

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 4430)

Included in the following conference series:

Pacific-Asia Workshop on Intelligence and Security Informatics

Abstract

Organizations and governments are becoming vulnerable to a wide variety of security breaches against their information infrastructure. The magnitude of this threat is evident from the increasing rate of cyber attacks against computers and critical infrastructure. Weblogs, or blogs, have also grown rapidly in number over the past decade. Weblogs may provide up-to-date information on the prevalence and distribution of various cyber security threats as well as terrorism events. In this paper, we analyze weblog posts for various categories of cyber security threats related to the detection of cyber attacks, cyber crime, and terrorism. Existing studies on intelligence analysis have focused on analyzing news or forums for cyber security incidents, but few have looked at weblogs. We use probabilistic latent semantic analysis to detect keywords from cyber security weblogs with respect to certain topics. We then demonstrate how this method can present the blogosphere in terms of topics with measurable keywords, and hence track popular conversations and topics in the blogosphere. By applying a probabilistic approach, we can improve information retrieval in weblog search and keyword detection, and provide an analytical foundation for the future of security intelligence analysis of weblogs.
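
The abstract names probabilistic latent semantic analysis (PLSA) but gives no model details, so the following is only a minimal sketch of the standard PLSA formulation fitted by EM, not the authors' implementation. The function names (`plsa`, `top_keywords`), the toy vocabulary, and the count matrix are all hypothetical and exist purely for illustration. Each topic z is a distribution P(w|z) over words, each post d has a mixture P(z|d) over topics, and the highest-probability words of a topic serve as its keywords.

```python
import numpy as np

def plsa(counts, n_topics, n_iter=50, seed=0):
    """Fit PLSA by EM on a document-term count matrix (illustrative sketch).

    counts: (n_docs, n_words) array of non-negative term counts.
    Returns P(w|z) with shape (n_topics, n_words)
    and P(z|d) with shape (n_docs, n_topics).
    """
    rng = np.random.default_rng(seed)
    n_docs, n_words = counts.shape
    # Random initialisation, normalised so each row is a distribution.
    p_w_z = rng.random((n_topics, n_words))
    p_w_z /= p_w_z.sum(axis=1, keepdims=True)
    p_z_d = rng.random((n_docs, n_topics))
    p_z_d /= p_z_d.sum(axis=1, keepdims=True)

    for _ in range(n_iter):
        # E-step: P(z|d,w) is proportional to P(z|d) * P(w|z).
        joint = p_z_d[:, :, None] * p_w_z[None, :, :]  # (docs, topics, words)
        post = joint / (joint.sum(axis=1, keepdims=True) + 1e-12)
        # M-step: re-estimate both distributions from expected counts.
        expected = counts[:, None, :] * post
        p_w_z = expected.sum(axis=0)
        p_w_z /= p_w_z.sum(axis=1, keepdims=True) + 1e-12
        p_z_d = expected.sum(axis=2)
        p_z_d /= p_z_d.sum(axis=1, keepdims=True) + 1e-12
    return p_w_z, p_z_d

def top_keywords(p_w_z, vocab, k=3):
    """Return the k most probable words for each topic."""
    return [[vocab[i] for i in np.argsort(row)[::-1][:k]] for row in p_w_z]

# Hypothetical toy data: 4 posts over a 6-word vocabulary (assumed, not from the paper).
vocab = ["malware", "virus", "phishing", "bomb", "attack", "hostage"]
counts = np.array([
    [5, 3, 2, 0, 1, 0],
    [4, 4, 1, 0, 0, 0],
    [0, 0, 0, 3, 4, 2],
    [0, 1, 0, 5, 2, 3],
], dtype=float)

p_w_z, p_z_d = plsa(counts, n_topics=2)
for z, words in enumerate(top_keywords(p_w_z, vocab)):
    print(f"topic {z}: {words}")
```

Because EM starts from a random initialisation, the topic order and exact probabilities depend on the seed. A real pipeline would also need tokenisation, stop-word removal, and a term-document matrix built from the weblog posts, none of which are shown here.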

Author information

Authors and Affiliations

School of Electrical & Electronic Engineering, Nanyang Technological University, 639798, Singapore
Flora S. Tsai & Kap Luk Chan

Editor information

Christopher C. Yang, Daniel Zeng, Michael Chau, Kuiyu Chang, Qing Yang, Xueqi Cheng, Jue Wang, Fei-Yue Wang, Hsinchun Chen

About this paper

Cite this paper

Tsai, F.S., Chan, K.L. (2007). Detecting Cyber Security Threats in Weblogs Using Probabilistic Models. In: Yang, C.C., et al. Intelligence and Security Informatics. PAISI 2007. Lecture Notes in Computer Science, vol 4430. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-71549-8_4

DOI: https://doi.org/10.1007/978-3-540-71549-8_4
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-71548-1
Online ISBN: 978-3-540-71549-8
eBook Packages: Computer Science, Computer Science (R0)