Abstract
Organizations and governments are increasingly vulnerable to a wide variety of security breaches against their information infrastructure. The magnitude of this threat is evident from the rising rate of cyber attacks against computers and critical infrastructure. At the same time, the number of weblogs, or blogs, has grown rapidly over the past decade, and weblogs may provide up-to-date information on the prevalence and distribution of various cyber security threats as well as terrorism events. In this paper, we analyze weblog posts for categories of cyber security threats relevant to detecting cyber attacks, cyber crime, and terrorism. Existing studies on intelligence analysis have focused on news articles or forums for cyber security incidents, but few have examined weblogs. We use probabilistic latent semantic analysis to detect keywords associated with particular topics in cyber security weblogs. We then demonstrate how this method can represent the blogosphere in terms of topics with measurable keywords, thereby tracking popular conversations and topics. By applying a probabilistic approach, we can improve information retrieval in weblog search and keyword detection, and provide an analytical foundation for future security intelligence analysis of weblogs.
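The abstract names probabilistic latent semantic analysis (PLSA) but does not spell out how topics and their keywords are obtained. The following is a minimal sketch, not the authors' implementation, assuming whitespace tokenization, a hypothetical toy set of posts, and Hofmann's asymmetric formulation P(w|d) = Σ_z P(z|d) P(w|z) fitted by expectation-maximization (EM). The function names (plsa, log_likelihood, top_keywords) are hypothetical. A topic's keywords are read off as the highest-probability words under its P(w|z).

```python
import math
import random
from collections import Counter


def build_counts(docs):
    """Tokenize on whitespace and return (vocabulary, per-document term counts n(d, w))."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({w for toks in tokenized for w in toks})
    counts = [Counter(toks) for toks in tokenized]
    return vocab, counts


def _normalize(row):
    """Rescale a non-negative list to sum to 1 (uniform if it is all zeros)."""
    s = sum(row)
    return [x / s for x in row] if s > 0 else [1.0 / len(row)] * len(row)


def plsa(docs, n_topics, n_iter=50, seed=0):
    """Fit P(w|d) = sum_z P(z|d) P(w|z) by EM; returns (vocab, P(w|z), P(z|d))."""
    vocab, counts = build_counts(docs)
    index = {w: j for j, w in enumerate(vocab)}
    rng = random.Random(seed)
    # Random starting distributions; EM only finds a local optimum, so the seed matters.
    p_w_z = [_normalize([rng.random() for _ in vocab]) for _ in range(n_topics)]
    p_z_d = [_normalize([rng.random() for _ in range(n_topics)]) for _ in docs]

    for _ in range(n_iter):
        acc_w_z = [[0.0] * len(vocab) for _ in range(n_topics)]
        acc_z_d = [[0.0] * n_topics for _ in docs]
        for d, cnt in enumerate(counts):
            for w, n in cnt.items():
                j = index[w]
                # E-step: P(z|d,w) is proportional to P(z|d) P(w|z).
                joint = [p_z_d[d][z] * p_w_z[z][j] for z in range(n_topics)]
                total = sum(joint)
                if total == 0.0:
                    continue
                for z in range(n_topics):
                    # Expected count n(d,w) P(z|d,w), accumulated for the M-step.
                    r = n * joint[z] / total
                    acc_w_z[z][j] += r
                    acc_z_d[d][z] += r
        # M-step: renormalize the expected counts into new distributions.
        p_w_z = [_normalize(row) for row in acc_w_z]
        p_z_d = [_normalize(row) for row in acc_z_d]
    return vocab, p_w_z, p_z_d


def log_likelihood(docs, vocab, p_w_z, p_z_d):
    """sum_{d,w} n(d,w) log P(w|d); EM iterations should never decrease this value."""
    index = {w: j for j, w in enumerate(vocab)}
    ll = 0.0
    for d, doc in enumerate(docs):
        for w, n in Counter(doc.lower().split()).items():
            p = sum(p_z_d[d][z] * p_w_z[z][index[w]] for z in range(len(p_w_z)))
            ll += n * math.log(p)
    return ll


def top_keywords(vocab, p_w_z, k=3):
    """The k highest-probability words of each topic serve as that topic's keywords."""
    return [[vocab[j] for j in sorted(range(len(vocab)), key=lambda j: -row[j])[:k]]
            for row in p_w_z]


if __name__ == "__main__":
    # Hypothetical toy "posts"; not data from the paper.
    posts = [
        "phishing email scam phishing bank fraud",
        "fraud scam bank phishing identity theft",
        "worm virus malware exploit patch",
        "malware virus trojan exploit worm",
    ]
    vocab, p_w_z, p_z_d = plsa(posts, n_topics=2, n_iter=50, seed=1)
    for z, words in enumerate(top_keywords(vocab, p_w_z)):
        print(f"topic {z}: {', '.join(words)}")
```

Because EM converges only to a local optimum, different seeds can yield different topic splits; a common practice is to run several restarts and keep the fit with the highest log_likelihood.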
Copyright information
© 2007 Springer Berlin Heidelberg
About this paper
Cite this paper
Tsai, F.S., Chan, K.L. (2007). Detecting Cyber Security Threats in Weblogs Using Probabilistic Models. In: Yang, C.C., et al. Intelligence and Security Informatics. PAISI 2007. Lecture Notes in Computer Science, vol 4430. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-71549-8_4
DOI: https://doi.org/10.1007/978-3-540-71549-8_4
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-71548-1
Online ISBN: 978-3-540-71549-8
eBook Packages: Computer Science, Computer Science (R0)