Abstract
With the proliferation of blogs, or weblogs, in recent years, information in the blogosphere is becoming increasingly difficult to access and retrieve. Previous studies have focused on analyzing personal blogs, but few have looked at corporate blogs, whose numbers are rising dramatically. In this paper, we use probabilistic techniques to detect keywords in corporate blogs with respect to certain topics. We then demonstrate how this method can present the blogosphere in terms of topics with measurable keywords, making it possible to track popular conversations and topics. By applying a probabilistic approach, we can improve information retrieval in blog search and keyword detection, and provide an analytical foundation for future corporate blog search and mining.
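To make the idea of topic-specific keyword detection concrete, the following is a minimal sketch and not the authors' implementation. The abstract does not name a specific model; the sketch assumes a probabilistic latent semantic analysis (PLSA) style mixture fitted with EM, and the vocabulary, counts, and function names (`plsa`, `top_keywords`) are hypothetical illustrations.

```python
import random


def normalize(row):
    """Scale a list of non-negative numbers so that it sums to 1."""
    total = sum(row)
    return [x / total for x in row]


def plsa(counts, n_topics, n_iter=50, seed=0):
    """Fit a tiny PLSA-style model with EM (assumed model, see lead-in).

    counts[d][w] is the number of times word w occurs in document d.
    Returns (p_w_z, p_z_d): P(word | topic) and P(topic | document).
    """
    rng = random.Random(seed)
    n_docs, n_words = len(counts), len(counts[0])
    # Random positive initialisation, normalised into distributions.
    p_w_z = [normalize([rng.random() + 1e-3 for _ in range(n_words)])
             for _ in range(n_topics)]
    p_z_d = [normalize([rng.random() + 1e-3 for _ in range(n_topics)])
             for _ in range(n_docs)]

    for _ in range(n_iter):
        # Small floors keep every row strictly positive before normalising.
        new_w_z = [[1e-12] * n_words for _ in range(n_topics)]
        new_z_d = [[1e-12] * n_topics for _ in range(n_docs)]
        for d in range(n_docs):
            for w in range(n_words):
                n = counts[d][w]
                if n == 0:
                    continue
                # E-step: P(z | d, w) is proportional to P(w | z) * P(z | d).
                post = normalize([p_w_z[z][w] * p_z_d[d][z] + 1e-300
                                  for z in range(n_topics)])
                # Accumulate expected counts for the M-step.
                for z in range(n_topics):
                    new_w_z[z][w] += n * post[z]
                    new_z_d[d][z] += n * post[z]
        # M-step: renormalise expected counts into distributions.
        p_w_z = [normalize(row) for row in new_w_z]
        p_z_d = [normalize(row) for row in new_z_d]
    return p_w_z, p_z_d


def top_keywords(p_w_z, vocab, k=3):
    """Return the k most probable words for each topic."""
    result = []
    for row in p_w_z:
        order = sorted(range(len(vocab)), key=lambda i: -row[i])
        result.append([vocab[i] for i in order[:k]])
    return result


# Hypothetical toy corpus: rows are blog posts, columns are vocabulary terms.
vocab = ["product", "launch", "price", "security", "patch", "update"]
counts = [
    [4, 3, 2, 0, 0, 1],
    [3, 4, 3, 0, 0, 0],
    [0, 0, 0, 5, 4, 2],
    [0, 1, 0, 3, 5, 3],
]
p_w_z, p_z_d = plsa(counts, n_topics=2)
keywords = top_keywords(p_w_z, vocab, k=3)
print(keywords)
```

Ranking words by their estimated P(word | topic) gives each topic a measurable keyword list, while P(topic | document) indicates which posts discuss that topic. EM only finds a local optimum, so results on real data depend on the initialisation and on preprocessing choices that are not covered here.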
Copyright information
© 2007 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Tsai, F.S., Chen, Y., Chan, K.L. (2007). Probabilistic Techniques for Corporate Blog Mining. In: Washio, T., et al. Emerging Technologies in Knowledge Discovery and Data Mining. PAKDD 2007. Lecture Notes in Computer Science, vol 4819. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-77018-3_5
DOI: https://doi.org/10.1007/978-3-540-77018-3_5
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-77016-9
Online ISBN: 978-3-540-77018-3
eBook Packages: Computer Science, Computer Science (R0)