Predicting Micro-Level Behavior in Online Communities for Risk Management
Online communities amass vast quantities of valuable knowledge and thus generate major value for their owners. Where a business incorporates such a community as its main means of sharing ideas and issues about its products, it is important that the value of this knowledge endures and is easily recognized. Good management of such a business therefore requires risk analysis of the integrated online community. We focus on the process of knowledge creation rather than on the knowledge gained from individual messages isolated from their context. Consequently, we model collections of messages linked via tree-like structures; we call these message collections threads. We propose a risk framework for managing micro-level, thread-related risks. Specifically, we target the risk that the original message has received no satisfactory response after a period of time. Risks are treated as binary events, so an event that is predicted to occur can be flagged for the attention of the community manager. To predict such a binary response, we use several methods, including a Bayesian probit regression estimated via Gibbs sampling; results indicate that this model is suitable for classification tasks such as those considered.
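As an illustration of the kind of model named above, the following is a minimal sketch of a Bayesian probit regression fitted by Gibbs sampling, using the standard latent-variable data-augmentation scheme. It is not the authors' implementation, and every specific in it is an assumption: the single standardized thread feature `x`, the synthetic data, the N(0, 100) prior on each coefficient, the chain length, and the helper names `sample_truncated_normal`, `gibbs_probit` and `risk_probability`.

```python
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()

def sample_truncated_normal(mean, positive, rng):
    """Draw from N(mean, 1) restricted to (0, inf) if positive, else to (-inf, 0]."""
    p_neg = STD_NORMAL.cdf(-mean)              # probability that an untruncated draw is <= 0
    lo, hi = (p_neg, 1.0) if positive else (0.0, p_neg)
    u = lo + (hi - lo) * rng.random()
    u = min(max(u, 1e-12), 1.0 - 1e-12)        # keep inv_cdf strictly inside (0, 1)
    return mean + STD_NORMAL.inv_cdf(u)

def gibbs_probit(x, y, n_iter=600, burn_in=200, prior_var=100.0, seed=0):
    """Gibbs sampler for P(y = 1 | x) = Phi(b0 + b1 * x) with independent N(0, prior_var) priors."""
    rng = random.Random(seed)
    # Posterior covariance V = (X'X + I / prior_var)^-1 for design rows [1, x_i].
    # It does not depend on the latent variables, so it is computed once.
    a = len(x) + 1.0 / prior_var
    b = sum(x)
    d = sum(v * v for v in x) + 1.0 / prior_var
    det = a * d - b * b
    v00, v01, v11 = d / det, -b / det, a / det
    # Cholesky factor of V, used to draw correlated coefficients.
    l00 = v00 ** 0.5
    l10 = v01 / l00
    l11 = (v11 - l10 * l10) ** 0.5
    b0, b1 = 0.0, 0.0
    draws = []
    for it in range(n_iter):
        # Step 1: latent z_i | beta, y_i is a unit-variance normal truncated by the sign implied by y_i.
        z = [sample_truncated_normal(b0 + b1 * xi, yi == 1, rng) for xi, yi in zip(x, y)]
        # Step 2: beta | z is bivariate normal with mean V X'z and covariance V.
        s0 = sum(z)
        s1 = sum(xi * zi for xi, zi in zip(x, z))
        m0 = v00 * s0 + v01 * s1
        m1 = v01 * s0 + v11 * s1
        e0, e1 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        b0 = m0 + l00 * e0
        b1 = m1 + l10 * e0 + l11 * e1
        if it >= burn_in:
            draws.append((b0, b1))
    return draws

def risk_probability(draws, x_new):
    """Posterior predictive probability that the risk event occurs for feature value x_new."""
    return sum(STD_NORMAL.cdf(c0 + c1 * x_new) for c0, c1 in draws) / len(draws)

# Synthetic threads (assumed): y = 1 means "no satisfactory response", generated from a probit model.
data_rng = random.Random(42)
x = [data_rng.gauss(0.0, 1.0) for _ in range(300)]
y = [1 if -0.5 + 1.2 * xi + data_rng.gauss(0.0, 1.0) > 0 else 0 for xi in x]

draws = gibbs_probit(x, y)
mean_b1 = sum(c1 for _, c1 in draws) / len(draws)
flagged = risk_probability(draws, 2.0) > 0.5   # flag the thread for the community manager
```

Averaging the predicted probability over the retained draws, rather than plugging in a single point estimate, carries the posterior uncertainty in the coefficients into the flagging decision. The 0.5 threshold is also an assumption; in practice it would be tuned against the true and false positive rates the community manager can tolerate.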
Keywords: Linear Discriminant Analysis, True Positive Rate, Online Community, Classification Prediction, Thread Creation
We thank Dr. Adrian Mocan of SAP for his contribution to defining the risk events. We also thank Edwin Tye, School of Mathematics, University of Southampton, for his assistance in processing the data. This work has been supported by the EU FP7 project ROBUST, EC Project Number 257859.