Predicting Micro-Level Behavior in Online Communities for Risk Management

  • Philippa A. Hiscock
  • Athanassios N. Avramidis
  • Jörg Fliege
Part of the Studies in Classification, Data Analysis, and Knowledge Organization book series (STUDIES CLASS)


Online communities amass vast quantities of valuable knowledge and thus generate major value for their owners. Where such a community is incorporated in a business as the main means of sharing ideas and issues regarding the business's products, it is important that the value of this knowledge endures and is easily recognized. Good management of such a business therefore requires risk analysis of the integrated online community. We focus on the process of knowledge creation rather than on the knowledge gained from individual messages isolated from their context. Consequently, we model collections of messages linked via tree-like structures; we call these message collections threads. We suggest a risk framework aimed at managing micro-level, thread-related risks. Specifically, we target the risk that the original message receives no satisfactory response within a given period of time. Risks are treated as binary events, so an event that is predicted to occur can be flagged for the attention of the community manager. To predict such a binary response, we use several methods, including a Bayesian probit regression estimated via Gibbs sampling; results indicate that this model is suitable for classification tasks such as those considered.
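To make the probit step concrete, the following is a minimal sketch of a data-augmentation Gibbs sampler for Bayesian probit regression, in the style usually attributed to Albert and Chib (1993). It is not the authors' implementation. The abstract does not state the prior, the thread features, or the data, so the flat prior on the coefficients, the single simulated feature, and all function names (simulate_threads, gibbs_probit, risk_probability) are assumptions made for illustration. Here y = 1 stands for the risk event occurring. Only the Python standard library is used; a practical version would more likely rely on a numerical library.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()

def sample_latent(mean, y, rng):
    """Draw z ~ N(mean, 1), truncated to z > 0 if y == 1, else to z < 0."""
    sign = 1.0 if y == 1 else -1.0
    m = sign * mean
    # Inverse-CDF draw of a standard normal restricted to (-inf, m).
    u = (1.0 - rng.random()) * STD_NORMAL.cdf(m)
    u = min(max(u, 1e-300), 1.0 - 1e-15)  # keep inv_cdf inside (0, 1)
    return sign * (m - STD_NORMAL.inv_cdf(u))

def cholesky(a):
    """Lower-triangular factor of a symmetric positive definite matrix."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(s) if i == j else s / low[j][j]
    return low

def forward_solve(low, b):
    """Solve low @ x = b for lower-triangular low."""
    x = []
    for i, row in enumerate(low):
        x.append((b[i] - sum(row[k] * x[k] for k in range(i))) / row[i])
    return x

def backward_solve_transposed(low, b):
    """Solve low.T @ x = b for lower-triangular low."""
    n = len(low)
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(low[k][i] * x[k] for k in range(i + 1, n))
        x[i] = (b[i] - s) / low[i][i]
    return x

def gibbs_probit(X, y, n_iter=600, burn_in=200, seed=0):
    """Posterior draws of beta under an (assumed) flat prior."""
    rng = random.Random(seed)
    p = len(X[0])
    xtx = [[sum(r[a] * r[b] for r in X) for b in range(p)] for a in range(p)]
    low = cholesky(xtx)  # X'X = low @ low.T, fixed across iterations
    beta = [0.0] * p
    draws = []
    for it in range(n_iter):
        # Step 1: latent utilities given beta and the observed labels.
        z = [sample_latent(sum(b * v for b, v in zip(beta, r)), yi, rng)
             for r, yi in zip(X, y)]
        # Step 2: beta | z ~ N((X'X)^-1 X'z, (X'X)^-1), via the Cholesky factor.
        xtz = [sum(r[a] * zi for r, zi in zip(X, z)) for a in range(p)]
        half = forward_solve(low, xtz)
        noisy = [h + rng.gauss(0.0, 1.0) for h in half]
        beta = backward_solve_transposed(low, noisy)
        if it >= burn_in:
            draws.append(beta)
    return draws

def risk_probability(draws, x):
    """Posterior predictive probability of the risk event for features x."""
    probs = [STD_NORMAL.cdf(sum(b * v for b, v in zip(beta, x)))
             for beta in draws]
    return sum(probs) / len(probs)

def simulate_threads(n, true_beta, seed=0):
    """Simulated data: an intercept plus one hypothetical thread feature."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        row = [1.0, rng.gauss(0.0, 1.0)]
        latent = sum(b * v for b, v in zip(true_beta, row)) + rng.gauss(0.0, 1.0)
        X.append(row)
        y.append(1 if latent > 0 else 0)
    return X, y

if __name__ == "__main__":
    X, y = simulate_threads(300, [-0.5, 1.5], seed=1)
    draws = gibbs_probit(X, y, seed=2)
    print("P(risk event | feature = 2):", risk_probability(draws, [1.0, 2.0]))
```

A thread could then be flagged for the community manager whenever risk_probability exceeds a chosen threshold. The threshold trades the true positive rate against the false positive rate, and its value is a design choice the abstract does not specify.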


Linear Discriminant Analysis, True Positive Rate, Online Community, Classification Prediction, Thread Creation
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.



We thank Dr. Adrian Mocan of SAP for his contribution to defining the risk events. We also thank Edwin Tye, School of Mathematics, University of Southampton, for his assistance in processing the data. This work has been supported by the EU FP7 project ROBUST, EC Project Number 257859.



Copyright information

© Springer-Verlag Berlin Heidelberg 2015

Authors and Affiliations

  • Philippa A. Hiscock (1, corresponding author)
  • Athanassios N. Avramidis (1)
  • Jörg Fliege (1)
  1. University of Southampton, Southampton, UK
