Mining Outlier Participants: Insights Using Directional Distributions in Latent Models

  • Didi Surian
  • Sanjay Chawla
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8190)

Abstract

In this paper we propose a new probabilistic topic model that scores the expertise of participants on the projects they contribute to, based on their previous experience. We rank participants by score and define those with the lowest scores as outlier participants. Since the focus of our study is on outliers, we name the model Mining Outlier Participants from Projects (MOPP). MOPP is a topic model based on directional distributions, which are particularly suitable for outlier detection in high-dimensional spaces. Extensive experiments on both synthetic and real data sets show that MOPP gives better results on both topic modeling and outlier detection tasks.
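
The score-then-rank idea in the abstract can be illustrated with a minimal sketch. This is not the MOPP model: it assumes a plain cosine similarity between unit-normalized vectors as a stand-in for a direction-based score, and the names `cosine_score`, `rank_outliers`, and the toy data are hypothetical, introduced only for illustration.

```python
import math

def normalize(v):
    """Scale a vector to unit length so that only its direction matters."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]

def cosine_score(experience, project):
    """Illustrative score: cosine similarity of the two directions.

    Assumption: a stand-in for a model-based expertise score,
    not the scoring function used by MOPP.
    """
    e, p = normalize(experience), normalize(project)
    return sum(a * b for a, b in zip(e, p))

def rank_outliers(participants, project, k=1):
    """Return the k participant names with the lowest scores, lowest first."""
    scores = {name: cosine_score(vec, project)
              for name, vec in participants.items()}
    return sorted(scores, key=scores.get)[:k]

# Hypothetical toy data: weights over three topics.
project = [0.7, 0.2, 0.1]
participants = {
    "alice": [0.8, 0.1, 0.1],
    "bob": [0.6, 0.3, 0.1],
    "carol": [0.05, 0.05, 0.9],
}

outliers = rank_outliers(participants, project, k=1)
```

In this toy setting the participant whose experience points in a very different direction from the project receives the lowest score and is ranked first as an outlier candidate.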

Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Didi Surian (1)
  • Sanjay Chawla (2)

  1. University of Sydney, Australia
  2. NICTA, Australia
