Generating Competitive Intelligence Digests with a LDA-Based Method: A Case of BT Intellact

  • Qiang Wei
  • Jiaqi Wang
  • Guoqing Chen
  • Xunhua Guo
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9751)


The Internet has transformed the ways in which organizations gather, produce and transmit competitive intelligence (CI), especially in the age of big data. This paper introduces a method for generating CI digests based on LDA topic modelling and representative text extraction. By incorporating the perplexity metric, the proposed method automatically groups the texts and generates CI digests with an appropriate number of topics. The method is applied at BT Plc as a case study, demonstrating its effectiveness in practical use.
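The abstract does not spell out how perplexity is used or how representative texts are chosen, so the following is only a minimal sketch under stated assumptions. It assumes that a fitted LDA model already supplies per-document held-out log-likelihoods and a document-topic weight matrix, that the topic count with the lowest held-out perplexity is the one retained, and that a document is "representative" of a topic when its weight on that topic is largest. The function names (perplexity, pick_num_topics, representative_docs) are hypothetical and do not come from the paper.

```python
import math


def perplexity(doc_log_likelihoods, doc_lengths):
    """Held-out perplexity: exp(-sum_d log p(w_d) / sum_d N_d).

    doc_log_likelihoods: natural-log likelihood of each held-out document.
    doc_lengths: number of tokens in each held-out document.
    """
    total_tokens = sum(doc_lengths)
    if total_tokens == 0:
        raise ValueError("held-out set contains no tokens")
    return math.exp(-sum(doc_log_likelihoods) / total_tokens)


def pick_num_topics(perplexity_by_k):
    """Assumption: keep the candidate topic count with the lowest perplexity."""
    return min(perplexity_by_k, key=perplexity_by_k.get)


def representative_docs(doc_topic, top_n=1):
    """Assumption: rank documents by their weight on each topic.

    doc_topic: list of rows, one per document, each a list of topic weights.
    Returns {topic index: [indices of the top_n highest-weight documents]}.
    """
    num_topics = len(doc_topic[0])
    result = {}
    for k in range(num_topics):
        ranked = sorted(
            range(len(doc_topic)),
            key=lambda d: doc_topic[d][k],
            reverse=True,
        )
        result[k] = ranked[:top_n]
    return result
```

In practice the log-likelihoods and the document-topic matrix would come from whichever LDA implementation is in use, and the model would be refitted once per candidate topic count before the perplexities are compared.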


Competitive intelligence · LDA-based · Topic generation · Representative documents extraction



The work was partly supported by the National Natural Science Foundation of China (71490724/71110107027/71372044) and the Tsinghua-BT Advanced ICT Lab at Tsinghua University. The authors highly appreciate the support and cooperation of BT and Dr. Quan Li at BT China Research Centre for the work.



Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  • Qiang Wei (1)
  • Jiaqi Wang (1)
  • Guoqing Chen (1)
  • Xunhua Guo (1)
  1. School of Economics and Management, Tsinghua University, Beijing, China
