Volume 116, Issue 2, pp 1273–1301

Discovering cross-topic collaborations among researchers by exploiting weighted association rules

  • Luca Cagliero
  • Paolo Garza
  • Mohammad Reza Kavoosifar
  • Elena Baralis


Identifying the most relevant scientific publications on a given topic is a well-known research problem. The Author-Topic Model (ATM) is a generative model that represents the relationships between research topics and publication authors. It allows us to identify the most influential authors on a particular topic. However, since most research works are co-authored by many researchers, the information provided by ATM can be complemented by the study of the most fruitful collaborations among multiple authors. This paper addresses the discovery of research collaborations among multiple authors on single or multiple topics. Specifically, it exploits an exploratory data mining technique, i.e., weighted association rule mining, to analyze publication data and to discover correlations between ATM topics and combinations of authors. The mined rules characterize groups of researchers with fairly high scientific productivity by indicating (1) the research topics covered by their most cited publications and the relevance of their scientific production separately for each topic, (2) the nature of the collaboration (topic-specific or cross-topic), (3) the names of the external authors who have (occasionally) collaborated with the group either on a specific topic or on multiple topics, and (4) the underlying correlations between the addressed topics. The applicability of the proposed approach was validated on real data acquired from the Online Mendelian Inheritance in Man catalog of genetic disorders and from the PubMed digital library. The results confirm the effectiveness of the proposed strategy.
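To make the idea of weighted rules between author groups and topics more concrete, the following is a minimal Python sketch. It is not the paper's algorithm: the toy publications, the use of a citation count as the transaction weight, the brute-force enumeration of author groups, and the names `PUBLICATIONS`, `weighted_support`, and `mine_rules` are all assumptions made for illustration. It uses one common definition of weighted support (the weight of the matching publications divided by the total weight) and derives rule confidence from it.

```python
from itertools import combinations

# Hypothetical toy data (not from the paper): each publication has a set of
# authors, a set of topics, and a weight (assumed here to be a citation count).
PUBLICATIONS = [
    {"authors": {"A", "B"}, "topics": {"T1"}, "weight": 10},
    {"authors": {"A", "B"}, "topics": {"T1", "T2"}, "weight": 30},
    {"authors": {"A", "C"}, "topics": {"T2"}, "weight": 20},
    {"authors": {"B"}, "topics": {"T1"}, "weight": 40},
]


def weighted_support(pubs, authors=frozenset(), topics=frozenset()):
    """Share of the total weight carried by publications that contain
    all the given authors and all the given topics."""
    total = sum(p["weight"] for p in pubs)
    matched = sum(
        p["weight"]
        for p in pubs
        if authors <= p["authors"] and topics <= p["topics"]
    )
    return matched / total if total else 0.0


def mine_rules(pubs, min_wsup, min_conf, max_group_size=2):
    """Enumerate rules {author group} -> topic by brute force and keep those
    meeting the weighted-support and confidence thresholds."""
    all_authors = sorted(set().union(*(p["authors"] for p in pubs)))
    all_topics = sorted(set().union(*(p["topics"] for p in pubs)))
    rules = []
    for size in range(1, max_group_size + 1):
        for group in combinations(all_authors, size):
            group_set = frozenset(group)
            wsup_group = weighted_support(pubs, authors=group_set)
            if wsup_group == 0:
                continue  # the group never published together
            for topic in all_topics:
                wsup_rule = weighted_support(
                    pubs, authors=group_set, topics=frozenset({topic})
                )
                confidence = wsup_rule / wsup_group
                if wsup_rule >= min_wsup and confidence >= min_conf:
                    rules.append((group, topic, wsup_rule, confidence))
    return rules


if __name__ == "__main__":
    for group, topic, wsup, conf in mine_rules(PUBLICATIONS, 0.25, 0.7):
        print(f"{set(group)} -> {topic}  wsup={wsup:.2f}  conf={conf:.2f}")
```

On this toy input, the group of authors A and B yields rules for both T1 and T2, which is the kind of pattern one would read as a cross-topic collaboration. A practical miner would replace the exhaustive enumeration with a frequent-pattern algorithm.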


Author-Topic Model; Weighted association rule mining; Data mining; Knowledge discovery



Copyright information

© Akadémiai Kiadó, Budapest, Hungary 2018

Authors and Affiliations

  1. Dipartimento di Automatica e Informatica, Politecnico di Torino, Torino, Italy
