Personal research idea recommendation using research trends and a hierarchical topic model
In an era of rapid technological advance, researchers must keep up with trends in their field. As the number of papers grows rapidly, finding suitable research topics efficiently is a problem worth exploring. Some researchers have attempted to find research ideas with topic detection and tracking methods. However, these methods do not consider users’ background knowledge and preferences, and they express a topic with general keywords, which does little to help researchers develop new research ideas. Existing studies suggest that the title is the part of a paper that best expresses its research idea. This study adopts this concept and proposes an automatic title generation method that combines personalized recommendation with topic trend analysis. First, it uses hierarchical latent tree analysis to uncover the topic structure hidden in the existing research, its representative keywords, and the users’ interests within that structure. Second, a hybrid recommendation method considers the trends and popularity of the topics of interest together with user preferences. Finally, a natural language generation algorithm suited to the titles of academic papers converts the recommended keywords into fluent title sentences tailored to the users. Experiments show that adding Google Trends indicators and personal factors improves the performance of topic recommendation. The automatic title generation method, which uses template-based and statistical information methods, achieves excellent performance in both grammatical correctness and semantic expression. Moreover, users find the generated titles more inspirational than simple keywords for developing new research ideas.
Keywords: Hierarchical topic model, Personalized recommendation system, Automatic title generation
This research is based on work supported by the Taiwan Ministry of Science and Technology under Grant No. MOST 107-2410-H-006 040-MY3 and MOST 108-2511-H-006-009.