Abstract
Online encyclopedias such as Wikipedia provide a large and growing number of articles on many topics, yet the content of many articles is still far from complete. In this paper, we propose EncyCatalogRec, a system that helps generate more comprehensive articles by recommending catalog items. First, we represent articles and catalog items as embedding vectors and retrieve similar articles with locality-sensitive hashing; the catalog items of these similar articles serve as candidate items. Next, we build a relation graph from the articles and the candidate items and transform it into a product graph, which casts the recommendation problem as a transductive learning problem on the product graph. Finally, the recommended items are ordered with a learning-to-rank method. Experimental results demonstrate that our approach achieves state-of-the-art performance on catalog recommendation in both warm- and cold-start scenarios. A case study further validates the approach.
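The candidate-generation step described above can be illustrated with a minimal sketch. It assumes random-hyperplane (sign) hashing as the locality-sensitive hashing scheme, which the abstract does not specify, and it uses made-up three-dimensional embeddings. All function names and the toy data are hypothetical and are not taken from the EncyCatalogRec implementation.

```python
import random
from collections import defaultdict


def make_hyperplanes(dim, n_bits, seed=0):
    """Draw n_bits random Gaussian hyperplanes (assumed LSH family)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def signature(vec, planes):
    """One bit per hyperplane: which side of the plane the vector lies on."""
    return tuple(
        1 if sum(p * v for p, v in zip(plane, vec)) >= 0 else 0
        for plane in planes
    )


def build_index(article_vecs, planes):
    """Group article titles into buckets keyed by their hash signature."""
    buckets = defaultdict(list)
    for title, vec in article_vecs.items():
        buckets[signature(vec, planes)].append(title)
    return buckets


def candidate_items(query_vec, buckets, planes, catalogs):
    """Return articles sharing the query's bucket and their catalog items."""
    similar = buckets.get(signature(query_vec, planes), [])
    items = []
    for title in similar:
        for item in catalogs[title]:
            if item not in items:  # keep first-seen order, drop duplicates
                items.append(item)
    return similar, items


# Toy embeddings and catalogs, invented purely for illustration.
article_vecs = {
    "Paris": [0.9, 0.1, 0.0],
    "Lyon": [0.8, 0.2, 0.1],
    "Python (language)": [-0.7, 0.6, 0.3],
}
catalogs = {
    "Paris": ["History", "Geography", "Economy"],
    "Lyon": ["History", "Transport"],
    "Python (language)": ["Syntax", "Libraries"],
}

planes = make_hyperplanes(dim=3, n_bits=4)
buckets = build_index(article_vecs, planes)

# A query pointing in the same direction as "Paris" falls into its bucket.
similar, items = candidate_items([1.8, 0.2, 0.0], buckets, planes, catalogs)
```

In a sketch like this, vectors with a small angle between them tend to share a signature, so the bucket lookup replaces an exhaustive similarity search. The later stages (product graph construction, transductive learning, and ranking) are not covered here.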
Compliance with ethics guidelines
Wei-ming LU, Jia-hui LIU, Wei XU, Peng WANG, and Bao-gang WEI declare that they have no conflict of interest.
Project supported by the Zhejiang Provincial Natural Science Foundation of China (No. LY17F020015), the Fundamental Research Funds for the Central Universities, China (No. 2017FZA5016), the Chinese Knowledge Center of Engineering Science and Technology (CKCEST), and the MOE Engineering Research Center of Digital Library.
About this article
Cite this article
Lu, Wm., Liu, Jh., Xu, W. et al. EncyCatalogRec: catalog recommendation for encyclopedia article completion. Front Inform Technol Electron Eng 21, 436–447 (2020). https://doi.org/10.1631/FITEE.1800363
DOI: https://doi.org/10.1631/FITEE.1800363