Research on hot news discovery model based on user interest and topic discovery
To process network data and discover hot news based on user interest and topics, this paper studies hot news discovery algorithms and proposes a two-layer text clustering model that combines a density-based clustering strategy with the Single-pass strategy. Because network data are large in volume, the DBSCAN algorithm is first used to cluster the data from a single crawl into small-scale clusters (micro-clusters). The Single-pass strategy then performs incremental clustering on these micro-clusters to form topic clusters. For hot news detection, a model is designed that combines media attention and user attention to a topic, and a heat quantization formula is derived from it. Building on these techniques, a network hot topic detection model is designed and implemented using web crawling, news discovery, and hot news discovery. A comparison of the two-layer model with the traditional Single-pass strategy verifies the feasibility of the two-layer model, and the discovery of network hot news is achieved.
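The two-layer clustering idea in the abstract (DBSCAN into micro-clusters, then Single-pass over the micro-clusters) can be sketched as follows. The abstract does not specify the text representation, similarity measure, or parameters, so this minimal sketch assumes whitespace-tokenized term-frequency vectors, cosine similarity (distance = 1 - similarity), and hypothetical parameters `eps`, `min_pts`, and `threshold`. It also assumes that DBSCAN noise points are kept as singleton micro-clusters so that Single-pass can still attach them to a topic. It illustrates the general technique and is not the authors' implementation.

```python
import math
from collections import Counter


def vectorize(text):
    """Term-frequency vector (assumption: lowercase whitespace tokenization)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity of two sparse vectors; 0.0 if either is empty."""
    dot = sum(v * b.get(k, 0) for k, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def centroid(vecs):
    """Mean of sparse vectors."""
    total = Counter()
    for v in vecs:
        total.update(v)
    return Counter({k: c / len(vecs) for k, c in total.items()})


def dbscan(vectors, eps, min_pts):
    """Layer 1: DBSCAN with cosine distance. Returns labels; -1 marks noise."""
    n = len(vectors)
    labels = [None] * n

    def neighbors(i):
        # Naive O(n) scan per point (O(n^2) overall); includes i itself.
        return [j for j in range(n) if 1 - cosine(vectors[i], vectors[j]) <= eps]

    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        nbrs = neighbors(i)
        if len(nbrs) < min_pts:
            labels[i] = -1  # provisional noise; may become a border point later
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in nbrs if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border point: join cluster, do not expand
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_nbrs = neighbors(j)
            if len(j_nbrs) >= min_pts:  # core point: expand the cluster
                queue.extend(j_nbrs)
    return labels


def single_pass(micro_centroids, threshold):
    """Layer 2: incremental Single-pass clustering of micro-cluster centroids."""
    topics = []
    for mid, c in enumerate(micro_centroids):
        best, best_sim = None, 0.0
        for t in topics:
            s = cosine(c, t["centroid"])
            if s > best_sim:
                best, best_sim = t, s
        if best is not None and best_sim >= threshold:
            best["members"].append(mid)
            best["vectors"].append(c)
            best["centroid"] = centroid(best["vectors"])
        else:
            topics.append({"members": [mid], "vectors": [c], "centroid": c})
    return topics


def two_layer_cluster(docs, eps, min_pts, threshold):
    """Return topics as sorted lists of document indices."""
    vectors = [vectorize(d) for d in docs]
    labels = dbscan(vectors, eps, min_pts)
    groups = {}
    for idx, lab in enumerate(labels):
        if lab != -1:
            groups.setdefault(lab, []).append(idx)
    micro = [groups[k] for k in sorted(groups)]
    # Assumption: noise documents become singleton micro-clusters.
    micro += [[idx] for idx, lab in enumerate(labels) if lab == -1]
    topics = single_pass([centroid([vectors[i] for i in m]) for m in micro], threshold)
    return [sorted(i for mid in t["members"] for i in micro[mid]) for t in topics]


docs = [
    "earthquake hits city center",
    "earthquake hits city center today",
    "earthquake rescue teams search city",
    "earthquake rescue teams search city ruins",
    "football team wins cup final",
    "football team wins cup",
    "stock market falls sharply",
]
print(two_layer_cluster(docs, eps=0.3, min_pts=2, threshold=0.3))
# -> [[0, 1, 2, 3], [4, 5], [6]]
```

In this toy run, the tight DBSCAN radius splits the earthquake reports into two micro-clusters, and Single-pass then merges them into one topic because their centroids share terms. Since the second layer compares only micro-cluster centroids rather than every document, it has far fewer items to process, which matches the abstract's motivation for running DBSCAN first on large crawls.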
Keywords: Web crawler, Text clustering, Interest and topic discovery, Hot news discovery
The authors acknowledge the Natural Science Research Plan of the Educational Commission of Shaanxi Province of China (Program No. 2013JK1141).