Abstract
As smart city technologies, especially artificial intelligence, data science, and the Internet of Things, have attracted considerable attention, many researchers have built smart applications to improve people's quality of life. Because automatically collecting and exploiting information is essential in the era of Industry 4.0, a variety of models have been proposed for solving storage problems and for efficient data mining. In this paper, we present our proposed system, the Trendy Keyword Extraction System (TKES), which is designed to extract trendy keywords from text streams. The system also supports storing, analyzing, and visualizing documents arriving from text streams. It first automatically collects daily articles, then ranks the importance of keywords by calculating how frequently each keyword occurs, and finally finds trendy keywords using the Burst Detection Algorithm, which we propose in this paper based on Kleinberg's idea. This algorithm is used to detect bursts. A burst is defined as a period of time during which a keyword is continuously and unusually popular in the text stream, and the procedure of identifying bursts is called burst detection. The results of user requests can be displayed visually. Furthermore, we propose a method for finding a trendy keyword set, defined as a set of keywords that belong to the same burst. This work also describes the datasets used in our experiments and the processing-speed tests of our two proposed algorithms.
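To make the burst idea concrete, here is a minimal sketch of Kleinberg's two-state batched burst model in Python. It is not the TKES implementation, whose details are not given in the abstract. It assumes daily batches, where r[t] is the number of articles on day t that contain a keyword and d[t] is the total number of articles that day. The function names, the scaling factor s = 2, the transition weight gamma = 1, and the 0.9999 cap on the burst rate are illustrative assumptions. The helper keyword_sets_by_burst is a hypothetical reading of "keywords that belong to the same burst" that simply groups keywords whose burst intervals are identical.

```python
import math
from typing import Dict, List, Tuple


def burst_states(r: List[int], d: List[int], s: float = 2.0, gamma: float = 1.0) -> List[int]:
    """Label each batch 0 (base rate) or 1 (burst) using Kleinberg's two-state model.

    r[t]: documents in batch t that contain the keyword
    d[t]: all documents in batch t
    s, gamma: illustrative defaults (assumptions, not values from the paper)
    """
    n = len(r)
    if n != len(d):
        raise ValueError("r and d must have the same length")
    total = sum(d)
    if n == 0 or total == 0:
        return [0] * n
    p0 = sum(r) / total  # base rate over the whole stream
    if p0 <= 0.0 or p0 >= 1.0:
        return [0] * n  # the two states cannot be distinguished
    p1 = min(s * p0, 0.9999)  # elevated rate in the burst state (assumed cap)
    probs = (p0, p1)

    def cost(i: int, rt: int, dt: int) -> float:
        # Negative log-likelihood of rt hits out of dt under state i. The
        # binomial coefficient is identical for both states, so it is dropped.
        p = probs[i]
        return -(rt * math.log(p) + (dt - rt) * math.log(1.0 - p))

    up = gamma * math.log(n)  # cost of entering the burst state; leaving is free
    # The automaton starts in state 0, so starting in state 1 pays one transition.
    best = [cost(0, r[0], d[0]), up + cost(1, r[0], d[0])]
    back: List[Tuple[int, int]] = []
    for t in range(1, n):
        new_best, ptr = [], []
        for j in (0, 1):
            cand = [best[i] + (up if j > i else 0.0) for i in (0, 1)]
            i_best = 0 if cand[0] <= cand[1] else 1
            new_best.append(cand[i_best] + cost(j, r[t], d[t]))
            ptr.append(i_best)
        best = new_best
        back.append((ptr[0], ptr[1]))
    # Backtrack the minimum-cost state sequence (Viterbi).
    state = 0 if best[0] <= best[1] else 1
    seq = [state]
    for ptr_t in reversed(back):
        state = ptr_t[state]
        seq.append(state)
    return seq[::-1]


def burst_intervals(states: List[int]) -> List[Tuple[int, int]]:
    """Turn a 0/1 state sequence into inclusive (start, end) burst intervals."""
    intervals: List[Tuple[int, int]] = []
    start = None
    for t, q in enumerate(states):
        if q == 1 and start is None:
            start = t
        elif q == 0 and start is not None:
            intervals.append((start, t - 1))
            start = None
    if start is not None:
        intervals.append((start, len(states) - 1))
    return intervals


def keyword_sets_by_burst(bursts: Dict[str, List[Tuple[int, int]]]) -> Dict[Tuple[int, int], List[str]]:
    """Hypothetical grouping: keywords sharing an identical burst interval form a set."""
    groups: Dict[Tuple[int, int], List[str]] = {}
    for kw in sorted(bursts):
        for iv in bursts[kw]:
            groups.setdefault(iv, []).append(kw)
    return {iv: kws for iv, kws in groups.items() if len(kws) > 1}


if __name__ == "__main__":
    states = burst_states([2, 3, 2, 30, 35, 2], [100] * 6)
    print(states)  # [0, 0, 0, 1, 1, 0]
    print(burst_intervals(states))  # [(3, 4)]
    print(keyword_sets_by_burst({"flood": [(3, 4)], "rain": [(3, 4)], "election": [(0, 1)]}))
    # {(3, 4): ['flood', 'rain']}
```

In this example, days 3 and 4 are labelled as a burst because the likelihood gained by switching to the elevated rate outweighs the one-time entry cost gamma ln n. Raising gamma or s makes the detector more conservative.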
References
Kleinberg, J.: Bursty and hierarchical structure in streams. Data Min. Knowl. Discov. 7(4), 373–397 (2003)
Fung, G.P.C., Yu, J.X., Yu, P.S., Lu, H.: Parameter free bursty events detection in text streams. In: Proceedings of the 31st International Conference on Very Large Data Bases, pp. 181–192. ACM (2005)
Wang, M., Madhyastha, T., Chan, N.H., Papadimitriou, S., Faloutsos, C.: Data mining meets performance evaluation: fast algorithms for modeling bursty traffic. In: Proceedings 18th International Conference on Data Engineering, pp. 507–516. IEEE (2002). https://doi.org/10.1109/ICDE.2002.994770
Zhang, X.: Fast algorithms for burst detection. Ph.D. thesis, New York University, Graduate School of Arts and Science (2006)
Neill, D.B., Moore, A.W.: A fast multi-resolution method for detection of significant spatial disease clusters. In: Advances in Neural Information Processing Systems, pp. 651–658. MIT Press (2004)
Neill, D.B., Moore, A.W.: Anomalous spatial cluster detection. In: Proceedings of the KDD 2005 Workshop on Data Mining Methods for Anomaly Detection (2005)
Neill, D.B., Moore, A.W., Pereira, F., Mitchell, T.M.: Detecting significant multidimensional spatial clusters. In: Advances in Neural Information Processing Systems, pp. 969–976 (2005)
Saul, L.K., Weiss, Y., Bottou, L.: Advances in Neural Information Processing Systems 17 (2005)
Thrun, S., Saul, L.K., Schölkopf, B.: Advances in Neural Information Processing Systems 16: Proceedings of the 2003 Conference, vol. 16. MIT Press (2004)
Bakkum, D.J., Radivojevic, M., Frey, U., Franke, F., Hierlemann, A., Takahashi, H.: Parameters for burst detection. Front. Comput. Neurosci. 7, 193 (2014)
Wagenaar, D., DeMarse, T.B., Potter, S.M.: Meabench: a toolset for multi-electrode data acquisition and on-line analysis. In: Conference Proceedings. 2nd International IEEE EMBS Conference on Neural Engineering, 2005, pp. 518–521. IEEE (2005)
Romsaiyud, W.: Detecting emergency events and geo-location awareness from twitter streams. In: The International Conference on E-Technologies and Business on the Web (EBW2013), pp. 22–27 (2013)
Weng, J., Lee, B.S.: Event detection in twitter. In: Fifth International AAAI Conference on Weblogs and Social Media, pp. 17–21 (2011)
Vlachos, M., Meek, C., Vagena, Z., Gunopulos, D.: Identifying similarities, periodicities and bursts for online search queries. In: Proceedings of the 2004 ACM SIGMOD International Conference on Management of Data, pp. 131–142 (2004). https://doi.org/10.1145/1007568.1007586
Zhang, Y., Hua, W., Yuan, S.: Mapping the scientific research on open data: a bibliometric review. Learned Publ. 31(2), 95–106 (2018)
Heydari, A., ali Tavakoli, M., Salim, N., Heydari, Z.: Detection of review spam: a survey. Expert Syst. Appl. 42(7), 3634–3642 (2015)
Yamamoto, S., Wakabayashi, K., Kando, N., Satoh, T.: Twitter user tagging method based on burst time series. Int. J. Web Inf. Syst. 12(3), 292–311 (2016)
Huyen, N.T.M., Roussanaly, A., Vinh, H.T., et al.: A hybrid approach to word segmentation of Vietnamese texts. In: International Conference on Language and Automata Theory and Applications, pp. 240–249. Springer, Berlin (2008)
Hong, T.V.T., Do, P.: Developing a graph-based system for storing, exploiting and visualizing text stream. In: Proceedings of the 2nd International Conference on Machine Learning and Soft Computing, pp. 82–86 (2018). https://doi.org/10.1145/3184066.3184084
Krishnamoorthy, M., Suresh, S., Alagappan, S., et al.: Deep learning techniques and optimization strategies in big data analytics: automated transfer learning of convolutional neural networks using ENAS algorithm. In: Deep Learning Techniques and Optimization Strategies in Big Data Analytics, pp. 142–153. IGI Global (2020)
Vasant, P.: Intelligent Computing & Optimization, vol. 866. Springer, Berlin (2019)
Acknowledgements
We greatly appreciate the support of the ICO 2018. We would like to offer our special thanks to Lac Hong University, Thu Dau Mot University, and Vietnam National University.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Additional information
The work of Tham Vo is supported by Lac Hong University, and funded by Thu Dau Mot University (No. DT.20-031). The work of Phuc Do is funded by Vietnam National University, Ho Chi Minh City (No. DS2020-26-01).
About this article
Cite this article
Vo, T., Do, P. TKES: A Novel System for Extracting Trendy Keywords from Online News Sites. J. Oper. Res. Soc. China 10, 801–816 (2022). https://doi.org/10.1007/s40305-020-00327-4
DOI: https://doi.org/10.1007/s40305-020-00327-4