In text mining: detection of topic and sub-topic using multiple spider hunting model

  • E. Elakiya
  • N. Rajkumar
Original Research


In this electronic era, people communicate rapidly and share data through social media. Within a fraction of a second, millions of texts arrive through WhatsApp, Facebook, Twitter, email, and similar channels. It is therefore hard to extract relevant data and information from such a massive volume of text documents. Instead of reading every document in full, there is a need to determine the topic and subtopics of a corpus. Existing techniques take considerable time to detect the topic and subtopics of a corpus, so we propose a dynamic multiple spider hunting algorithm. Because it uses multiple spiders, this technique can effectively recognize the desired artifacts in minimal time and performs better than other techniques.


Keywords: Topic detection · Sub-topic detection · Multiple spider hunting algorithms

Copyright information

© Springer-Verlag GmbH Germany, part of Springer Nature 2019

Authors and Affiliations

  1. E.G.S. Pillay Engineering College, Nagapattinam, India
  2. Hindusthan College of Engineering and Technology, Coimbatore, India