Abstract
This study proposes an improvement to the Insight Centre for Data Analytics algorithm, which identifies the most relevant topics in a corpus of tweets and allows search rules to be constructed for those topics, so that a corpus of tweets can be built for analysis. The improved algorithm gains more than 14% in Purity and other metrics, and its execution time is 10% of that of Latent Dirichlet Allocation (LDA).
Notes
1. Relatively homogeneous “natural” groupings in a statistical population.
2. A collection of pieces of language selected according to explicit linguistic criteria in order to be used as a sample of the language.
3. As the number of dimensions increases, data becomes sparse. The amount of data needed must therefore grow exponentially with the dimensionality, or there will not be enough points in the larger space to support any type of analysis.
4. The ngram ‘ganas tomar Corona’ translates literally to ‘want drink Corona’, where Corona is a beer produced in Mexico.
5. The ngram ‘gustado video youtube’ translates literally to ‘liked video youtube’.
6. An ngram is considered important enough if it “passes” a function of the frequency parameters defined at the start. At the very least, the ngram must be repeated 5 times.
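The frequency floor in the last note can be illustrated with a minimal sketch. It applies only the stated minimum of 5 repetitions; the fuller "function of the frequency parameters" is not given here, so it is not reproduced. The names `ngrams`, `frequent_ngrams`, and `MIN_COUNT` are hypothetical, and the tweets are assumed to be already tokenised and cleaned.

```python
from collections import Counter

# Assumption: only the floor of 5 repetitions from the note is modelled;
# the paper's full frequency function is not specified in this text.
MIN_COUNT = 5


def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def frequent_ngrams(tweets, n=3, min_count=MIN_COUNT):
    """Count n-grams across tokenised tweets and keep those seen at least min_count times."""
    counts = Counter()
    for tokens in tweets:
        counts.update(ngrams(tokens, n))
    return {gram: c for gram, c in counts.items() if c >= min_count}


# Toy corpus (illustrative only): one trigram appears 5 times, the other twice.
tweets = [["ganas", "tomar", "corona"]] * 5 + [["gustado", "video", "youtube"]] * 2
print(frequent_ngrams(tweets))
```

In this toy corpus only the trigram repeated five times survives the filter; the one seen twice is discarded.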
Acknowledgments
This work has been supported by Asociación Mexicana de Cultura A.C. and Consejo Nacional de Ciencia y Tecnología (CONACyT).
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Martínez, E.D., Fonseca, J.P., González, V.M., Garduño, G., Huipet, H.H. (2017). Detection of Topics and Construction of Search Rules on Twitter. In: Solano, A., Ordoñez, H. (eds) Advances in Computing. CCC 2017. Communications in Computer and Information Science, vol 735. Springer, Cham. https://doi.org/10.1007/978-3-319-66562-7_13
DOI: https://doi.org/10.1007/978-3-319-66562-7_13
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-66561-0
Online ISBN: 978-3-319-66562-7
eBook Packages: Computer Science, Computer Science (R0)