Detection of Topics and Construction of Search Rules on Twitter

Martínez, Eduardo D.; Fonseca, Juan P.; González, Víctor M.; Garduño, Guillermo; Huipet, Héctor H.

doi:10.1007/978-3-319-66562-7_13

Part of the book series: Communications in Computer and Information Science (CCIS, volume 735)

Included in the following conference series:

Colombian Conference on Computing

Abstract

This study proposes an improvement to the Insight Centre for Data Analytics algorithm, which identifies the most relevant topics in a corpus of tweets. The algorithm also supports the construction of search rules for the detected topic or topics, so that a corpus of tweets can be built for analysis. The improved algorithm gains more than 14% in Purity and other metrics, with an execution time of 10% relative to Latent Dirichlet Allocation (LDA).
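For readers unfamiliar with the Purity metric named in the abstract, the following is a minimal sketch of the standard definition: each cluster is credited with its most frequent true label, and the credited counts are divided by the total number of items. The function name `purity` and the toy labels are illustrative assumptions, not taken from the paper, and the paper's exact evaluation setup may differ.

```python
from collections import Counter


def purity(cluster_labels, true_labels):
    """Standard clustering purity (illustrative sketch, not the paper's code).

    For each cluster, count the items that carry the cluster's most
    frequent true label, then divide the sum by the number of items.
    """
    if len(cluster_labels) != len(true_labels):
        raise ValueError("label sequences must have the same length")
    if not cluster_labels:
        raise ValueError("at least one item is required")

    # Map each cluster id to a histogram of the true labels it contains.
    by_cluster = {}
    for cluster, truth in zip(cluster_labels, true_labels):
        by_cluster.setdefault(cluster, Counter())[truth] += 1

    # Credit each cluster with the size of its majority class.
    majority_total = sum(max(hist.values()) for hist in by_cluster.values())
    return majority_total / len(cluster_labels)


# Hypothetical toy data: two clusters of three tweets each.
example_clusters = [0, 0, 0, 1, 1, 1]
example_topics = ["beer", "beer", "video", "video", "video", "beer"]
example_purity = purity(example_clusters, example_topics)
```

A purity of 1.0 means every cluster contains a single true topic; the metric does not penalize producing many small clusters, which is one reason it is usually reported alongside other metrics.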

Notes

1. Relatively homogeneous “natural” groupings in a statistical population.
2. A collection of pieces of language selected according to explicit linguistic criteria in order to be used as a language sample.
3. As the number of dimensions increases, data becomes sparse. The amount of data needed must therefore grow exponentially with the dimensionality, or the sample will not contain enough points in the space for any type of analysis to be performed.
4. The ngram ‘ganas tomar Corona’ translates literally as ‘want drink Corona’, where Corona is a beer produced in Mexico.
5. The ngram ‘gustado video youtube’ translates literally as ‘liked video youtube’.
6. An ngram is considered important enough if it “passes” a function of the frequency parameters defined at the start. At the very least, the ngram must be repeated 5 times.
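The minimum-frequency condition in Note 6 can be sketched as follows. This is only an illustration of the floor of 5 repetitions: the full function of the frequency parameters is not given in this text, so it is not reproduced here. The helper name `frequent_ngrams`, the choice of n = 3, and the toy tweets are assumptions made for the example.

```python
from collections import Counter


def frequent_ngrams(tokenized_tweets, n=3, min_count=5):
    """Count word ngrams over tokenized tweets and keep the frequent ones.

    Illustrative sketch: only the "repeated at least 5 times" floor from
    Note 6 is applied; any further frequency criteria are omitted.
    """
    counts = Counter()
    for tokens in tokenized_tweets:
        # Slide a window of n tokens across the tweet.
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return {ngram: c for ngram, c in counts.items() if c >= min_count}


# Hypothetical corpus built from the ngrams quoted in Notes 4 and 5.
toy_tweets = (
    [["ganas", "tomar", "corona"]] * 5
    + [["gustado", "video", "youtube"]] * 4
)
kept = frequent_ngrams(toy_tweets)
```

In this toy corpus only ‘ganas tomar corona’ survives, because it appears 5 times, while ‘gustado video youtube’ appears 4 times and falls below the floor.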

Acknowledgments

This work has been supported by Asociación Mexicana de Cultura A.C. and Consejo Nacional de Ciencia y Tecnología (CONACyT).

Author information

Authors and Affiliations

ITAM, 01080, Mexico City, Mexico
Eduardo D. Martínez, Juan P. Fonseca, Víctor M. González & Héctor H. Huipet
Sinnia, 11560, Mexico City, Mexico
Guillermo Garduño

Corresponding author

Correspondence to Víctor M. González.

Editor information

Editors and Affiliations

Universidad Autónoma de Occidente, Cali, Colombia
Andrés Solano
Universidad de San Buenaventura, Cali, Colombia
Hugo Ordoñez

About this paper

Cite this paper

Martínez, E.D., Fonseca, J.P., González, V.M., Garduño, G., Huipet, H.H. (2017). Detection of Topics and Construction of Search Rules on Twitter. In: Solano, A., Ordoñez, H. (eds) Advances in Computing. CCC 2017. Communications in Computer and Information Science, vol 735. Springer, Cham. https://doi.org/10.1007/978-3-319-66562-7_13

DOI: https://doi.org/10.1007/978-3-319-66562-7_13
Published: 17 August 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-66561-0
Online ISBN: 978-3-319-66562-7
eBook Packages: Computer Science, Computer Science (R0)
