Detection of Topics and Construction of Search Rules on Twitter

  • Conference paper
Advances in Computing (CCC 2017)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 735)

Abstract

This study proposes an improvement to the Insight Centre for Data Analytics algorithm, which identifies the most relevant topics in a corpus of tweets and allows search rules to be constructed for those topics in order to build a corpus of tweets for analysis. The improved algorithm gains more than 14% in Purity and other metrics, with an execution time of 10% relative to Latent Dirichlet Allocation (LDA).
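For readers unfamiliar with the Purity metric named in the abstract, the sketch below uses its common textbook definition: each cluster is credited with the count of its most frequent true label, and the credits are summed and divided by the total number of items. This is an illustration under that assumption, not the paper's evaluation code; the function name `purity` and the toy labels are hypothetical.

```python
from collections import Counter


def purity(cluster_ids, true_labels):
    """Return the share of items that carry the majority label of their cluster.

    Assumes the standard definition of purity; the paper's exact
    evaluation setup is not visible in this preview.
    """
    if len(cluster_ids) != len(true_labels):
        raise ValueError("cluster_ids and true_labels must have the same length")
    if not cluster_ids:
        raise ValueError("at least one item is required")

    # Count how often each true label occurs inside each cluster.
    label_counts_per_cluster = {}
    for cluster, label in zip(cluster_ids, true_labels):
        label_counts_per_cluster.setdefault(cluster, Counter())[label] += 1

    # Each cluster contributes the size of its largest label group.
    majority_total = sum(
        max(counts.values()) for counts in label_counts_per_cluster.values()
    )
    return majority_total / len(cluster_ids)


# Toy example (hypothetical data): two clusters of three tweets each.
score = purity([0, 0, 0, 1, 1, 1], ["a", "a", "b", "b", "b", "a"])
```

A score of 1.0 means every cluster contains a single topic. Purity tends to rise as the number of clusters grows, so it is usually read alongside other metrics.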

Notes

  1. Relatively homogeneous “natural” groupings in a statistical population.

  2. A collection of pieces of language selected according to explicit linguistic criteria in order to be used as a sample of the language.

  3. As the number of dimensions increases, data becomes sparse. The amount of data needed must therefore grow exponentially with the dimensionality; otherwise, the sample will not contain enough points in the larger space to support any type of analysis.

  4. The ngram ‘ganas tomar Corona’ literally means ‘want drink Corona’, where Corona is a beer produced in Mexico.

  5. The ngram ‘gustado video youtube’ literally means ‘liked video youtube’.

  6. An ngram is considered important enough if it “passes” a function of our frequency parameters defined at the start. At the very least, the ngram must be repeated 5 times.
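The minimum-frequency rule in the last note can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes tweets are already tokenized, reads "repeated 5 times" as at least 5 occurrences across the corpus, and counts every occurrence rather than one per tweet. The names `ngrams`, `frequent_ngrams`, and `MIN_COUNT` are hypothetical, and the fuller frequency function mentioned in the note is not reproduced.

```python
from collections import Counter

# Assumption: "repeated 5 times" is read as "occurs at least 5 times".
MIN_COUNT = 5


def ngrams(tokens, n):
    """Return all contiguous n-token tuples in a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def frequent_ngrams(tokenized_tweets, n=3, min_count=MIN_COUNT):
    """Keep only the ngrams that occur at least min_count times in the corpus."""
    counts = Counter()
    for tokens in tokenized_tweets:
        counts.update(ngrams(tokens, n))
    return {gram: count for gram, count in counts.items() if count >= min_count}


# Toy corpus (hypothetical data) built from the two ngrams quoted in the notes.
tweets = [["ganas", "tomar", "corona"]] * 5 + [["gustado", "video", "youtube"]] * 4
kept = frequent_ngrams(tweets)
```

In this toy corpus only the first ngram survives the filter, because the second appears four times. Counting at most one occurrence per tweet would be an equally plausible reading of the note.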

Acknowledgments

This work has been supported by Asociación Mexicana de Cultura A.C. and Consejo Nacional de Ciencia y Tecnología (CONACyT).

Author information

Corresponding author

Correspondence to Víctor M. González.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Martínez, E.D., Fonseca, J.P., González, V.M., Garduño, G., Huipet, H.H. (2017). Detection of Topics and Construction of Search Rules on Twitter. In: Solano, A., Ordoñez, H. (eds) Advances in Computing. CCC 2017. Communications in Computer and Information Science, vol 735. Springer, Cham. https://doi.org/10.1007/978-3-319-66562-7_13

  • DOI: https://doi.org/10.1007/978-3-319-66562-7_13

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-66561-0

  • Online ISBN: 978-3-319-66562-7

  • eBook Packages: Computer Science (R0)
