Extracting and evaluating topics by region

Multimedia Tools and Applications

Abstract

Analyzing streaming data that carries regional information can reveal the interest trends of a region and how they differ from those of other regions. Such analyses of regional differences can inform important decisions in areas such as regional marketing and national policymaking. In this paper, we propose a method to extract topics that represent regional interests from news articles collected by region. The method consists of a novel word-weighting step that extracts regional keywords and a word-clustering step that groups the extracted keywords into regional topics based on their associations. The extracted regional topics are validated by comparison with a ground-truth topic set. Because each topic is represented as a set of words, a regional topic set is a family of sets; we therefore propose a new clustering validity index for families of sets over a given set of regions. Using this index, we determine the optimal parameters for the collected data through experiments.
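The abstract does not spell out the weighting formula, the association measure used for clustering, or the definition of the validity index, so the following Python sketch only illustrates the overall pipeline under stated assumptions. Region-level tf-idf stands in for the paper's novel weighting, a co-occurrence threshold with connected components stands in for its word clustering, and a best-match Jaccard average stands in for its validity index for families of sets. All function names, the parameters `top_k` and `min_cooc`, and the toy corpus and ground truth are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def regional_keywords(corpus, top_k=5):
    """Stand-in weighting (NOT the paper's scheme): tf-idf in which all
    articles of one region are pooled into a single 'document'.
    corpus maps region -> list of articles, each a list of tokens."""
    n_regions = len(corpus)
    tf = {r: Counter(w for art in arts for w in art) for r, arts in corpus.items()}
    # Number of regions in which each word occurs at least once.
    df = Counter(w for counts in tf.values() for w in counts)
    keywords = {}
    for r, counts in tf.items():
        scores = {w: c * math.log(n_regions / df[w]) for w, c in counts.items()}
        ranked = sorted(scores, key=lambda w: (-scores[w], w))
        # Words found in every region get idf = 0 and are dropped.
        keywords[r] = [w for w in ranked[:top_k] if scores[w] > 0]
    return keywords


def cluster_keywords(articles, keywords, min_cooc=2):
    """Stand-in clustering (hypothetical): link two keywords that co-occur
    in at least `min_cooc` articles; each connected component is a topic."""
    kw = set(keywords)
    cooc = Counter()
    for art in articles:
        present = sorted(kw & set(art))
        cooc.update(combinations(present, 2))
    parent = {w: w for w in kw}

    def find(w):  # union-find root lookup with path halving
        while parent[w] != w:
            parent[w] = parent[parent[w]]
            w = parent[w]
        return w

    for (a, b), count in cooc.items():
        if count >= min_cooc:
            parent[find(a)] = find(b)
    groups = {}
    for w in kw:
        groups.setdefault(find(w), set()).add(w)
    return [frozenset(g) for g in groups.values()]


def jaccard(a, b):
    return len(a & b) / len(a | b) if a | b else 1.0


def family_score(extracted, truth):
    """Stand-in validity score (NOT the paper's index): match every
    ground-truth topic to its most similar extracted topic of the same
    region by Jaccard similarity, then average over all regions' topics."""
    sims = [max((jaccard(t, e) for e in extracted.get(r, [])), default=0.0)
            for r, topics in truth.items() for t in topics]
    return sum(sims) / len(sims) if sims else 0.0


def evaluate(corpus, truth, top_k, min_cooc):
    kws = regional_keywords(corpus, top_k)
    extracted = {r: cluster_keywords(corpus[r], kws[r], min_cooc) for r in corpus}
    return family_score(extracted, truth)


# Hypothetical toy data: two regions, tokenized articles, hand-made truth.
corpus = {
    "north": [["snow", "festival", "ski"], ["snow", "ski", "resort"], ["port", "ferry"]],
    "south": [["beach", "festival", "surf"], ["beach", "surf"], ["port", "ferry"]],
}
truth = {
    "north": [frozenset({"ski", "snow", "resort"})],
    "south": [frozenset({"beach", "surf"})],
}

# Parameter sweep: keep the co-occurrence threshold with the best score.
best_min_cooc = max((1, 2, 3), key=lambda m: evaluate(corpus, truth, 5, m))
print(best_min_cooc)  # 1
```

Each region's topics form a family of frozensets, which is why the score compares families rather than single sets. The final line mirrors the parameter search described in the abstract: on this toy data, linking keywords that co-occur even once (`min_cooc=1`) reproduces the hand-made ground truth exactly, while stricter thresholds split the "north" topic and lower the score.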

Acknowledgments

This research was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF), funded by the Ministry of Science, ICT & Future Planning (No. 2013R1A2A2A04016948).

Author information

Corresponding author

Correspondence to Soowon Lee.

About this article

Cite this article

Noh, J., Lee, S. Extracting and evaluating topics by region. Multimed Tools Appl 75, 12765–12777 (2016). https://doi.org/10.1007/s11042-016-3528-6

  • DOI: https://doi.org/10.1007/s11042-016-3528-6
