Abstract
WikiTrends is a new analytics framework for Wikipedia articles. It adds temporal and spatial dimensions to Wikipedia and visualizes the extracted information, turning the large static encyclopedia into a vibrant one: for any user-defined collection of articles, it generates aggregated views, as timelines or heat maps, from unstructured text. We applied data mining techniques to detect the location, start and end years of existence, gender, and entity class for 4.85 million pages, and evaluated our extractors on a small, manually tagged random set of articles. Sample WikiTrends heat maps show the number of notable football players over history or the dominant occupations in a specific era, while timelines illustrate fame battles over history between male and female actors, between music genres, or between American, Italian, and Indian films. Through information visualization and simple configuration, WikiTrends offers a new way of answering questions with a figure.
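As a rough illustration of the kind of aggregated timeline view described above, the following minimal sketch counts, for each year, how many entities of a given class exist. It is not the WikiTrends implementation: the record layout, the sample rows, and the `timeline` function are all assumptions made for this example.

```python
from collections import Counter

# Hypothetical records: (title, entity_class, start_year, end_year).
# These rows are invented for illustration; they are not WikiTrends output.
records = [
    ("A", "actor", 1900, 1950),
    ("B", "actor", 1940, 1990),
    ("C", "film", 1945, 1945),
]

def timeline(records, entity_class, first_year, last_year):
    """Count entities of one class that exist in each year of the range."""
    counts = Counter()
    for _title, cls, start, end in records:
        if cls != entity_class:
            continue
        # Clip the existence interval to the requested window.
        for year in range(max(start, first_year), min(end, last_year) + 1):
            counts[year] += 1
    # Years with no matching entity are reported with a count of zero.
    return [(year, counts[year]) for year in range(first_year, last_year + 1)]
```

Calling `timeline(records, "actor", 1939, 1941)` yields one point per year, which could be handed to any plotting library. A heat map would follow the same pattern with a location key in place of the year.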
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Gerguis, M.N., Salama, C., El-Kharashi, M.W. (2017). WikiTrends: Unstructured Wikipedia-Based Text Analytics Framework. In: Frasincar, F., Ittoo, A., Nguyen, L., Métais, E. (eds) Natural Language Processing and Information Systems. NLDB 2017. Lecture Notes in Computer Science, vol 10260. Springer, Cham. https://doi.org/10.1007/978-3-319-59569-6_6
DOI: https://doi.org/10.1007/978-3-319-59569-6_6
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-59568-9
Online ISBN: 978-3-319-59569-6
eBook Packages: Computer Science, Computer Science (R0)