WikiTrends: Unstructured Wikipedia-Based Text Analytics Framework

  • Michel Naim Gerguis
  • Cherif Salama
  • M. Watheq El-Kharashi
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10260)


WikiTrends is a new analytics framework for Wikipedia articles. It adds temporal and spatial dimensions to Wikipedia, turning a large, static encyclopedia into a dynamic one: information extracted from unstructured article text can be aggregated into timelines or heat maps for any user-defined collection of articles. We applied data mining techniques to detect the location, start and end years of existence, gender, and entity class of 4.85 million pages, and evaluated our extractors on a small, randomly selected, manually tagged set of articles. Example WikiTrends heat maps show the number of notable football players over history or the dominant occupations in a specific era, while timelines illustrate fame battles over history between male and female actors, between music genres, or between American, Italian, and Indian films. Through information visualization and simple configuration, WikiTrends offers a new way to answer questions with a figure.
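The aggregation step described in the abstract can be illustrated with a minimal Python sketch. It assumes each page has already been reduced to a record of extracted attributes. The `Entity` record, the `timeline` and `heat_map` functions, the rule that an entity counts in every year between its start and end year, and the use of 2017 as the end year for entities that still exist are all hypothetical choices made for illustration, not the WikiTrends implementation. Drawing the resulting counts as a chart or map is left out.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional

# Hypothetical assumption: entities that still exist are counted up to this year.
PRESENT_YEAR = 2017


@dataclass(frozen=True)
class Entity:
    """Hypothetical record of the attributes extracted from one Wikipedia page."""
    title: str
    entity_class: str         # e.g. "actor", "film", "football player"
    gender: Optional[str]     # "male", "female", or None for non-persons
    location: Optional[str]   # e.g. a country name, or None if undetected
    start_year: int           # first year of existence (birth, release, founding)
    end_year: Optional[int]   # last year of existence; None if it still exists


def _last_year(e: Entity, present_year: int) -> int:
    # Entities without an end year are treated as existing until present_year.
    return e.end_year if e.end_year is not None else present_year


def timeline(entities: Iterable[Entity], group_by: str, first_year: int,
             last_year: int, present_year: int = PRESENT_YEAR) -> Dict[Optional[str], List[int]]:
    """For each value of the attribute `group_by`, count how many entities
    existed in every year of [first_year, last_year]."""
    width = last_year - first_year + 1
    series: Dict[Optional[str], List[int]] = defaultdict(lambda: [0] * width)
    for e in entities:
        lo = max(e.start_year, first_year)
        hi = min(_last_year(e, present_year), last_year)
        for year in range(lo, hi + 1):  # empty range if there is no overlap
            series[getattr(e, group_by)][year - first_year] += 1
    return dict(series)


def heat_map(entities: Iterable[Entity], first_year: int, last_year: int,
             present_year: int = PRESENT_YEAR) -> Counter:
    """Count, per location, the entities whose existence overlaps [first_year, last_year]."""
    counts: Counter = Counter()
    for e in entities:
        overlaps = e.start_year <= last_year and _last_year(e, present_year) >= first_year
        if e.location is not None and overlaps:
            counts[e.location] += 1
    return counts


if __name__ == "__main__":
    actors = [
        Entity("A", "actor", "male", "Italy", 1900, 1970),
        Entity("B", "actor", "female", "India", 1950, None),
        Entity("C", "actor", "female", "Italy", 1960, 1965),
    ]
    tl = timeline(actors, "gender", 1960, 1970)
    print(tl["male"])    # [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
    print(tl["female"])  # [2, 2, 2, 2, 2, 2, 1, 1, 1, 1, 1]
    print(dict(heat_map(actors, 1900, 1955)))  # {'Italy': 1, 'India': 1}
```

Here a timeline is one series per group (per gender in the example), and a heat map would color each location by its count. Counting every year of an entity's existence favors long-lived entities; a variant that counts only the start year (for example, a film's release year) needs only a change to the inner loop.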


Keywords: Data mining, Entity analytics, Entity classification, Fine-grained classification, Text analytics, Text classification, Text understanding, Wikipedia



Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  • Michel Naim Gerguis (1)
  • Cherif Salama (1)
  • M. Watheq El-Kharashi (1)
  1. Computer and Systems Engineering Department, Ain Shams University, Cairo, Egypt
