WikiTrends: Unstructured Wikipedia-Based Text Analytics Framework

Gerguis, Michel Naim; Salama, Cherif; El-Kharashi, M. Watheq

doi:10.1007/978-3-319-59569-6_6

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 10260)

Included in the following conference series:

International Conference on Applications of Natural Language to Information Systems

Abstract

WikiTrends is a new analytics framework for Wikipedia articles. It adds temporal and spatial dimensions to Wikipedia's unstructured text, turning the large, static encyclopedia into a dynamic one: for any user-defined collection of articles, it can generate aggregated views of the extracted information as timelines or heat maps. Data mining techniques were applied to detect the location, start and end years of existence, gender, and entity class for 4.85 million pages. We evaluated our extractors on a small, randomly selected set of manually tagged articles. Example WikiTrends maps include heat maps of the number of notable football players over history or of the dominant occupations in a specific era, while timelines can easily illustrate rivalries in fame over history, for example between male and female actors, between music genres, or between American, Italian, and Indian films. Through information visualization and simple configuration, WikiTrends offers a new way of answering questions with a figure.
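The aggregation into timelines and heat maps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, which this page does not show. It assumes, hypothetically, that the extractors emit one record per article carrying the attributes the abstract names (location, start and end year, gender, entity class). The names `ArticleRecord`, `timeline`, and `heat_map` are illustrative, and counting an article as present in every year from its start year to its end year (or to the end of the requested range when no end year was detected) is an assumption.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Optional


@dataclass
class ArticleRecord:
    """Hypothetical per-article extractor output (not the paper's actual schema)."""
    title: str
    entity_class: str            # e.g. "actor", "football player"
    gender: Optional[str]        # e.g. "male", "female", or None if undetected
    location: Optional[str]      # e.g. a country name, or None if undetected
    start_year: Optional[int]    # first year of existence (birth, founding, release)
    end_year: Optional[int]      # last year of existence, None if open-ended


def _span(r: ArticleRecord, first_year: int, last_year: int) -> range:
    """Years in [first_year, last_year] during which the record is assumed to exist."""
    if r.start_year is None:
        return range(0)  # no start year extracted: the record cannot be placed in time
    # Assumption: a missing end year means the entity still exists.
    end = r.end_year if r.end_year is not None else last_year
    return range(max(r.start_year, first_year), min(end, last_year) + 1)


def timeline(records: Iterable[ArticleRecord],
             key: Callable[[ArticleRecord], Optional[str]],
             first_year: int, last_year: int) -> Dict[str, List[int]]:
    """For each group given by key(record), count the records existing in each year."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for r in records:
        group = key(r)
        if group is None:
            continue  # skip records whose grouping attribute was not extracted
        for year in _span(r, first_year, last_year):
            counts[group][year] += 1
    years = range(first_year, last_year + 1)
    return {g: [c[y] for y in years] for g, c in counts.items()}


def heat_map(records: Iterable[ArticleRecord],
             first_year: int, last_year: int) -> Counter:
    """Count records per location whose existence overlaps [first_year, last_year]."""
    cells: Counter = Counter()
    for r in records:
        if r.location is not None and len(_span(r, first_year, last_year)) > 0:
            cells[r.location] += 1
    return cells


# Usage with made-up records (illustrative only, not data from the paper).
records = [
    ArticleRecord("A", "actor", "male", "USA", 1900, 1950),
    ArticleRecord("B", "actor", "female", "Italy", 1920, None),
    ArticleRecord("C", "football player", "male", "Brazil", 1940, 1990),
]
actors = [r for r in records if r.entity_class == "actor"]
by_gender = timeline(actors, key=lambda r: r.gender, first_year=1900, last_year=1960)
print(by_gender["male"][0], by_gender["female"][0])  # 1 0
print(heat_map(records, 1960, 1970))
```

Filtering by `entity_class` first and then grouping by another attribute mirrors the "user-defined collection" idea: the same two functions would serve a male-versus-female actors timeline or a per-country map of football players in a given era. Treating a missing end year as "still existing" is only one possible convention; discarding such records instead would be equally simple.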

Author information

Authors and Affiliations

Computer and Systems Engineering Department, Ain Shams University, Cairo, 11517, Egypt
Michel Naim Gerguis, Cherif Salama & M. Watheq El-Kharashi

Corresponding author

Correspondence to Michel Naim Gerguis.

Editor information

Editors and Affiliations

Erasmus University Rotterdam, Rotterdam, The Netherlands
Flavius Frasincar
University of Liège, Liège, Belgium
Ashwin Ittoo
Japan Advanced Institute of Science and Technology, Nomi, Japan
Le Minh Nguyen
Conservatoire National des Arts et Métiers, Paris, France
Elisabeth Métais

About this paper

Cite this paper

Gerguis, M.N., Salama, C., El-Kharashi, M.W. (2017). WikiTrends: Unstructured Wikipedia-Based Text Analytics Framework. In: Frasincar, F., Ittoo, A., Nguyen, L., Métais, E. (eds) Natural Language Processing and Information Systems. NLDB 2017. Lecture Notes in Computer Science, vol 10260. Springer, Cham. https://doi.org/10.1007/978-3-319-59569-6_6

DOI: https://doi.org/10.1007/978-3-319-59569-6_6
Published: 02 June 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-59568-9
Online ISBN: 978-3-319-59569-6
eBook Packages: Computer Science, Computer Science (R0)