
A Wikipedia-based semantic relatedness framework for effective dimensions classification in online reputation management


Social media repositories serve as a significant source of evidence when extracting information related to the reputation of a particular entity (e.g., a particular politician, singer or company). Reputation management experts manually mine social media repositories (in particular Twitter) to monitor the reputation of a particular entity. Recently, the online reputation management evaluation campaign RepLab at CLEF has turned attention to devising computational methods that assist reputation management experts. A significant research challenge in this area is classifying the reputation dimension of tweets with respect to entity names. More specifically, identifying the various aspects of a brand’s reputation is an important task that can help companies monitor their areas of strength and weakness effectively. To address this issue, in this paper we use the dominant Wikipedia categories related to each reputation dimension; these categories are then utilised within a semantic relatedness scoring framework to generate “associativities” with respect to the various reputation dimensions, together with a second version of “associativity” normalized by the “content entropy” of the Wikipedia categories. The Wikipedia categories obtained through our methods are finally used in a random forest classifier for the task of reputation dimensions classification. The experimental evaluations show a significant improvement over the baseline accuracy.
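The abstract describes the scoring pipeline only at a high level, so the Python sketch below is an illustration under stated assumptions, not the authors' implementation. It assumes that the dominant categories of a dimension are its top-k most frequent categories in the training data, that per-category relatedness scores are already available (a hypothetical stand-in for the paper's Wikipedia-based relatedness framework), that a category's "content entropy" is the Shannon entropy of how its occurrences spread across dimensions, and that normalization divides each contribution by 1 + entropy. All names (`TRAINING_CATEGORIES`, `dominant_categories`, `content_entropy`, `associativity`) and all data are hypothetical.

```python
import math
from collections import Counter

# Hypothetical training data: for each reputation dimension, the Wikipedia
# categories of the articles matched in that dimension's training tweets.
TRAINING_CATEGORIES = {
    "Products & Services": ["Smartphones", "Apple Inc. hardware", "Smartphones",
                            "Mobile phones", "Apple Inc. hardware", "Smartphones"],
    "Leadership": ["American chief executives", "Apple Inc. executives",
                   "American chief executives"],
    "Performance": ["Stock market", "Apple Inc. hardware", "Stock market",
                    "Bonds (finance)", "Apple Inc. hardware", "Stock market"],
}


def dominant_categories(training, top_k=2):
    """Assumed selection rule: keep the top_k most frequent categories per dimension."""
    return {dim: {cat for cat, _ in Counter(cats).most_common(top_k)}
            for dim, cats in training.items()}


def content_entropy(category, training):
    """Assumed definition: Shannon entropy (bits) of the category's spread across dimensions."""
    counts = [cats.count(category) for cats in training.values()]
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((n / total) * math.log2(n / total) for n in counts if n > 0)


def associativity(tweet_categories, relatedness, dominant, training, normalize=False):
    """Score a tweet against every dimension.

    relatedness maps a category to a relatedness score; it is a hypothetical
    stand-in for the paper's semantic relatedness framework. With
    normalize=True each contribution is divided by 1 + content entropy
    (the +1 is an assumption that avoids division by zero).
    """
    scores = {}
    for dim, cats in dominant.items():
        score = 0.0
        for cat in tweet_categories:
            if cat in cats:
                weight = relatedness.get(cat, 0.0)
                if normalize:
                    weight /= 1.0 + content_entropy(cat, training)
                score += weight
        scores[dim] = score
    return scores


tweet = ["Apple Inc. hardware", "Stock market"]
rel = {"Apple Inc. hardware": 0.8, "Stock market": 0.6}
dom = dominant_categories(TRAINING_CATEGORIES)
print(associativity(tweet, rel, dom, TRAINING_CATEGORIES))
print(associativity(tweet, rel, dom, TRAINING_CATEGORIES, normalize=True))
```

In this toy data "Apple Inc. hardware" occurs equally often in two dimensions, so its entropy is 1 bit and its contribution is halved in the normalized scores, while the dimension-specific "Stock market" keeps its full weight. Per-dimension scores of this kind would presumably serve as features for the random forest classifier mentioned above; that step is not shown.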


Figures: Fig. 1, Fig. 2, Fig. 3, Fig. 4 (images not reproduced)


Notes

  1. In the context of reputation management, an entity may refer to a celebrity, company, organization or brand.

  2. These are the standard reputation dimensions provided by the Reputation Institute.

  3. Available in 270+ languages.

  4.

  5.

  6. We use the dumps made available by DBpedia. Although DBpedia also contains a notable body of semantic annotations, we do not use this additional information.

  7. musician1 and musician2 denote two different musicians, e.g., Madonna and Lady Gaga.

  8. Microsoft is a company, whereas Windows10 is a product of Microsoft.

  9. It is with respect to this predefined entity that the tweet’s reputation dimension must be classified.

  10. An infobox is a fixed-format table designed to be added to the top right-hand corner of Wikipedia articles to consistently present a summary of some unifying aspect pertaining to the articles.

  11. Note that a category representative of the entity is selected in this phase.

  12. An example category taxonomy for Apple Inc. can be seen on the left side of Fig. 2.

  13. E.g., the Wikipedia article “Steve Jobs” (related to “Apple Inc.”) is listed in the category “1955 births”, which appears neither among the parent categories nor among the sub-categories of the entity’s Wikipedia article.

  14. Normalizing a subtle relationship may result in a mathematical zero because the fraction is so small, and storing very small fractions with high precision is not an efficient choice.

  15. This could be a paragraph, sentence or tweet.

  16. Number of words in a phrase.

  17. Empirically, this aggregation performs reasonably well in the evaluations, as shown in later sections.

  18. Recall from Sect. 4.1.1 that the final step in the extraction of candidate phrases corresponds to matching with Wikipedia article titles.

  19. From within the training data.

  20. From the set WikiCategories, which represents all Wikipedia categories within a given reputation dimension.

  21. Note that RD represents the set of all seven reputation dimensions.

  22.

  23. http://bit.ly/1eMADG9; we aim to release the API as an open-source Wikipedia tool to support other researchers.
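Notes 16, 18 and 21 refer to phrase length measured in words, to matching candidate phrases against Wikipedia article titles, and to the set RD of seven reputation dimensions. The Python sketch below shows one hypothetical way to realise these steps: it enumerates word n-grams from a tweet, keeps those that exactly match a set of lowercased article titles, and lays out per-dimension scores as a fixed-order vector over RD. The whitespace tokenizer, the maximum phrase length, the title set `WIKI_TITLES`, the function names, and the RD labels (assumed here to be the seven RepTrak dimensions used in RepLab 2014) are all assumptions rather than details taken from the paper.

```python
# Hypothetical set of lowercased Wikipedia article titles used for matching.
WIKI_TITLES = {"apple inc.", "steve jobs", "iphone", "stock market"}

# Assumed: the seven RepTrak dimensions used in RepLab 2014, in a fixed order.
RD = ["Products & Services", "Innovation", "Workplace", "Citizenship",
      "Governance", "Leadership", "Performance"]


def candidate_phrases(text, titles, max_len=3):
    """Return word n-grams (phrase length = number of words) that match a title.

    Longer n-grams are checked first; overlapping shorter matches are not
    removed in this sketch.
    """
    words = text.lower().split()  # naive whitespace tokenization (assumption)
    found = []
    for n in range(max_len, 0, -1):
        for i in range(len(words) - n + 1):
            phrase = " ".join(words[i:i + n])
            if phrase in titles:
                found.append(phrase)
    return found


def feature_vector(scores):
    """Lay out per-dimension scores in RD order, using 0.0 for missing dimensions."""
    return [scores.get(dim, 0.0) for dim in RD]


print(candidate_phrases("Steve Jobs unveiled the iPhone", WIKI_TITLES))
print(feature_vector({"Leadership": 0.5, "Performance": 1.0}))
```

A fixed dimension order keeps every tweet's vector aligned column by column, which is what a tabular classifier such as a random forest expects.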




Author information

Correspondence to M. Atif Qureshi.


About this article


Cite this article

Qureshi, M.A., Younus, A., O’Riordan, C. et al. A wikipedia-based semantic relatedness framework for effective dimensions classification in online reputation management. J Ambient Intell Human Comput 9, 1403–1413 (2018). https://doi.org/10.1007/s12652-017-0536-y



Keywords

  • Online reputation management
  • Semantic relatedness
  • Wikipedia
  • Reputation dimensions