A wikipedia-based semantic relatedness framework for effective dimensions classification in online reputation management

  • Original Research
  • Published in: Journal of Ambient Intelligence and Humanized Computing

Abstract

Social media repositories are a significant source of evidence when extracting information about the reputation of a particular entity (e.g., a politician, singer or company). Reputation management experts manually mine social media repositories (in particular Twitter) to monitor the reputation of such an entity. Recently, the online reputation management evaluation campaign known as RepLab at CLEF has turned attention to devising computational methods that assist reputation management experts. A significant research challenge in this area is to classify the reputation dimension of tweets with respect to entity names. More specifically, identifying the various aspects of a brand’s reputation is an important task that can help companies monitor their areas of strength and weakness effectively. To address this issue, in this paper we use dominant Wikipedia categories related to a reputation dimension; these categories are then utilised within a semantic relatedness scoring framework to generate “associativities” with respect to the various reputation dimensions, along with another version of “associativity” normalized by the “content entropy” of Wikipedia categories. The Wikipedia categories obtained through our applied methods are finally used in a random forest classifier for the task of reputation dimensions classification. The experimental evaluations show a significant improvement over the baseline accuracy.
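To make the idea of an “associativity” score and its entropy-normalized variant more concrete, the following is a minimal sketch and not the paper's actual formulation. Everything in it is an assumption made for illustration: the function names, the toy category contents, the phrase-overlap scoring, and in particular the normalization by 1 + Shannon entropy. The resulting per-dimension feature pairs are the kind of input that could then be passed to a random forest classifier.

```python
import math
from collections import Counter


def shannon_entropy(term_counts):
    """Shannon entropy (in bits) of a category's term-frequency distribution.

    Used here as a stand-in for "content entropy" (an assumption).
    """
    total = sum(term_counts.values())
    if total == 0:
        return 0.0
    return -sum(
        (c / total) * math.log2(c / total)
        for c in term_counts.values()
        if c > 0
    )


def associativity(tweet_phrases, category_terms):
    """Assumed raw score: how many tweet phrases occur in the category content."""
    return sum(1 for p in tweet_phrases if category_terms.get(p, 0) > 0)


def dimension_features(tweet_phrases, dimension_categories):
    """Return {dimension: (raw, entropy_normalized)} associativity features.

    dimension_categories maps a reputation dimension to its (hypothetical)
    dominant Wikipedia categories, each given as a Counter of content terms.
    """
    features = {}
    for dimension, categories in dimension_categories.items():
        raw = 0.0
        normalized = 0.0
        for terms in categories.values():
            score = associativity(tweet_phrases, terms)
            raw += score
            # Assumed normalization: broad, high-entropy categories count less.
            normalized += score / (1.0 + shannon_entropy(terms))
        features[dimension] = (raw, normalized)
    return features


# Hypothetical toy data, not taken from Wikipedia or from the paper.
toy_categories = {
    "products": {"Category:Smartphones": Counter({"iphone": 2, "battery": 2})},
    "leadership": {"Category:Chief executives": Counter({"ceo": 4})},
}
features = dimension_features(["iphone", "battery"], toy_categories)
```

In this toy setup a tweet mentioning “iphone” and “battery” is associated only with the hypothetical “products” dimension, and the normalized score is lower than the raw one because the matching category spreads its content over two terms.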


Notes

  1. In the context of reputation management, an entity may refer to a celebrity, company, organization or brand.

  2. Note that these are the standard dimensions provided by the Reputation Institute.

  3. Available in 270+ languages.

  4. http://en.wikipedia.org/wiki/Wikipedia:Size_of_Wikipedia.

  5. http://en.wikipedia.org/wiki/Reliability_of_Wikipedia.

  6. Note that we have essentially utilised the dumps made available by DBPedia. However, although DBPedia contains a notable body of semantic annotations, we do not use this additional information.

  7. musician1 and musician2 are two different musicians such as Madonna and Lady Gaga.

  8. Microsoft is a company whereas Windows10 is a product of Microsoft.

  9. This is the pre-defined entity with respect to which the reputation dimension of the tweet has to be classified.

  10. An infobox is a fixed-format table designed to be added to the top right-hand corner of Wikipedia articles to consistently present a summary of some unifying aspect pertaining to the articles.

  11. It is important to note that a category representative of the entity is selected at this phase.

  12. An example category taxonomy for Apple Inc. can be seen on the left side of Fig. 2.

  13. E.g., the Wikipedia article “Steve Jobs”, which is related to “Apple Inc.”, is listed in the category “1955 births”, which is present neither among the parent categories nor among the sub-categories of the entity’s Wikipedia article.

  14. Normalizing a subtle relationship may result in a numerical zero because the fraction is so small, and storing a very small fraction with high precision is not an efficient choice.

  15. This could be a paragraph, sentence or tweet.

  16. Number of words in a phrase.

  17. Empirically, this aggregation performs reasonably well during the evaluations, as shown in the later sections.

  18. Recall from Sect. 4.1.1 that the final step in extraction of candidate phrases corresponds to matching with Wikipedia article titles.

  19. From within training data.

  20. From the set WikiCategories that represents all Wikipedia categories within a given reputation dimension.

  21. Note that RD represents the set of all seven reputation dimensions.

  22. http://gephi.github.io.

  23. http://bit.ly/1eMADG9. We aim to release the API as an open-source Wikipedia tool to facilitate the work of other researchers.


Author information

Corresponding author

Correspondence to M. Atif Qureshi.

About this article

Cite this article

Qureshi, M.A., Younus, A., O’Riordan, C. et al. A wikipedia-based semantic relatedness framework for effective dimensions classification in online reputation management. J Ambient Intell Human Comput 9, 1403–1413 (2018). https://doi.org/10.1007/s12652-017-0536-y

  • DOI: https://doi.org/10.1007/s12652-017-0536-y
