A wikipedia-based semantic relatedness framework for effective dimensions classification in online reputation management

Qureshi, M. Atif; Younus, Arjumand; O’Riordan, Colm; Pasi, Gabriella

doi:10.1007/s12652-017-0536-y

Original Research
Published: 10 July 2017

Volume 9, pages 1403–1413 (2018)
Journal of Ambient Intelligence and Humanized Computing

M. Atif Qureshi¹,
Arjumand Younus¹,
Colm O’Riordan² &
…
Gabriella Pasi³

Abstract

Social media repositories are a significant source of evidence about the reputation of an entity (e.g., a politician, singer or company). Reputation management experts currently mine these repositories (Twitter in particular) manually to monitor an entity’s reputation. Recently, the online reputation management evaluation campaign known as RepLab at CLEF has turned attention to computational methods that support these experts. A significant research challenge in this setting is to classify the reputation dimension of tweets with respect to entity names: identifying the various aspects of a brand’s reputation helps companies monitor their areas of strength and weakness effectively. To address this challenge, we use the dominant Wikipedia categories related to each reputation dimension within a semantic relatedness scoring framework. The framework generates “associativities” with respect to the various reputation dimensions, together with a second version of “associativity” normalized by the “content entropy” of Wikipedia categories. The Wikipedia categories obtained through these methods are finally used in a random forest classifier for the task of reputation dimensions classification. The experimental evaluations show a significant improvement over the baseline accuracy.
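
To make the feature construction described in the abstract more concrete, the following is a minimal sketch in Python. It is not the authors' implementation: this preview does not give the definitions of “associativity” or “content entropy”, so the sketch assumes, purely for illustration, that associativity counts the tweet phrases found in a category and that content entropy is the Shannon entropy of the category's term distribution. All function names, category names and data are hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(tokens):
    """Shannon entropy (in bits) of the token frequency distribution.

    Assumption: stands in for the paper's "content entropy", whose exact
    definition is not available in this preview.
    """
    counts = Counter(tokens)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def associativity(tweet_phrases, category_phrases):
    """Hypothetical associativity: number of tweet phrases present in the category."""
    return sum(1 for phrase in tweet_phrases if phrase in category_phrases)


def feature_vector(tweet_phrases, categories):
    """Return (raw, entropy-normalised) associativity for each category."""
    features = {}
    for name, tokens in categories.items():
        raw = associativity(tweet_phrases, set(tokens))
        entropy = shannon_entropy(tokens)
        # Guard against zero entropy (a category with a single distinct term).
        normalised = raw / entropy if entropy > 0 else 0.0
        features[name] = (raw, normalised)
    return features


# Toy data: category names and contents are invented for illustration only.
categories = {
    "Products": ["iphone", "ipad", "iphone", "macbook"],
    "Governance": ["ceo", "board", "shareholder", "audit"],
}
tweet = ["iphone", "ceo", "macbook"]

features = feature_vector(tweet, categories)
```

Under these assumptions, each tweet yields one raw and one normalised score per category, and such vectors could then be passed to a classifier such as the random forest mentioned in the abstract.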

Notes

In the context of reputation management, an entity may refer to a celebrity, company, organization or brand.
Note that these are the standard dimensions provided by the Reputation Institute.
Available in 270+ languages.
http://en.wikipedia.org/wiki/Wikipedia:Size_of_Wikipedia.
http://en.wikipedia.org/wiki/Reliability_of_Wikipedia.
Note that we have essentially utilised the dumps made available by DBpedia. However, although DBpedia contains a notable body of semantic annotations, we do not use this additional information.
musician1 and musician2 are two different musicians such as Madonna and Lady Gaga.
Microsoft is a company whereas Windows10 is a product of Microsoft.
Reputation dimensions classification for the tweet has to be performed with respect to this pre-defined entity.
An infobox is a fixed-format table designed to be added to the top right-hand corner of Wikipedia articles to consistently present a summary of some unifying aspect pertaining to the articles.
It is important to note that a category representative of the entity is selected at this phase.
An example category taxonomy for Apple Inc. can be seen on the left side of Fig. 2.
E.g., the Wikipedia article “Steve Jobs” of “Apple Inc.” is mentioned inside a category “1955 births”, which is present neither in the parent categories nor in the sub-categories of the entity’s Wikipedia article.
Normalizing a subtle relationship may result in a mathematical zero because the fraction is so small, and storing a low fraction with high precision is not an efficient choice.
This could be a paragraph, sentence or tweet.
Number of words in a phrase.
Empirically this aggregation performs reasonably well during the evaluations as shown in the later chapters.
Recall from Sect. 4.1.1 that the final step in extraction of candidate phrases corresponds to matching with Wikipedia article titles.
From within training data.
From the set WikiCategories that represents all Wikipedia categories within a given reputation dimension.
Note that RD represents the set of all seven reputation dimensions.
http://gephi.github.io.
http://bit.ly/1eMADG9. We aim to release the API as an open-source Wikipedia tool to facilitate other researchers.

Author information

Authors and Affiliations

Insight Centre for Data Analytics, University College Dublin, Dublin, Ireland
M. Atif Qureshi & Arjumand Younus
Information Technology Building, National University of Ireland, Galway, Ireland
Colm O’Riordan
Dipartimento di Informatica, Sistemistica e Comunicazione, Universita Degli Studi Di Milano-Bicocca, Milano, Italy
Gabriella Pasi

Corresponding author

Correspondence to M. Atif Qureshi.

About this article

Cite this article

Qureshi, M.A., Younus, A., O’Riordan, C. et al. A wikipedia-based semantic relatedness framework for effective dimensions classification in online reputation management. J Ambient Intell Human Comput 9, 1403–1413 (2018). https://doi.org/10.1007/s12652-017-0536-y

Received: 14 March 2017
Accepted: 22 June 2017
Published: 10 July 2017
Issue Date: October 2018
DOI: https://doi.org/10.1007/s12652-017-0536-y
