Language Resources and Evaluation, Volume 52, Issue 1, pp 185–215

A flexible text analyzer based on ontologies: an application for detecting discriminatory language

  • Alberto Salguero
  • Macarena Espinilla
Original Paper

Abstract

Language can be used to marginalize certain groups because it may reflect a negative mentality caused by mental barriers or historical delays. To prevent such misuse, several agents have carried out campaigns against discriminatory language, criticizing the use of certain terms and phrases. However, detecting discriminatory text in documents remains an important gap, because language is very flexible and usually contains hidden features or relations. Furthermore, adapting the text analysis approaches and methodologies proposed in the literature is complex, because these proposals are too rigid to be adapted to purposes other than those for which they were intended. The main novelty of the methodology presented here is the use of ontologies to implement the rules applied by the developed text analyzer, which provides great flexibility for the development of text analyzers and exploits the ability of ontologies to infer knowledge. A set of rules for detecting discriminatory language relating to gender and people with disabilities is also presented to show how the functionality of the text analyzer can be extended to different areas of discriminatory text.

Keywords

Text analyzer · Document text model · Methodology · Ontology · Discriminatory language

Notes

Acknowledgements

This contribution has been supported by the Andalusian Institute of Women, Junta de Andalucía, Spain (Grant No. UNIVER09/2009/23/00).

Copyright information

© Springer Science+Business Media Dordrecht 2017

Authors and Affiliations

  1. Department of Computer Sciences, University of Cádiz, Cádiz, Spain
  2. Department of Computer Sciences, University of Jaén, Jaén, Spain
