Semantic Topic Compass – Classification Based on Unsupervised Feature Ambiguity Gradation

  • Amparo Elizabeth Cano
  • Hassan Saif
  • Harith Alani
  • Enrico Motta
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9678)


Characterising social media topics often requires new features to be taken into account continuously, which increases the need for classifier retraining. One challenging aspect is the emergence of ambiguous features, which can affect classification performance. In this paper we investigate the impact of using ambiguous features in a topic classification task, and introduce the Semantic Topic Compass (STC) framework, which characterises ambiguity in a topic's feature space. STC uses topic priors derived from structured knowledge sources to facilitate the semantic feature grading of a topic. Our findings demonstrate that the proposed framework offers competitive boosts in performance across all datasets.


Topic classification, Feature engineering, Semantics



This work was supported by the EU-FP7 project SENSE4US (grant no. 611242) and the UK HEFCE project MK:Smart.



Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  • Amparo Elizabeth Cano (1)
  • Hassan Saif (1)
  • Harith Alani (1)
  • Enrico Motta (1)
  1. Knowledge Media Institute, The Open University, Milton Keynes, UK
