Comparative Analysis and Implementation of Semantic-Based Classifiers

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 11289)

Abstract

Text classifiers that extract their features with purely statistical methods lose usefulness when there is a wide range of types to classify, and they lack a deeper understanding of the data being classified. Semantic methods can improve both the efficiency and the effectiveness of the purely quantitative approach. This work explores a semantic approach that uses a similarity measure to build a vector model containing semantic evidence. This vector model is then used to improve a Maximum Entropy-based text classifier. Experiments show that the F-measures obtained with this approach are competitive. The results suggest that semantic analysis is a valuable complement to statistical approaches and can improve classification performance.
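The general idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy two-dimensional word vectors, the choice of cosine similarity as the similarity measure, the anchor terms, and all function names are assumptions made only for illustration. Each document is turned into a vector of similarity scores (the "semantic evidence"), and a small Maximum Entropy (multinomial logistic regression) model is trained on those vectors.

```python
import math

# Hypothetical toy word vectors; a real system would load trained embeddings.
WORD_VECTORS = {
    "laptop": (0.9, 0.1), "tablet": (0.8, 0.2), "phone": (0.7, 0.3),
    "apple": (0.1, 0.9), "banana": (0.0, 1.0), "bread": (0.2, 0.8),
}
ANCHORS = ["laptop", "banana"]  # assumed reference terms, one per class


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def doc_vector(tokens):
    """Average the vectors of the tokens that have an embedding."""
    known = [WORD_VECTORS[t] for t in tokens if t in WORD_VECTORS]
    if not known:
        return (0.0, 0.0)
    return tuple(sum(component) / len(known) for component in zip(*known))


def semantic_features(tokens):
    """Similarity of the document to each anchor term, plus a bias feature."""
    d = doc_vector(tokens)
    return [cosine(d, WORD_VECTORS[a]) for a in ANCHORS] + [1.0]


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def train_maxent(X, y, n_classes, epochs=500, lr=0.5):
    """Fit a Maximum Entropy model by stochastic gradient ascent."""
    W = [[0.0] * len(X[0]) for _ in range(n_classes)]
    for _ in range(epochs):
        for x, label in zip(X, y):
            probs = softmax([sum(w * f for w, f in zip(row, x)) for row in W])
            for c in range(n_classes):
                # Gradient of the log-likelihood: observed minus expected.
                err = (1.0 if c == label else 0.0) - probs[c]
                for j, f in enumerate(x):
                    W[c][j] += lr * err * f
    return W


def predict(W, tokens):
    """Return the index of the highest-scoring class for a document."""
    x = semantic_features(tokens)
    scores = [sum(w * f for w, f in zip(row, x)) for row in W]
    return max(range(len(W)), key=scores.__getitem__)


# Toy training set: class 0 = electronics, class 1 = food.
TRAIN = [(["laptop", "tablet"], 0), (["phone", "laptop"], 0),
         (["apple", "banana"], 1), (["bread", "apple"], 1)]
X_train = [semantic_features(tokens) for tokens, _ in TRAIN]
y_train = [label for _, label in TRAIN]
W = train_maxent(X_train, y_train, n_classes=2)

print(predict(W, ["tablet", "phone"]), predict(W, ["banana", "bread"]))
```

In this sketch the classifier never sees raw word counts, only similarity scores, so an unseen combination of words can still be classified if its words lie close to the right anchor in the embedding space.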

L. M. Escobar-Vega: The authors would like to thank the Instituto Tecnológico y de Estudios Superiores de Occidente (ITESO) of Mexico for the resources provided for this research. The main author would also like to thank the National Council of Science and Technology (CONACYT) of Mexico for sponsoring this research through scholarship number 399053.

Author information

Correspondence to Luis Miguel Escobar-Vega, Víctor Hugo Zaldívar-Carrillo or Ivan Villalon-Turrubiates.

Copyright information

© 2018 Springer Nature Switzerland AG

About this paper

Cite this paper

Escobar-Vega, L.M., Zaldívar-Carrillo, V.H., Villalon-Turrubiates, I. (2018). Comparative Analysis and Implementation of Semantic-Based Classifiers. In: Batyrshin, I., Martínez-Villaseñor, M., Ponce Espinosa, H. (eds) Advances in Computational Intelligence. MICAI 2018. Lecture Notes in Computer Science, vol 11289. Springer, Cham. https://doi.org/10.1007/978-3-030-04497-8_7

  • DOI: https://doi.org/10.1007/978-3-030-04497-8_7

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-04496-1

  • Online ISBN: 978-3-030-04497-8

  • eBook Packages: Computer Science, Computer Science (R0)
