Comparative Analysis and Implementation of Semantic-Based Classifiers

Escobar-Vega, Luis Miguel; Zaldívar-Carrillo, Víctor Hugo; Villalon-Turrubiates, Ivan

doi:10.1007/978-3-030-04497-8_7

Conference paper
First Online: 03 January 2019

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 11289)

Abstract

Text classifiers that extract their features with purely statistical methods become less useful as the range of categories to classify grows, and they capture little of the meaning of the data they classify. Semantic methods can improve the efficiency and effectiveness of the purely quantitative approach. This work explores a semantic approach that uses a similarity measure to build a vector model containing semantic evidence, and then uses this vector model to improve a Maximum Entropy-based text classifier. Experiments show that the F-measures obtained with this approach are competitive. The results suggest that semantic analysis is a valuable complement to statistical approaches and can improve classification performance.
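As a rough illustration of the idea of a vector model carrying semantic evidence, the sketch below spreads each word count over related vocabulary terms using a word-to-word similarity score. It is a minimal sketch under stated assumptions, not the authors' method: the abstract does not specify the similarity measure, and the similarity table, the vocabulary, and the function names here are all hypothetical. Vectors of this kind could then serve as input features for a Maximum Entropy classifier (equivalently, multinomial logistic regression).

```python
import math
from collections import Counter

# Hypothetical word-to-word similarity scores in [0, 1]. In practice these
# might come from word embeddings or a lexical resource (an assumption).
SIMILARITY = {
    ("laptop", "notebook"): 0.9,
    ("laptop", "computer"): 0.8,
    ("notebook", "computer"): 0.7,
}


def word_similarity(a, b):
    """Symmetric lookup; identical words score 1.0, unknown pairs 0.0."""
    if a == b:
        return 1.0
    return SIMILARITY.get((a, b), SIMILARITY.get((b, a), 0.0))


def count_vector(tokens, vocabulary):
    """Purely statistical representation: raw term counts."""
    counts = Counter(tokens)
    return [float(counts[term]) for term in vocabulary]


def semantic_vector(tokens, vocabulary):
    """Each vocabulary term receives the similarity-weighted counts of
    every word in the document, so related words add evidence too."""
    counts = Counter(tokens)
    return [
        sum(word_similarity(term, word) * n for word, n in counts.items())
        for term in vocabulary
    ]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


vocabulary = ["laptop", "notebook", "computer", "banana"]
doc_a = ["laptop", "computer"]
doc_b = ["notebook"]

plain_a = count_vector(doc_a, vocabulary)
plain_b = count_vector(doc_b, vocabulary)
semantic_a = semantic_vector(doc_a, vocabulary)
semantic_b = semantic_vector(doc_b, vocabulary)

print("count-based cosine:", cosine(plain_a, plain_b))
print("semantic cosine:   ", round(cosine(semantic_a, semantic_b), 3))
```

With raw counts the two documents share no terms and look unrelated. After the similarity-weighted expansion they become close neighbours, which is the kind of extra evidence a purely statistical feature extractor cannot supply.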

L. M. Escobar-Vega: The authors would like to thank the Instituto Tecnológico y de Estudios Superiores de Occidente (ITESO) of Mexico for the resources provided for this research. The main author would also like to thank the National Council of Science and Technology (CONACYT) of Mexico for sponsoring this research through scholarship number 399053.

Author information

Authors and Affiliations

ITESO (Instituto Tecnológico y de Estudios Superiores de Occidente), 45604, Tlaquepaque, Mexico
Luis Miguel Escobar-Vega, Víctor Hugo Zaldívar-Carrillo & Ivan Villalon-Turrubiates

Corresponding authors

Correspondence to Luis Miguel Escobar-Vega, Víctor Hugo Zaldívar-Carrillo or Ivan Villalon-Turrubiates.

Editor information

Editors and Affiliations

Instituto Politécnico Nacional, Mexico City, Mexico
Ildar Batyrshin
Universidad Panamericana, Mexico City, Mexico
María de Lourdes Martínez-Villaseñor
Faculty of Engineering, Universidad Panamericana, Mexico City, Mexico
Hiram Eredín Ponce Espinosa

Cite this paper

Escobar-Vega, L.M., Zaldívar-Carrillo, V.H., Villalon-Turrubiates, I. (2018). Comparative Analysis and Implementation of Semantic-Based Classifiers. In: Batyrshin, I., Martínez-Villaseñor, M., Ponce Espinosa, H. (eds) Advances in Computational Intelligence. MICAI 2018. Lecture Notes in Computer Science, vol 11289. Springer, Cham. https://doi.org/10.1007/978-3-030-04497-8_7

DOI: https://doi.org/10.1007/978-3-030-04497-8_7
Published: 03 January 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-04496-1
Online ISBN: 978-3-030-04497-8
eBook Packages: Computer Science, Computer Science (R0)
