The article considers various methods for classifying a set of objects into two classes when all the attributes are categorical (nominal or factor attributes), i.e., describe the membership of an object in a category. Some methods are simple generalizations of classical methods (Bayesian algorithms, singular value decomposition methods); others are fundamentally novel. An efficient technique is proposed for encoding categorical attributes by real numbers, which makes it possible to apply classical machine-learning methods (e.g., the random forest). A generalization of the k nearest neighbors (kNN) algorithm and Zhuravlev's estimate calculation algorithm (AEC) achieve the best performance on real-life data. All methods have been tested on an applied problem: the construction of a recommender system for a security service.
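The abstract does not specify the proposed encoding; one common way to map categorical attributes to real numbers so that methods such as the random forest become applicable is smoothed target encoding, where each category is replaced by an estimate of the positive-class rate among objects in that category. The sketch below illustrates this idea only; the function `target_encode` and its `smoothing` parameter are assumptions for illustration, not the article's own construction.

```python
from collections import defaultdict

def target_encode(column, labels, smoothing=10.0):
    """Replace each category with a smoothed estimate of the
    positive-class rate among objects in that category.

    Smoothing pulls rare categories toward the global positive
    rate, so a category seen once does not get an extreme value.
    """
    counts = defaultdict(int)     # objects per category
    positives = defaultdict(int)  # positive objects per category
    for value, label in zip(column, labels):
        counts[value] += 1
        positives[value] += label
    prior = sum(labels) / len(labels)  # global positive rate
    return [
        (positives[v] + smoothing * prior) / (counts[v] + smoothing)
        for v in column
    ]

# Example: one categorical attribute and binary class labels.
col = ["a", "a", "b", "b", "b", "c"]
y = [1, 1, 0, 1, 0, 0]
encoded = target_encode(col, y)
```

The resulting real-valued column can be fed directly to a classical learner; in practice the encoding should be fit on training folds only to avoid target leakage.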
Translated from Prikladnaya Matematika i Informatika, No. 46, 2014, pp. 103–127.
D’yakonov, A.G. Solution Methods for Classification Problems with Categorical Attributes. Comput Math Model 26, 408–428 (2015). https://doi.org/10.1007/s10598-015-9281-2