
Solution Methods for Classification Problems with Categorical Attributes

Computational Mathematics and Modeling


The article considers various methods for classifying a set of objects into two classes when all the attributes are categorical (nominal or factor attributes), i.e., they describe the membership of an object in a category. Some of the methods are simple generalizations of classical ones (Bayesian algorithms, singular value decomposition methods), while others are fundamentally novel. An efficient technique is proposed for encoding categorical attributes by real numbers, which makes it possible to apply classical machine-learning methods (e.g., the random forest). A generalization of the k nearest neighbors (kNN) algorithm and Zhuravlev’s estimate calculation algorithm (AEC) achieve the best performance on real-life data. All methods have been tested on an applied problem involving the construction of a recommender system for a security service.
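To make the encoding idea concrete, the sketch below replaces each categorical value with a real number (its empirical frequency in the column) and feeds the result to a random forest, and then runs a plain kNN classifier over integer-coded categories with the Hamming (mismatch-fraction) distance. This is only a minimal illustration under assumptions, not the article's exact encoding or its kNN/AEC generalization; the library calls are scikit-learn, and the column names and toy data are hypothetical.

```python
# Minimal sketch (assumptions, not the article's exact procedure):
# (1) frequency-encode categorical attributes as real numbers, then apply a
#     classical method (random forest);
# (2) run an ordinary kNN over integer-coded categories using the Hamming
#     distance, i.e., the fraction of attributes on which two objects disagree.
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import OrdinalEncoder

# Hypothetical toy data: every attribute is categorical, two classes in y.
X = pd.DataFrame({
    "department": ["sales", "it", "it", "hr", "sales", "it"],
    "role":       ["manager", "engineer", "engineer", "clerk", "clerk", "manager"],
})
y = np.array([1, 0, 0, 1, 1, 0])

# (1) Frequency encoding: each category is mapped to its share in the column.
X_freq = X.copy()
for col in X_freq.columns:
    freq = X_freq[col].value_counts(normalize=True)
    X_freq[col] = X_freq[col].map(freq)

rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_freq, y)
print("Random forest predictions:", rf.predict(X_freq[:2]))

# (2) kNN on raw categories: integer-code the values and compare objects by
#     the number of mismatched attributes (Hamming distance).
X_codes = OrdinalEncoder().fit_transform(X)
knn = KNeighborsClassifier(n_neighbors=3, metric="hamming", algorithm="brute")
knn.fit(X_codes, y)
print("kNN predictions:", knn.predict(X_codes[:2]))
```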



Author information


Corresponding author

Correspondence to A. G. D’yakonov.

Additional information

Translated from Prikladnaya Matematika i Informatika, No. 46, 2014, pp. 103–127.


About this article


Cite this article

D’yakonov, A.G. Solution Methods for Classification Problems with Categorical Attributes. Comput Math Model 26, 408–428 (2015). https://doi.org/10.1007/s10598-015-9281-2

