Abstract
This chapter is concerned with raw data types, that is, data in the form it takes before being encoded into numerical vectors. Relational data, which consists of records, is the typical raw data type, and it is relatively easy to encode into numerical vectors. Textual data is presented as the most popular raw data type in the real world, and we study the process of indexing a text into a list of words and encoding it into a numerical vector. We also study image data, another kind of raw data, and the process of encoding it into numerical vectors. Because machine learning algorithms assume that a numerical vector is given as the input, encoding raw data into numerical vectors is essential when applying them to real tasks.
In Sect. 3.1, we introduce the process of encoding raw data into numerical vectors, and in Sect. 3.2, we cover relational data and the process of encoding it into numerical vectors. In Sect. 3.3, we turn to textual data and describe its encoding process. In Sect. 3.4, we cover image data and its encoding process, and in Sect. 3.5, we summarize the chapter and offer further discussion. This chapter is intended to describe the three kinds of raw data and the process of encoding each into numerical vectors.
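The three encoding processes named above can be illustrated with a minimal Python sketch. The abstract does not specify the chapter's exact procedures, so the schemes used here are assumptions chosen because they are common: one-hot encoding for categorical fields of a record, word counts over a fixed vocabulary for a text, and flattened, rescaled pixel intensities for a grayscale image. All function names, field names, and sample values are hypothetical.

```python
import re
from collections import Counter


def encode_record(record, numeric_fields, categorical_fields):
    """Encode one relational record (a dict) as a flat numerical vector.

    numeric_fields: field names whose values are copied as floats.
    categorical_fields: dict mapping a field name to the ordered list of
    values it may take; each such field is one-hot encoded (an assumption).
    """
    vector = [float(record[name]) for name in numeric_fields]
    for name, values in categorical_fields.items():
        vector.extend(1.0 if record[name] == v else 0.0 for v in values)
    return vector


def index_text(text):
    """Index a text into a list of lowercase words (letters only)."""
    return re.findall(r"[a-z]+", text.lower())


def encode_text(text, vocabulary):
    """Encode a text as word counts over a fixed, ordered vocabulary."""
    counts = Counter(index_text(text))
    return [float(counts[word]) for word in vocabulary]


def encode_image(pixels, max_value=255):
    """Flatten a grayscale image (list of rows) and scale it to [0, 1]."""
    return [p / max_value for row in pixels for p in row]


# Hypothetical sample inputs, one per raw data type.
record_vec = encode_record(
    {"age": 30, "income": 52000, "city": "Seoul"},
    numeric_fields=["age", "income"],
    categorical_fields={"city": ["Seoul", "Busan"]},
)
text_vec = encode_text(
    "Data encoding maps raw data to vectors",
    vocabulary=["data", "encoding", "image"],
)
image_vec = encode_image([[0, 255], [51, 102]])

print(record_vec)  # [30.0, 52000.0, 1.0, 0.0]
print(text_vec)    # [2.0, 1.0, 0.0]
print(image_vec)   # [0.0, 1.0, 0.2, 0.4]
```

In each case the result is a fixed-length list of floats, which is the input form that the abstract says machine learning algorithms assume. Real pipelines would typically add steps this sketch omits, such as feature scaling for records or stop-word removal and weighting for text.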
Copyright information
© 2021 Springer Nature Switzerland AG
About this chapter
Cite this chapter
Jo, T. (2021). Data Encoding. In: Machine Learning Foundations. Springer, Cham. https://doi.org/10.1007/978-3-030-65900-4_3
DOI: https://doi.org/10.1007/978-3-030-65900-4_3
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-65899-1
Online ISBN: 978-3-030-65900-4
eBook Packages: Engineering, Engineering (R0)