
Data Encoding

Machine Learning Foundations

Abstract

This chapter is concerned with raw data types, that is, data in the form it takes before being encoded into numerical vectors. Relational data, which consists of records, is the typical raw data type and is relatively easy to encode into numerical vectors. Textual data is presented as the most popular raw data type in the real world, and we study the process of indexing a text into a list of words and encoding it into a numerical vector. We also study image data, another kind of raw data, and the process of encoding it into numerical vectors. Because machine learning algorithms assume that their input is given as a numerical vector, encoding raw data into numerical vectors is essential when applying them to real tasks.

In Sect. 3.1, we introduce the process of encoding raw data into numerical vectors, and in Sect. 3.2, we cover relational data and its encoding. In Sect. 3.3, we describe textual data and the process of encoding it into numerical vectors. In Sect. 3.4, we cover image data and its encoding process, and in Sect. 3.5, we summarize the chapter and offer further discussion. This chapter is intended to describe the three kinds of raw data and the processes of encoding them into numerical vectors.
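
The three encoding processes named above can be illustrated with a minimal Python sketch. The chapter's own procedures are not visible in this preview, so everything here is an assumption made for illustration: the function names, the one-hot treatment of categorical fields, the word-frequency weighting over a fixed vocabulary, and the scaling of grayscale pixels to [0, 1] are common choices, not necessarily the author's.

```python
import re


def encode_record(record, numeric_fields, categorical_fields):
    """Encode one relational record (a dict) as a flat numerical vector.

    numeric_fields: field names copied over as floats (hypothetical schema).
    categorical_fields: maps a field name to its list of allowed values,
    each of which becomes one one-hot position (an assumed scheme).
    """
    vector = [float(record[name]) for name in numeric_fields]
    for name, values in categorical_fields.items():
        vector.extend(1.0 if record[name] == value else 0.0 for value in values)
    return vector


def index_text(text):
    """Index a text into a list of lowercase word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def encode_text(text, vocabulary):
    """Encode a text as word frequencies over a fixed vocabulary."""
    words = index_text(text)
    return [float(words.count(term)) for term in vocabulary]


def encode_image(pixels, max_value=255):
    """Flatten a 2-D grid of grayscale pixels into a vector scaled to [0, 1]."""
    return [value / max_value for row in pixels for value in row]


# Illustrative inputs only; none of these values come from the chapter.
record = {"age": 30, "income": 52000, "city": "Seoul"}
record_vector = encode_record(record, ["age", "income"], {"city": ["Seoul", "Busan"]})
text_vector = encode_text(
    "Data encoding turns raw data into vectors", ["data", "vectors", "image"]
)
image_vector = encode_image([[0, 255], [51, 102]])
```

In each case the output is a fixed-length list of floats, which is the input form the abstract says machine learning algorithms assume.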




© 2021 Springer Nature Switzerland AG

About this chapter


Cite this chapter

Jo, T. (2021). Data Encoding. In: Machine Learning Foundations. Springer, Cham. https://doi.org/10.1007/978-3-030-65900-4_3


  • DOI: https://doi.org/10.1007/978-3-030-65900-4_3


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-65899-1

  • Online ISBN: 978-3-030-65900-4

  • eBook Packages: Engineering, Engineering (R0)
