Abstract
Data ingestion is the process of transferring, loading, and processing data into a data management or storage platform. This chapter discusses tools and methods for ingesting data into Kudu in batch and in real time. I’ll cover native tools that come with popular Hadoop distributions. I’ll show examples of how to use Spark to ingest data into Kudu using the Data Source API, as well as the Kudu client APIs in Java, Python, and C++. A group of next-generation commercial data ingestion tools also provides native Kudu support, and the Internet of Things (IoT) is a hot topic as well. I’ll discuss all of these in detail in this chapter, starting with StreamSets.
Copyright information
© 2018 Butch Quinto
About this chapter
Cite this chapter
Quinto, B. (2018). Batch and Real-Time Data Ingestion and Processing. In: Next-Generation Big Data. Apress, Berkeley, CA. https://doi.org/10.1007/978-1-4842-3147-0_7
DOI: https://doi.org/10.1007/978-1-4842-3147-0_7
Publisher Name: Apress, Berkeley, CA
Print ISBN: 978-1-4842-3146-3
Online ISBN: 978-1-4842-3147-0
eBook Packages: Professional and Applied Computing; Apress Access Books; Professional and Applied Computing (R0)