Batch and Real-Time Data Ingestion and Processing

Quinto, Butch

doi:10.1007/978-1-4842-3147-0_7

Abstract

Data ingestion is the process of transferring, loading, and processing data into a data management or storage platform. This chapter discusses various tools and methods for ingesting data into Kudu, both in batch and in real time. I’ll cover the native tools that come with popular Hadoop distributions. I’ll show how to use Spark to ingest data into Kudu through the Data Source API, as well as how to use the Kudu client APIs in Java, Python, and C++. A group of next-generation commercial data ingestion tools also provides native Kudu support. The Internet of Things (IoT) is another hot topic. I’ll discuss all of these in detail in this chapter, starting with StreamSets.

Author information

Authors and Affiliations

Plumpton, Victoria, Australia
Butch Quinto

About this chapter

Cite this chapter

Quinto, B. (2018). Batch and Real-Time Data Ingestion and Processing. In: Next-Generation Big Data. Apress, Berkeley, CA. https://doi.org/10.1007/978-1-4842-3147-0_7

Published: 13 June 2018
Publisher Name: Apress, Berkeley, CA
Print ISBN: 978-1-4842-3146-3
Online ISBN: 978-1-4842-3147-0
eBook Packages: Professional and Applied Computing, Apress Access Books, Professional and Applied Computing (R0)