Abstract
This chapter presents an overview of how to use HDInsight for machine learning. HDInsight is an Azure service that offers Apache Spark clusters for in-memory cluster processing. Processing data in memory is typically much faster than disk-based computing. Spark also provides a Scala API for working with distributed data sets. Creating a Spark cluster in HDInsight is quick, and the cluster can be used with Jupyter Notebook, which makes data processing and visualization easier. Spark clusters can also be integrated with Azure Event Hubs and Kafka. Moreover, it is possible to set up Azure Machine Learning (ML) services to run distributed R computations. The next section discusses the process of setting up Spark in HDInsight.
Copyright information
© 2019 Leila Etaati
About this chapter
Cite this chapter
Etaati, L. (2019). Machine Learning on HDInsight. In: Machine Learning with Microsoft Technologies. Apress, Berkeley, CA. https://doi.org/10.1007/978-1-4842-3658-1_15
DOI: https://doi.org/10.1007/978-1-4842-3658-1_15
Publisher Name: Apress, Berkeley, CA
Print ISBN: 978-1-4842-3657-4
Online ISBN: 978-1-4842-3658-1
eBook Packages: Professional and Applied Computing, Apress Access Books, Professional and Applied Computing (R0)