Abstract
This chapter explores pipeline techniques in both Scikit-Learn and PySpark. Pipelines let data scientists automate and standardize the steps of the modeling workflow. This supports building robust and scalable models, enhances model interpretability, and makes it easier to integrate additional preprocessing steps and feature engineering techniques.
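As a rough illustration of the idea, and not the chapter's own code, the sketch below chains a preprocessing step and a model into a single Scikit-Learn `Pipeline`. The synthetic dataset, the step names `scale` and `model`, and the choice of `StandardScaler` with `LogisticRegression` are all assumptions made for this example.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data stands in for a real dataset (assumption for illustration).
X, y = make_classification(n_samples=200, n_features=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Each step is a (name, estimator) pair; the step names are hypothetical.
# Every step except the last must be a transformer.
pipeline = Pipeline(
    steps=[
        ("scale", StandardScaler()),                   # preprocessing step
        ("model", LogisticRegression(max_iter=1000)),  # final estimator
    ]
)

# fit() fits the scaler on the training data only, then trains the model
# on the scaled features.
pipeline.fit(X_train, y_train)

# predict() and score() reuse the fitted scaler before calling the model.
predictions = pipeline.predict(X_test)
accuracy = pipeline.score(X_test, y_test)
print(f"Test accuracy: {accuracy:.3f}")
```

Because the scaler and the model are fitted together through one object, the same transformations are applied consistently at training and prediction time. PySpark offers a comparable abstraction in `pyspark.ml.Pipeline`, which is built from a list of stages and requires a running Spark session.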
Copyright information
© 2023 The Author(s), under exclusive license to APress Media, LLC, part of Springer Nature
About this chapter
Cite this chapter
Testas, A. (2023). Pipelines with Scikit-Learn and PySpark. In: Distributed Machine Learning with PySpark. Apress, Berkeley, CA. https://doi.org/10.1007/978-1-4842-9751-3_17
DOI: https://doi.org/10.1007/978-1-4842-9751-3_17
Publisher Name: Apress, Berkeley, CA
Print ISBN: 978-1-4842-9750-6
Online ISBN: 978-1-4842-9751-3
eBook Packages: Professional and Applied Computing, Apress Access Books, Professional and Applied Computing (R0)