Abstract
Data is now one of the main assets of any company. As a result, every team of analysts needs to organize its data science processes. Snowflake is a strong choice as a data source for storing structured and semi-structured data.
Notes
- 1.
You can find more information about the Alteryx platform at https://www.alteryx.com/.
- 2.
You can find more information about Apache Spark at https://spark.apache.org/.
- 3.
You can find more information about the Databricks platform at https://databricks.com/.
- 4.
You can find more information about DataRobot at https://www.datarobot.com/.
- 5.
You can find more information about the H2O platform at https://www.h2o.ai/.
- 6.
You can find more information about RStudio at https://www.rstudio.com/.
- 7.
You can find more information about Qubole at https://www.qubole.com/.
- 8.
Pandas is a Python library providing data structures and data analysis methods. For more information, see https://pandas.pydata.org.
- 9.
Scikit-learn is a free Python machine learning library. For more information, see https://scikit-learn.org.
- 10.
TensorFlow is an open source deep learning library. For more information, see https://www.tensorflow.org.
- 11.
MLflow is an open source platform for the machine learning lifecycle. For more information, see https://mlflow.org/.
- 12.
Apache Airflow is a tool for scheduling and monitoring workflows. For more information, see https://airflow.apache.org.
- 13.
AWS Elastic MapReduce (EMR) is a managed Hadoop service on AWS. For more information, see https://aws.amazon.com/emr/.
- 14.
HDInsight is a managed Hadoop service on Azure. For more information, see https://azure.microsoft.com/en-us/services/hdinsight/.
- 15.
Google Cloud Dataproc is a managed Hadoop service on GCP. For more information, see https://cloud.google.com/dataproc/.
- 16.
Spark provides the DataFrame and Dataset APIs. For more information, see https://spark.apache.org/docs/latest/sql-programming-guide.html.
- 17.
For more information about Snowflake Connector for Spark, see https://docs.snowflake.net/manuals/user-guide/spark-connector.html.
- 18.
For more information about column mapping, see https://docs.snowflake.net/manuals/user-guide/spark-connector-use.html#label-spark-options.
- 19.
Maven is a build automation tool used primarily for Java projects. For more information, see https://maven.apache.org/.
- 20.
For more information, see https://databricks.com/product/unified-analytics-platform.
- 21.
For more information about Delta Lake, see https://delta.io/.
- 22.
For more information about Apache Hadoop Distributed File System, see https://hadoop.apache.org/.
- 23.
For more information about MLflow, see https://mlflow.org/.
- 24.
For more information about Microsoft Azure, see https://azure.microsoft.com.
- 25.
For more information about limits, see https://docs.microsoft.com/en-us/azure/azure-subscription-service-limits.
- 26.
For more information about Azure Databricks, see https://azure.microsoft.com/en-us/pricing/details/databricks/.
- 27.
For more about optimizing performance with caching, see https://docs.databricks.com/delta/delta-cache.html.
- 28.
For more about Databricks secrets, see https://docs.databricks.com/user-guide/secrets/index.html.
© 2020 Dmitry Anoshin, Dmitry Shirokov, Donna Strok
About this chapter
Cite this chapter
Anoshin, D., Shirokov, D., Strok, D. (2020). Snowflake and Data Science. In: Jumpstart Snowflake. Apress, Berkeley, CA. https://doi.org/10.1007/978-1-4842-5328-1_12
DOI: https://doi.org/10.1007/978-1-4842-5328-1_12
Publisher Name: Apress, Berkeley, CA
Print ISBN: 978-1-4842-5327-4
Online ISBN: 978-1-4842-5328-1
eBook Packages: Professional and Applied Computing, Professional and Applied Computing (R0), Apress Access Books