This document summarizes key concepts and APIs in PySpark 3.0. It covers Spark fundamentals like RDDs, DataFrames and Datasets. It also covers PySpark modules for SQL, streaming, machine learning and graph processing. Finally it summarizes common DataFrame transformations and actions for manipulating data as well as Spark SQL functionality.
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0 ratings0% found this document useful (0 votes)
599 views2 pages
PySpark Reference Guide
This document summarizes key concepts and APIs in PySpark 3.0. It covers Spark fundamentals like RDDs, DataFrames and Datasets. It also covers PySpark modules for SQL, streaming, machine learning and graph processing. Finally it summarizes common DataFrame transformations and actions for manipulating data as well as Spark SQL functionality.
Mastering Data Engineering and Analytics with Databricks: A Hands-on Guide to Build Scalable Pipelines Using Databricks, Delta Lake, and MLflow (English Edition)