Databricks serverless compute for PySpark

Databricks serverless compute allows you to run workloads without manually provisioning a Spark cluster. With serverless compute, Databricks takes care of the infrastructure in the background, so your jobs start up quickly and scale as needed. Prophecy supports serverless compute for interactively running pipelines in PySpark projects on Databricks. This page explains how to use serverless compute with Prophecy, including supported data sources, data sampling modes, and current limitations.

Databricks serverless compute differs from serverless SQL warehouses. Prophecy uses serverless compute to run Spark pipelines on Spark fabrics. In contrast, serverless SQL warehouses are connected to Prophecy via JDBC and are used to run SQL queries generated from pipelines in SQL projects.

Prerequisites

To use serverless compute in Prophecy, you need:

Access to serverless compute in Databricks
A PySpark project in Prophecy (Scala projects are not supported)

Supported data sources

You can run the following sources on Databricks serverless compute:

Supported data sampling modes

You can use the following data sampling modes when using Databricks serverless compute:

Selective mode
Vanilla mode (deprecated)

Limitations

The following table lists the current limitations of Databricks serverless compute and how they affect Prophecy project development.

Feature	Limitation
Scala support	Databricks serverless compute supports only Python and SQL. Scala projects cannot run on Databricks serverless compute.
Dependencies	Only Python dependencies are supported. Dependencies must be added through the Prophecy UI. You cannot install dependencies on serverless compute directly in Databricks.
Jobs	Scheduled pipeline runs cannot run on Databricks serverless compute. Prophecy supports only on-demand pipeline runs on serverless compute.
Authentication	To use serverless compute, you must authenticate with Databricks using a Databricks personal access token. Prophecy does not support OAuth for serverless compute.
Row size	The maximum row size is 128 MB.
Driver size	Databricks serverless driver size is unknown and cannot be changed.
Supported data formats	XLSX, fixed format, and custom formats are not supported.
UDF network access	UDFs cannot access the internet.
Spark configuration	Databricks serverless compute supports only a limited number of Spark configuration properties.
APIs in Script gems	Spark Connect APIs are supported. Spark RDD APIs are not supported. DataFrame and SQL cache APIs are not supported.

For the complete list of limitations, visit Serverless compute limitations in the Databricks documentation.
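
The "APIs in Script gems" limitation can be checked before a pipeline runs. The following is a minimal sketch, not part of Prophecy or Databricks: it uses Python's standard `ast` module to scan Script gem code for attribute names associated with RDD and cache APIs. The helper name `find_unsupported_calls`, the `UNSUPPORTED_ATTRS` mapping, and the table name `samples.orders` are all hypothetical, and the attribute list is illustrative rather than exhaustive.

```python
import ast

# Illustrative, non-exhaustive attribute names (an assumption of this sketch)
# that correspond to the RDD and cache APIs listed as unsupported above.
UNSUPPORTED_ATTRS = {
    "rdd": "RDD API",
    "sparkContext": "RDD API",
    "cache": "cache API",
    "persist": "cache API",
    "unpersist": "cache API",
    "cacheTable": "cache API",
}


def find_unsupported_calls(source: str) -> list[tuple[int, str, str]]:
    """Return (line, attribute, reason) for each flagged attribute access."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Attribute) and node.attr in UNSUPPORTED_ATTRS:
            findings.append((node.lineno, node.attr, UNSUPPORTED_ATTRS[node.attr]))
    # ast.walk does not guarantee source order, so sort by line number.
    return sorted(findings)


# Hypothetical Script gem body used only to exercise the check.
script = """
df = spark.read.table("samples.orders")
df.cache()
counts = df.rdd.map(lambda r: (r[0], 1)).collect()
result = df.groupBy("status").count()
"""

for line, attr, reason in find_unsupported_calls(script):
    print(f"line {line}: .{attr} ({reason})")
# prints:
# line 3: .cache (cache API)
# line 4: .rdd (RDD API)
```

Because the check matches attribute names only, it can flag unrelated objects that happen to expose a `cache` or `persist` attribute. Treat its output as a prompt for review, not as a definitive verdict on what serverless compute will reject.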
