SQL
Prophecy uses SQL as the primary language for code generation in data analysis projects. Prophecy supports a variety of SQL warehouses to execute pipeline-generated SQL queries. For most use cases, use SQL as the project language.
Simplified PySpark
Private Preview: Available for Enterprise Edition only.
Requirements
To use the Simplified PySpark project type, you need:
- A Databricks Spark cluster defined in your fabric. Simplified PySpark does not work with Databricks serverless or any other Spark provider, such as Amazon EMR.
- A Databricks SQL warehouse configured in your fabric. This is only required if you want to switch your project to SQL later.
- An init script in Databricks that installs the required libraries on your cluster. These libraries provide functionality that is usually executed by Prophecy Automate.
How to create a PySpark project
The steps to create a PySpark project are similar to creating a SQL project. The only difference is the project type.
- Click on the Create Entity button in the left navigation bar.
- Hover over the Project tile and select Create.
- Give your project a name.
- Under Team, select your personal team. (It will match your individual user email.)
- Under Select Template, choose Custom.
- For the Project Type, choose Spark/Python (PySpark) > Simplified.
- Click Continue.
- Under Connect Git Account, connect to an external Git provider or select Prophecy-managed Git.
- Click Continue.
Comparison matrix
While we’re working to achieve full parity between languages, there are some differences between SQL and PySpark. The following table compares feature availability between the two languages.

| Feature | Comparison |
|---|---|
| Gems | Most transformation gems work identically in SQL and PySpark modes. Some gems gain additional capabilities in PySpark; for example, SQL can only write to one output at a time, while PySpark can write to multiple outputs (see the sketch after this table). Some gems may not have PySpark implementations. If you switch to PySpark and use a gem without a Python implementation, Prophecy displays an error indicating that the gem doesn’t support PySpark. |
| Orchestration | PySpark pipelines can still be orchestrated using Prophecy Automate scheduling. However, you can also schedule them using Databricks jobs. If you switch from PySpark to SQL, your project will not be compatible with Databricks jobs. |
| Tests | SQL projects support data tests. PySpark projects support both data tests and unit tests (a unit test sketch appears at the end of this section). |
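To illustrate the single-output versus multi-output distinction mentioned in the Gems row, the sketch below writes one transformed DataFrame to two targets in plain PySpark. This is a generic illustration, not code generated by a Prophecy gem; the table names and paths are hypothetical.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

# Hypothetical source table; replace with a table that exists in your workspace.
orders = spark.table("examples.raw_orders")

# A single transformation step, analogous to what a transformation gem produces.
enriched = orders.withColumn("order_total", F.col("quantity") * F.col("unit_price"))

# In PySpark, the same DataFrame can be written to multiple outputs in one pipeline.
enriched.write.mode("overwrite").saveAsTable("examples.orders_enriched")
enriched.write.mode("overwrite").format("delta").save("/tmp/examples/orders_enriched")
```

A SQL model, by contrast, materializes to a single target table, which is why the equivalent gem can only write to one output at a time.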


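Because PySpark projects additionally support unit tests, here is a minimal sketch of what a unit test for a transformation might look like. It uses pytest with a local SparkSession and is a generic example rather than the test scaffolding Prophecy generates; the function and column names are hypothetical.

```python
import pytest
from pyspark.sql import SparkSession, functions as F


def add_order_total(df):
    # Hypothetical transformation under test.
    return df.withColumn("order_total", F.col("quantity") * F.col("unit_price"))


@pytest.fixture(scope="module")
def spark():
    # Local SparkSession so the test runs without a cluster.
    session = SparkSession.builder.master("local[1]").appName("unit-tests").getOrCreate()
    yield session
    session.stop()


def test_add_order_total(spark):
    df = spark.createDataFrame([(2, 5.0), (3, 1.5)], ["quantity", "unit_price"])
    result = add_order_total(df).select("order_total").collect()
    assert [row.order_total for row in result] == [10.0, 4.5]
```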