Prophecy supports two types of projects for data analysis. Jump to each section below to learn more about each project type. The project type determines the language that Prophecy compiles your visual project components into.

SQL

SQL is the primary language that Prophecy generates for data analysis projects. Prophecy supports a variety of SQL warehouses for executing pipeline-generated SQL queries. For most use cases, use SQL as the project language.

Simplified PySpark

Private Preview
Available in the Enterprise Edition only.
Simplified PySpark is a project type that abstracts PySpark behind a simplified data analysis interface. Use Simplified PySpark when your organization uses PySpark as the language of its production codebase. This makes collaboration between data analysts and data engineers easier, because both can work in the same language. When you use Simplified PySpark, Prophecy generates Python files instead of SQL files. The visual interface remains unchanged: you continue working with SQL expressions and the same gem configurations.
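For example, Spark can evaluate SQL expressions directly through its expression API, which is one way a SQL expression configured in a gem can carry over to Python output. The following is a hypothetical sketch, not the code Prophecy actually generates; the table name, column, and expression are invented for illustration:

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import expr

    spark = SparkSession.builder.getOrCreate()

    # A gem configured with the SQL expression "amount * 0.9" could map to
    # a PySpark column expression like this (hypothetical table and column):
    orders = spark.table("orders")
    discounted = orders.withColumn("discounted_amount", expr("amount * 0.9"))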

Requirements

To use the Simplified PySpark project type, you need:
  • A Databricks Spark cluster defined in your fabric. Simplified PySpark does not work with Databricks serverless or any other Spark provider, such as Amazon EMR.
  • A Databricks SQL warehouse configured in your fabric. This is required only if you want to switch your project to SQL later.
  • An init script in Databricks that installs the required libraries on your cluster. These libraries provide functionality that is usually executed by Prophecy Automate.
    #!/bin/bash

    # Exit on any error or use of an unset variable.
    set -eu

    # Write the glibc TLS tunable to the system-wide environment file.
    echo 'GLIBC_TUNABLES=glibc.rtld.optional_static_tls=16384' > /etc/environment

    # Install the Prophecy Automate wheel on the cluster node.
    WHEEL="/Workspace/Shared/prophecy_automate/artifacts/prophecy_automate-2.0.0-py3-none-manylinux_2_17_x86_64.whl"
    python3 -m pip install --no-cache-dir "$WHEEL"
    
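After the cluster restarts with the init script attached, you can sanity-check the installation from a notebook on the cluster. This is a hypothetical check: it assumes the wheel's distribution name is prophecy_automate, which is inferred from the wheel filename above and may differ in practice.

    # Hypothetical verification from a notebook cell on the cluster.
    # Assumes the distribution name "prophecy_automate" (inferred from
    # the wheel filename); adjust if the actual name differs.
    import importlib.metadata

    print(importlib.metadata.version("prophecy_automate"))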
After you create a Simplified PySpark project, you must manually start the cluster in Databricks to run pipelines. There is currently no way to start a cluster, or to select a different one, from within a Simplified PySpark project.
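If you prefer not to start the cluster through the Databricks UI, the Databricks SDK for Python can start it programmatically. A minimal sketch, assuming the databricks-sdk package is installed and credentials are already configured; the cluster ID shown is a placeholder:

    from databricks.sdk import WorkspaceClient

    # Assumes Databricks authentication is already configured
    # (for example, via environment variables or a config profile).
    w = WorkspaceClient()

    # Start the cluster from your fabric and block until it is running.
    # "0123-456789-abcdef12" is a placeholder cluster ID.
    w.clusters.start(cluster_id="0123-456789-abcdef12").result()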

How to create a PySpark project

The steps to create a Simplified PySpark project are similar to those for creating a SQL project. The only difference is the project type.
  1. Click the Create Entity button in the left navigation bar.
  2. Hover over the Project tile and select Create.
  3. Give your project a name.
  4. Under Team, select your personal team. (It will match your individual user email.)
  5. Under Select Template, choose Custom.
  6. For the Project Type, choose Spark/Python (PySpark) > Simplified.
  7. Click Continue.
  8. Under Connect Git Account, connect to an external Git provider or select Prophecy-managed Git.
  9. Click Continue.
To reuse this configuration for future projects, save it as a project creation template.

Comparison matrix

While we’re working to achieve full parity between languages, there are some differences between SQL and PySpark. The following list compares feature availability between the two languages.
  • Gems: Most transformation gems work identically in SQL and PySpark modes. Some gems gain additional capabilities in PySpark. For example, SQL can only write to one output at a time, while PySpark can write to multiple outputs. Some gems may not have PySpark implementations. If you switch to PySpark and use a gem without a Python implementation, Prophecy displays an error indicating that the gem doesn’t support PySpark.
  • Orchestration: PySpark pipelines can still be orchestrated using Prophecy Automate scheduling. However, you can also schedule them using Databricks jobs. If you switch from PySpark to SQL, your project will not be compatible with Databricks jobs.
  • Tests: SQL projects support data tests. PySpark projects support data tests and unit tests.
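To make the multiple-outputs difference concrete, here is a minimal PySpark sketch in which one computed DataFrame feeds two writes. The table and path names are placeholders, not Prophecy-generated code:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    # Compute a result once (placeholder source table and filter).
    clean_orders = spark.table("raw_orders").filter("amount > 0")

    # PySpark can write the same result to multiple outputs,
    # whereas a generated SQL model materializes a single output.
    clean_orders.write.mode("overwrite").saveAsTable("analytics.clean_orders")
    clean_orders.write.format("json").mode("overwrite").save("/tmp/exports/clean_orders")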

Switch between SQL and PySpark

To switch between SQL and PySpark, edit the backend language in the project’s development settings. Note that not every feature or configuration translates perfectly between languages. Always save or commit your work before switching so you can easily revert if something doesn’t translate.