Follow this quickstart on the Enterprise Edition only.
Objectives
In this quickstart, you will:

- Create a new project and attach a fabric.
- Develop a pipeline with a source and a transformation.
- Run the pipeline interactively.
- Save your changes.
Prerequisites
To complete this quickstart, you need:

- A configured fabric (your execution environment). Either:
  - Create a fabric that connects to Prophecy-managed Databricks. Free trial users automatically have this fabric.
  - Ask a team admin to create a fabric for you that connects to an existing external Spark engine.
Create a project
- Click the Create Entity button in the left navigation bar.
- Hover over the Project tile and select Create.
- Give your project a name.
- Under Team, select your personal team. (It will match your individual user email.)
- Under Select Template, choose Custom.
- For the Project Type, choose Spark/Python (PySpark).
- Click Continue.
- Under Connect Git Account, select Prophecy Managed Git Credentials.
- Click Continue.
- Take a brief look at the default project packages, and then click Complete.
Build a pipeline
Let’s set up the pipeline.

- Ensure the pipeline links to the correct project.
- Create a new development branch called `devQS`. Your pipeline will not appear in the `main` branch until you merge your changes.
- Name your pipeline `weather`.
- Leave the default Batch processing mode.
- Click Create New.
- In the project header, click Attach a cluster.
- Choose an appropriate fabric to connect you to the Spark environment.
- Select an existing cluster, or create a new one. New clusters may take a few minutes to start up.
If you have trouble attaching a cluster, you might not have the right permissions to access or
create a cluster in your external Spark environment.
Add a source
For this quickstart, you’ll create a Seed as the data source.

- Open the Source/Target gem category.
- Click Source. This adds a new Source gem to the canvas.
- Hover over the gem and click Open.
- Select + New Dataset.
- Name the dataset `weather_forecast`.
- In the Type & Format tab, select the Seed type.
- In the Data tab, paste the following data provided in CSV format. Then, click Next.
- Click the Infer Schema button.
- Review the inferred schema. Depending on your Spark engine, you might see different inferred types.
- If the DatePrediction column is assigned a `string` type, change the type to `date`.
- Enable the Enforce specified or inferred schema checkbox to enforce this change downstream.
- Optional: Click on the Copilot icon to generate metadata descriptions of each column.
- Click Next.
- Click Load Data to preview the data in tabular format.
- Click Create Dataset to save your seed as a dataset.
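Conceptually, a Seed is just inline CSV that Spark parses into typed columns. The following pure-Python sketch illustrates the schema steps above: inferred values start out as strings, and enforcing a `date` type on DatePrediction is the equivalent of the manual change described in step 9. The sample rows and the `City` column are hypothetical, since the quickstart's actual CSV data is supplied in the UI.

```python
import csv
import io
from datetime import datetime

# Hypothetical seed content; the quickstart provides the real CSV in the Data tab.
seed_csv = """City,DatePrediction,TemperatureCelsius,WindSpeed
Oslo,2024-05-01,14.5,12.0
Cairo,2024-05-01,33.0,8.5
"""

# csv.DictReader reads every field as a string, which is why DatePrediction
# may be inferred as a string type before you enforce a date type.
rows = list(csv.DictReader(io.StringIO(seed_csv)))

for row in rows:
    # Equivalent of changing DatePrediction from string to date and
    # enforcing the schema downstream.
    row["DatePrediction"] = datetime.strptime(row["DatePrediction"], "%Y-%m-%d").date()
    row["TemperatureCelsius"] = float(row["TemperatureCelsius"])
    row["WindSpeed"] = float(row["WindSpeed"])

print(rows[0]["DatePrediction"])  # 2024-05-01
```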
Add a reformat transformation
Now, you’ll configure your first data transformation using the Reformat gem.

- From the Transform gem category, add a Reformat gem to your canvas.
- Drag the Reformat gem near your Table gem to auto-connect them.
- Open the Reformat gem configuration.
- Notice that the first input port in0 displays your table and its schema.
- Hover over your table name, and click Add 5 columns.
- Change the WindSpeed target column name to `WindSpeedKMH`. This renames the column.
- Add a new target column called `TemperatureFahrenheit`.
- Next to the new target column, write the expression `(((TemperatureCelsius * 9.0D) / 5.0D) + 32)` to convert the temperature into Fahrenheit. If your column name is descriptive, Copilot will write an expression for you.
- After configuring the expression, click Save.
By default, gem expressions expect Spark SQL code.
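To make the expression concrete, here is a minimal Python sketch of the same Celsius-to-Fahrenheit logic the Reformat gem evaluates per row (the sample row values are hypothetical; in the pipeline, Spark SQL applies the expression across the whole dataset):

```python
def to_fahrenheit(temperature_celsius: float) -> float:
    # Mirrors the gem expression (((TemperatureCelsius * 9.0D) / 5.0D) + 32).
    return ((temperature_celsius * 9.0) / 5.0) + 32

# Hypothetical row after the rename and the new target column are applied.
forecast = {"WindSpeedKMH": 12.0, "TemperatureCelsius": 20.0}
forecast["TemperatureFahrenheit"] = to_fahrenheit(forecast["TemperatureCelsius"])
print(forecast["TemperatureFahrenheit"])  # 68.0
```

The `9.0D` / `5.0D` literals in the gem expression force double-precision arithmetic in Spark SQL, which plain Python floats already provide.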
Generate data previews
At this point, you may be curious to know what your data looks like. Generate data previews with the following steps:

- Click the play button in the bottom right corner of the canvas.
- As the pipeline runs, preview icons should appear as gem outputs.
- Click on the Reformat output to preview the data in the Data Explorer.
Save the pipeline
In a real-world situation, your pipeline would be much more complex. Typically, pipelines require multiple transformation steps and send data to external outputs. For the purposes of this tutorial, we will save the pipeline as-is.

- In the project footer, click Commit Changes. This opens the Git workflow dialog.
- Review the commit history on the left side. You should only see the initial project commit at this point.
- Review the entities changed in this commit on the right side. You should see the new weather pipeline and weather_forecast dataset.
- Verify or update the Copilot-generated commit message that describes these changes.
- Click Commit. Your committed changes remain on the `devQS` branch until you merge them into the `main` branch.
What’s next
Continue your Prophecy learning journey:

- Discover the different Spark gems that you can use for data transformation
- Reach out to us if you need additional help or guidance