Estimated time: 30 minutes

Build a data pipeline using Prophecy Agent. This tutorial walks you through exploring data, creating visualizations, and building transformations using natural language prompts. Follow the steps below to build a patient analytics pipeline.

Prerequisites

To complete this quickstart, you need a Prophecy fabric that uses Prophecy In Memory or Databricks as the compute engine.
Prophecy automatically creates a compatible fabric for Free and Professional Edition users. If you do not have any existing fabrics, you’ll need to create one.

Set up a new project

First, you need to create the project where you will build your pipeline. You’ll also need to add data to the project for this quickstart.
1

Create a project

  1. Click on the Create Entity button in the left navigation bar.
  2. Hover over the Project tile and click Create.
  3. Give your project a Name, such as Prophecy_Quickstart.
  4. Under Team, select your personal team. (It will match your user email.)
  5. Under Select Template, choose Prophecy for Analysts.
  6. Click Complete.
Prophecy will open your new project in the Studio.
2

Connect to the execution environment

  1. Select the default fabric or your own fabric.
  2. Click Save.
This attaches the project to the fabric. You can switch fabrics at any time by clicking the fabric selector in the top right corner of the project canvas.
3

Add a new pipeline

  1. On the project landing page, click Create Pipeline.
  2. For the Pipeline Name, enter patient_analytics.
  3. Leave the default Directory Path of pipelines. Prophecy saves your compiled pipeline code in this folder of the project repository.
  4. Click Create.
This opens the pipeline canvas for your new pipeline.

Add data to your project

Next, you’ll add some data to the project so the Agent can find and transform it.
1

Create the seed data file

Load some data into the project as a seed:
  1. Open the Source/Target gem category.
  2. Click Table. This adds a new Table gem to the canvas.
  3. Hover over the gem and click Open.
  4. Select + New Table.
  5. For the Type and Format, choose Seed.
  6. Name the seed patients_raw_data.
  7. For the Seed path, choose seeds. Prophecy saves your seed file in this folder of the project repository.
  8. Click Next.
  9. In the Properties tab, paste the following data.
    patients_raw_data.csv
    patient_id,first_name,last_name,city,state,county,age,admission_date,diagnosis,treatment_cost
    1001,John,Smith,Boston,MA,Suffolk,45,2024-01-15,Hypertension,1250.00
    1002,Sarah,Johnson,Boston,MA,Suffolk,32,2024-01-18,Diabetes,2100.50
    1003,Michael,Williams,Cambridge,MA,Middlesex,58,2024-01-20,Heart Disease,3500.75
    1004,Emily,Brown,Boston,MA,Suffolk,29,2024-01-22,Asthma,850.25
    1005,David,Jones,Worcester,MA,Worcester,67,2024-01-25,Hypertension,1450.00
    1006,Jessica,Garcia,Boston,MA,Suffolk,41,2024-02-01,Diabetes,2200.00
    1007,Christopher,Miller,Springfield,MA,Hampden,53,2024-02-05,Heart Disease,3800.50
    1008,Amanda,Davis,Boston,MA,Suffolk,35,2024-02-08,Asthma,920.75
    1009,James,Rodriguez,Cambridge,MA,Middlesex,62,2024-02-10,Hypertension,1320.00
    1010,Lisa,Martinez,Worcester,MA,Worcester,48,2024-02-12,Diabetes,2150.25
    
  10. Click Next.
  11. Click Load Data to preview the data in tabular format.
  12. Click Save.
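If the preview does not look right, it can help to check that the CSV from step 9 parses cleanly outside Prophecy. The following is a minimal sketch in plain Python; it does not use any Prophecy API, and the names SEED_CSV and rows are illustrative only.

```python
import csv
import io

# The seed data from step 9, embedded as a string for a standalone check.
SEED_CSV = """\
patient_id,first_name,last_name,city,state,county,age,admission_date,diagnosis,treatment_cost
1001,John,Smith,Boston,MA,Suffolk,45,2024-01-15,Hypertension,1250.00
1002,Sarah,Johnson,Boston,MA,Suffolk,32,2024-01-18,Diabetes,2100.50
1003,Michael,Williams,Cambridge,MA,Middlesex,58,2024-01-20,Heart Disease,3500.75
1004,Emily,Brown,Boston,MA,Suffolk,29,2024-01-22,Asthma,850.25
1005,David,Jones,Worcester,MA,Worcester,67,2024-01-25,Hypertension,1450.00
1006,Jessica,Garcia,Boston,MA,Suffolk,41,2024-02-01,Diabetes,2200.00
1007,Christopher,Miller,Springfield,MA,Hampden,53,2024-02-05,Heart Disease,3800.50
1008,Amanda,Davis,Boston,MA,Suffolk,35,2024-02-08,Asthma,920.75
1009,James,Rodriguez,Cambridge,MA,Middlesex,62,2024-02-10,Hypertension,1320.00
1010,Lisa,Martinez,Worcester,MA,Worcester,48,2024-02-12,Diabetes,2150.25
"""

rows = list(csv.DictReader(io.StringIO(SEED_CSV)))

# DictReader marks short rows with None values and long rows with a None key,
# so this fails if any row has the wrong number of fields.
assert all(None not in row and None not in row.values() for row in rows)

print(len(rows), "rows;", len(rows[0]), "columns")  # 10 rows; 10 columns
```

A stray comma or a missing field in the pasted text would trip the assertion before the data ever reaches the warehouse.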
2

Run the seed

To materialize the seed data into the SQL warehouse, run the pipeline once. To do so, click the play button in the bottom right corner of the canvas.
3

Reindex your connection

Prophecy should automatically index the table when you save the seed. This allows the Agent to discover and use the seed data. If you have trouble finding the table in later steps, you can also manually reindex the knowledge graph.
  1. Open the Environment tab in the left sidebar.
  2. Below your connections, you’ll see a Missing Tables? callout.
  3. Click Refresh to trigger the knowledge graph indexer.
You’ll see a progress bar in the callout indicating the indexer is running. Once it’s complete, the callout will disappear.
Verify the table was indexed by checking the Environment tab in the left sidebar.

Explore the data

Now that you have data in your project, you can explore it using the Agent.
1

Access the Agent

Locate and open the Chat tab in the left sidebar. This is where you’ll interact with the Agent.
2

Explore your data

Ask the Agent to search for the seed data table. Enter the following prompt in the chat:
Find datasets with information pertaining to hospital patients
The Agent returns:
  • A short list of relevant datasets with descriptions.
  • A full list of matching datasets.
  • The option to add datasets directly to your pipeline on hover.
  1. Click on patients_raw_data (the data you uploaded) in the chat to open a detailed preview dialog where you can:
    • View the table location.
    • Examine the schema and column structure.
    • Preview sample data.
    • Review data profiles.
    • Open an Explore session for dataset-specific queries.
  2. Close the preview dialog to return to the chat.
If the Agent doesn’t find your table, verify that the table was indexed by checking the Environment tab in the left sidebar. Additionally, if you have other patient data in your warehouse, the Agent may return other datasets that match your query instead.
3

Get data samples

Request specific data samples to validate your understanding. Enter the following prompt, replacing @patients_raw_data with your actual table path:
Provide sample data from @patients_raw_data showing only patients from Boston
The Agent returns:
  • A table with the requested data.
  • An option to preview the table for detailed examination.
  • An option to add the table to your pipeline.
  • SQL execution logs showing the query used.
Verify the results show 5 patients: John Smith, Sarah Johnson, Emily Brown, Jessica Garcia, and Amanda Davis, all from Boston.
4

Create visualizations

Generate charts and insights from your data. Enter the following prompt:
Visualize number of patients per city in @patients_raw_data
The Agent returns:
  • An embedded chart in the chat showing patient counts by city.
  • An option to preview the table for detailed examination.
  • An option to add the table to your pipeline.
  • SQL execution logs showing the query used.
Click Preview to access:
  • Visualization tab: View larger charts and download charts as images.
  • Data tab: Examine underlying data and download data as JSON/Excel/CSV.
The chart should show Boston with 5 patients, Cambridge with 2, Worcester with 2, and Springfield with 1.
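To cross-check the chart, the per-city counts can be recomputed from the seed rows by hand. The sketch below is plain Python and is not the SQL that the Agent generates; CITIES and patients_per_city are illustrative names.

```python
from collections import Counter

# The city column of the 10 seed rows, in file order.
CITIES = [
    "Boston", "Boston", "Cambridge", "Boston", "Worcester",
    "Boston", "Springfield", "Boston", "Cambridge", "Worcester",
]

# Equivalent of grouping by city and counting rows.
patients_per_city = Counter(CITIES)

for city, count in patients_per_city.most_common():
    print(f"{city}: {count}")
```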

Build your pipeline

Now, you’ll build your pipeline transformations using the Agent. Keep the Visual view open (rather than the Code view) to see updates in real time as the Agent adds or modifies gems on the canvas.
1

Build your first transformation

Describe the transformation you want to perform. Enter the following prompt:
Transform @patient_records to show total number of patients per county
Replace @patient_records with the actual gem label from your canvas if it differs.
The Agent will:
  • Add the appropriate gem(s) to your pipeline canvas. This prompt should produce an Aggregate gem.
  • Execute the pipeline, generating data samples that you can review.
  • Provide a description of the changes made.
  • Show options to inspect, preview, or restore changes.
  • Display SQL execution logs.
The output should show 4 counties: Suffolk with 5 patients, Middlesex with 2, Worcester with 2, and Hampden with 1.
Each change that the Agent makes can be viewed in the project version history. You can revert the changes at any time.
2

Inspect pipeline changes

To understand what the Agent built:
  1. Click Inspect on the transformation response.
  2. Review the configuration panel starting with the first modified gem (highlighted in yellow).
  3. Use the Previous and Next buttons to navigate through modified gems.
  4. Examine input and output data to verify the transformation produces expected results.
This helps you:
  • Understand the Agent’s approach.
  • Verify that the transformation logic matches your expectations.
3

Add another transformation

Build a more complex transformation. Enter the following prompt:
Calculate average treatment cost per county from the previous step
The Agent adds another transformation that calculates the average cost. This demonstrates how you can chain transformations together.
Verify the results show average costs for each county. Suffolk should have an average around 1,464, Middlesex around 2,410, Worcester around 1,800, and Hampden 3,800.50.
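One way to double-check the per-county figures is to recompute them directly from the seed rows. The sketch below is plain Python, not the Agent's generated SQL; COUNTY_COSTS, costs_by_county, and summary are illustrative names.

```python
from collections import defaultdict

# (county, treatment_cost) pairs from the 10 seed rows, in file order.
COUNTY_COSTS = [
    ("Suffolk", 1250.00), ("Suffolk", 2100.50), ("Middlesex", 3500.75),
    ("Suffolk", 850.25), ("Worcester", 1450.00), ("Suffolk", 2200.00),
    ("Hampden", 3800.50), ("Suffolk", 920.75), ("Middlesex", 1320.00),
    ("Worcester", 2150.25),
]

# Group the costs by county.
costs_by_county = defaultdict(list)
for county, cost in COUNTY_COSTS:
    costs_by_county[county].append(cost)

# Patient count and mean treatment cost per county.
summary = {
    county: (len(costs), sum(costs) / len(costs))
    for county, costs in costs_by_county.items()
}

for county, (count, avg_cost) in summary.items():
    print(f"{county}: {count} patients, average cost {avg_cost:.2f}")
```

Grouping once and deriving both the count and the mean from the same groups keeps the two aggregates consistent with each other.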
4

Save your results

After building your pipeline, ask the Agent to save the final output. Enter the following prompt:
Save the final output of this pipeline as a table
The Agent adds a Table gem to the end of your pipeline. When you run the pipeline, the Table gem writes the data to the default database and schema defined in your fabric, so the results persist.

Explore further

Try these additional tasks to extend your pipeline.
1

Filter data

Enter the following prompt in the chat:
Filter the source data to only include patients over 50 years old
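For reference, the rows this filter should keep can be worked out from the seed data. This is a minimal plain-Python sketch, assuming "over 50" means strictly greater than 50; PATIENT_AGES and MIN_AGE are illustrative names, and the Agent may express the condition differently.

```python
# (patient_id, age) pairs from the 10 seed rows.
PATIENT_AGES = [
    (1001, 45), (1002, 32), (1003, 58), (1004, 29), (1005, 67),
    (1006, 41), (1007, 53), (1008, 35), (1009, 62), (1010, 48),
]

MIN_AGE = 50  # assumption: "over 50" is read as age > 50

over_50 = [patient_id for patient_id, age in PATIENT_AGES if age > MIN_AGE]
print(over_50)  # [1003, 1005, 1007, 1009]
```

No patient in the seed data is exactly 50, so a "50 or older" reading would return the same four rows.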
2

Join with additional data

Create a second Seed file named county_info with the following content:
county_info.csv
county,population,region
Suffolk,800000,Eastern
Middlesex,1600000,Eastern
Worcester,830000,Central
Hampden,470000,Western
Then, prompt the Agent:
Join @l0_raw_patients with @county_info on county
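The Agent can infer join keys when you omit them, but you can also specify them explicitly, as this prompt does with county. Replace @l0_raw_patients with the gem label of your patient data on the canvas if it differs.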
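The expected shape of the joined output can be sketched in plain Python. This assumes an inner join on county, which is only one of the join types the Agent might choose; COUNTY_INFO, PATIENT_COUNTIES, and joined are illustrative names.

```python
# county_info seed rows, keyed by the join column.
COUNTY_INFO = {
    "Suffolk": {"population": 800000, "region": "Eastern"},
    "Middlesex": {"population": 1600000, "region": "Eastern"},
    "Worcester": {"population": 830000, "region": "Central"},
    "Hampden": {"population": 470000, "region": "Western"},
}

# (patient_id, county) pairs from the patient seed rows.
PATIENT_COUNTIES = [
    (1001, "Suffolk"), (1002, "Suffolk"), (1003, "Middlesex"),
    (1004, "Suffolk"), (1005, "Worcester"), (1006, "Suffolk"),
    (1007, "Hampden"), (1008, "Suffolk"), (1009, "Middlesex"),
    (1010, "Worcester"),
]

# Inner join: keep only patients whose county appears in county_info.
joined = [
    {"patient_id": patient_id, "county": county, **COUNTY_INFO[county]}
    for patient_id, county in PATIENT_COUNTIES
    if county in COUNTY_INFO
]

print(len(joined), "joined rows")  # 10 joined rows
```

Every county in the patient data has a matching county_info row, so an inner join and a left join would return the same 10 rows here.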
3

Calculate derived metrics

Enter the following prompt in the chat:
Add a column that calculates days since admission using the current date
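The derived column amounts to a date subtraction. Below is a minimal plain-Python sketch of that calculation; the function days_since_admission and the fixed reference date are hypothetical, chosen so the output is reproducible, whereas the Agent's version would use the current date at run time.

```python
from datetime import date
from typing import Optional

# A few admission dates from the seed data, keyed by patient_id.
ADMISSIONS = {1001: "2024-01-15", 1007: "2024-02-05", 1010: "2024-02-12"}


def days_since_admission(admission_date: str, today: Optional[date] = None) -> int:
    """Whole days between an ISO-formatted admission date and today."""
    today = today or date.today()
    return (today - date.fromisoformat(admission_date)).days


# Fixed reference date (an assumption for this example only).
reference = date(2024, 3, 1)
for patient_id, admitted in ADMISSIONS.items():
    print(patient_id, days_since_admission(admitted, reference))
```

Because the value depends on the day the pipeline runs, the column changes on every run; passing an explicit date, as the reference argument does here, makes results comparable across runs.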

Connect your own data

This quickstart uses a Seed file as the source data. When you start building your own pipelines, you’ll likely want to use your own data from files or external systems. You can do this by:
  • Uploading files directly from your local filesystem using the upload file feature.
  • Ingesting data from external systems using connections.

Sample prompts reference

Use these prompts as templates for your own pipelines.
Task | Prompt example
Find data | Find datasets containing customer information
Sample data | Show me 5 random records from @sales_data
Filter data | Filter to only include orders from 2024
Transform data | Calculate total revenue as quantity * price
Join data | Join the orders and customers tables
Parse data | Extract the fields from json_data as columns (this works best if you provide a sample JSON object)
Clean data | Remove rows where email is null
Aggregate data | Group by region and calculate average sales
Visualize data | Create a bar chart of monthly sales
Save results | Save the final output as a table

Tips

  • How to do it: Instead of "Clean the data", try "Remove duplicate customer records and fill null values in the email column".
    Why it helps: Reduces ambiguity so the Agent applies the correct operations.
  • How to do it: Break complex transformations into smaller requests. For example: (1) Aggregate orders by customer, (2) Join with customers, (3) Filter for orders made in 2024.
    Why it helps: Improves reliability and makes it easier to debug or adjust each step.
  • How to do it: After a response, click Inspect and use Previous/Next to review the highlighted gem configuration and output.
    Why it helps: Ensures the transformation matches expectations before you continue.
  • How to do it: Use chat to scaffold transformations, then switch to the Visual canvas to fine-tune or add gems.
    Why it helps: Combines speed (AI) with precision and control (visual editor).

Troubleshooting

  • Probable cause: Knowledge graph is out of date.
    How to fix: Reindex your Databricks connection so the knowledge graph includes the table.
  • Probable cause: Chat context isn’t relevant anymore.
    How to fix: Click … > Reset in the chat interface to start a fresh session.
  • Probable cause: You want to provide feedback to improve results.
    How to fix: Click the thumbs-up or thumbs-down on the Agent’s message. This helps us improve the Agent’s behavior.
  • Probable cause: Seed data wasn’t created correctly.
    How to fix: Verify the table exists and contains the expected 10 rows of data.