The knowledge graph indexer scans your SQL warehouse and external data storage to build and maintain the metadata index that powers AI features. When you create a fabric, Prophecy automatically indexes your data environment using your default credentials. After the initial run, you can configure indexing behavior to control when and how the indexer runs. This page explains how indexing works, how to configure automatic indexing, how to trigger indexing manually, and how to add separate authentication for the indexer.

How indexing works

The knowledge graph indexer crawls each connection in your fabric separately. For each run, the indexer:
  1. Uses the credentials stored in the connection to authenticate as an identity in the external system.
  2. Scans databases and file storage systems that the identity has access to, limited by the permissions granted to that identity.
  3. Indexes table names, schemas, column names, data types, and other metadata.
  4. Updates the knowledge graph with this information.
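For illustration, a minimal Python sketch of what step 3 might look like against a Databricks SQL warehouse is shown below. It queries the warehouse's information_schema using the connection's credentials; the connector usage and the shape of the metadata records are assumptions for this example, not Prophecy's actual implementation.

```python
# Illustrative sketch only: approximates the metadata a crawler could collect
# from a SQL warehouse's information_schema. Not Prophecy's implementation.
from databricks import sql  # assumes the databricks-sql-connector package


def crawl_connection(server_hostname: str, http_path: str, access_token: str) -> list[dict]:
    """Return one metadata record per column visible to the authenticated identity."""
    records = []
    with sql.connect(server_hostname=server_hostname,
                     http_path=http_path,
                     access_token=access_token) as conn:
        with conn.cursor() as cur:
            # The identity's permissions implicitly limit what this query returns.
            cur.execute("""
                SELECT table_catalog, table_schema, table_name, column_name, data_type
                FROM system.information_schema.columns
            """)
            for catalog, schema, table, column, dtype in cur.fetchall():
                records.append({
                    "catalog": catalog,
                    "schema": schema,
                    "table": table,
                    "column": column,
                    "data_type": dtype,
                })
    # A downstream step would merge these records into the knowledge graph.
    return records
```

Because the query runs as the connection's identity, only objects that identity can access appear in the results, which mirrors step 2 above.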

Configure automatic indexing

Configure scheduled crawling to keep your index up-to-date without manual intervention.
  1. In Prophecy, navigate to Metadata > Fabrics.
  2. Select the fabric where you will enable indexing.
  3. Open the Connections tab.
  4. Click to open the connection that you wish to schedule indexing for.
  5. In the connection dialog, scroll to the Knowledge Graph Indexer tile and toggle on Enable Knowledge Graph Periodic Indexing.
  6. Configure the schedule to run hourly, daily, or weekly.
Every schedule must define a frequency and a timezone. The default timezone is the one from which you access Prophecy.
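As a rough illustration of how a time of day and a timezone determine the next indexer run, here is a small Python sketch using the standard-library zoneinfo module. The timezone name and the 2:00 AM default are examples; the scheduling logic is assumed for illustration and is not Prophecy's scheduler.

```python
# Illustrative sketch: compute the next daily run time for a schedule defined
# by a time of day and a timezone (e.g. the default "Repeat at 2:00 AM").
from datetime import datetime, time, timedelta
from zoneinfo import ZoneInfo


def next_daily_run(repeat_at: time, tz_name: str, now: datetime | None = None) -> datetime:
    tz = ZoneInfo(tz_name)
    now = now or datetime.now(tz)
    candidate = now.replace(hour=repeat_at.hour, minute=repeat_at.minute,
                            second=0, microsecond=0)
    if candidate <= now:
        # Today's slot has already passed, so run tomorrow (DST edge cases ignored).
        candidate += timedelta(days=1)
    return candidate


# Example: the default daily schedule, evaluated in the viewer's timezone.
print(next_daily_run(time(2, 0), "America/Los_Angeles"))
```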

Scheduling parameters

Hourly
  • Repeat every … from: The interval in hours between indexer runs, starting at a specific time. Example: Repeat every 2 hours from 12:00 AM. Default: Every 1 hour starting at 2:00 AM.

Daily
  • Repeat at: The time of day when the schedule runs. Example: Repeat at 9:00 AM. Default: 2:00 AM.

Weekly
  • Repeat on: The day(s) of the week that the indexer runs. Example: Repeat on Monday, Wednesday, Friday. Default: Sunday.
  • Repeat at: The time of day that the indexer runs. Example: Repeat at 9:00 AM. Default: 2:00 AM.
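If it helps to reason about these parameters, the sketch below maps each schedule type to an equivalent five-field cron expression. Prophecy does not necessarily use cron internally; the mapping only makes the defaults concrete.

```python
# Illustrative mapping from the scheduling parameters above to standard
# five-field cron expressions (minute hour day-of-month month day-of-week).
# Prophecy's own scheduler may represent these differently.

def hourly(interval_hours: int, start_hour: int) -> str:
    # "Repeat every N hours from HH:00" -> run at start_hour, then every N hours that day
    return f"0 {start_hour}-23/{interval_hours} * * *"

def daily(hour: int, minute: int = 0) -> str:
    # "Repeat at HH:MM" every day
    return f"{minute} {hour} * * *"

def weekly(days: list[int], hour: int, minute: int = 0) -> str:
    # days use cron numbering: 0 = Sunday ... 6 = Saturday
    return f"{minute} {hour} * * {','.join(map(str, days))}"

print(hourly(1, 2))    # default hourly: every 1 hour starting at 2:00 AM -> "0 2-23/1 * * *"
print(daily(2))        # default daily: 2:00 AM -> "0 2 * * *"
print(weekly([0], 2))  # default weekly: Sunday at 2:00 AM -> "0 2 * * 0"
```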

Manually trigger indexing

You may need to manually trigger indexing if you know that certain tables are missing from the knowledge graph. To do so:
  1. In Prophecy, navigate to Metadata > Fabrics.
  2. Select the fabric that contains the connection you want to reindex.
  3. Open the Connections tab.
  4. Click to open the connection that you wish to reindex.
  5. Scroll to the Knowledge Graph Indexing Status tile in the connection dialog.
  6. Click Start to reindex the tables. You can track progress as schemas and directories are processed.
If more convenient, you can also start this process from the Environment tab in your project:
  1. Open a project in the project editor.
  2. Attach to the fabric that you wish to reindex.
  3. In the left sidebar, open the Environment tab.
  4. Below your connections, you’ll see a Missing Tables? callout.
  5. Click Refresh.
You might be prompted to manually trigger indexing if the Agent can’t locate a table during a conversation.

Add separate authentication for the indexer

In certain scenarios, you may want more granular control over which tables get indexed. For Databricks connections only, Prophecy lets you do so with dedicated authentication credentials for the knowledge graph indexer. There are two types of credentials stored in a connection:
  • Pipeline Development and Scheduled Execution credentials control how pipelines authenticate when they run.
  • Knowledge Graph Indexer credentials control how the crawler authenticates when it indexes your warehouse on an automated schedule.
If you don't add separate authentication for the indexer, it uses the pipeline development credentials when it runs.
If your pipeline development authentication method is a Personal Access Token (PAT) rather than OAuth, the knowledge graph indexer always uses the same identity as pipeline development, and this section does not apply.

Prerequisites

Before configuring dedicated credentials for the knowledge graph indexer, you must:
  • Upgrade to Prophecy 4.2.2 or later.
  • Use a Databricks connection for your SQL warehouse connection. Other SQL warehouses are not supported.
  • Be a Prophecy administrator. There are no role-based restrictions on configuring the knowledge graph indexer, but you need to understand how authentication works in Prophecy.
  • Be a Databricks administrator. This lets you assign appropriate permissions to the identity that will be used to run the indexer. The identity must have MANAGE access on the assets that you wish to index in the knowledge graph.
The knowledge graph indexing permissions should be equal to or a superset of the pipeline execution permissions. This ensures that the same tables you use in your pipelines are indexed by the knowledge graph. However, Prophecy does not enforce this.
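As a sketch of what granting that access could look like for a service principal on Databricks with Unity Catalog, the example below issues GRANT statements through the Databricks SQL connector. The catalog, schema, application ID, and connection details are placeholders, and the exact privileges you need (including availability of MANAGE) depend on your Databricks version and workspace setup.

```python
# Illustrative sketch: grant a service principal access to the assets you want
# indexed. All identifiers and connection values below are placeholders.
from databricks import sql

SP_APP_ID = "00000000-aaaa-bbbb-cccc-000000000000"  # placeholder application ID

GRANTS = [
    f"GRANT USE CATALOG ON CATALOG main TO `{SP_APP_ID}`",
    f"GRANT USE SCHEMA ON SCHEMA main.sales TO `{SP_APP_ID}`",
    f"GRANT SELECT, MANAGE ON SCHEMA main.sales TO `{SP_APP_ID}`",
]

with sql.connect(server_hostname="<workspace-host>",
                 http_path="<sql-warehouse-http-path>",
                 access_token="<admin-token>") as conn:
    with conn.cursor() as cur:
        for stmt in GRANTS:
            cur.execute(stmt)
```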

Procedure

To configure the knowledge graph indexer for a fabric:
  1. In Prophecy, navigate to Metadata > Fabrics.
  2. Select the fabric where you will enable indexing.
  3. Open the Connections tab.
  4. Click the pencil icon to edit the SQL Warehouse Connection.
  5. In the dialog, scroll to the Knowledge Graph Indexer tile.
  6. Configure authentication based on your pipeline development authentication method:
    • If you use User OAuth for pipeline development, choose either OAuth (User) or OAuth (Service Principal) for the knowledge graph indexer.
    • If you use Service Principal OAuth for pipeline development, you can only use Service Principal OAuth for the knowledge graph indexer.

Service Principal OAuth

Service Principal OAuth is recommended for production and scheduled indexing because the credentials don't expire.
  • Configuration: Reuse the pipeline development credentials or provide a different Service Principal Client ID and Client Secret.
  • What gets indexed: All tables that the service principal can access.
If you use User OAuth for pipeline development, Prophecy enforces user permissions even when the indexer uses service principal credentials. Users only see tables they have permission to access.

User OAuth

User OAuth should only be used for development.
  • Configuration: Uses the same app registration as pipeline development.
  • What gets indexed: All tables that the individual user can access.
  • Limitations: Requires frequent user logins. Scheduled crawling can fail when user credentials expire.
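To summarize the pairing rules in this section, here is a minimal sketch of which indexer authentication options are available for each pipeline development method. The function and string values are illustrative only.

```python
# Illustrative summary of which indexer authentication methods are available,
# based on the pipeline development authentication method described above.

def indexer_auth_options(pipeline_dev_auth: str) -> list[str]:
    if pipeline_dev_auth == "personal_access_token":
        # PAT: the indexer always reuses the pipeline development identity,
        # so separate indexer credentials do not apply.
        return ["same as pipeline development"]
    if pipeline_dev_auth == "oauth_user":
        # User OAuth: the indexer may use either user or service principal OAuth.
        return ["oauth_user", "oauth_service_principal"]
    if pipeline_dev_auth == "oauth_service_principal":
        # Service Principal OAuth: the indexer must also use service principal OAuth.
        return ["oauth_service_principal"]
    raise ValueError(f"unknown authentication method: {pipeline_dev_auth}")


print(indexer_auth_options("oauth_user"))  # ['oauth_user', 'oauth_service_principal']
```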