Data harmonization transforms disparate source data into a standardized schema. Source systems often use different naming conventions, data types, and structures to represent the same type of data. For example, you might have the same survey hosted on different systems, each of which produces differently structured output containing the same type of information. To merge this data into a single dataset, you need to harmonize it. The Harmonization Agent does this for you by transforming source columns to the expected target columns while validating data quality. The Harmonization Agent runs in a dedicated project mode and requires the Transform Agent to be disabled. Projects can operate in either Transform Agent mode or Harmonization mode, but not both simultaneously.

How does the Harmonization Agent work?

The Harmonization Agent uses AI to infer mappings between source schemas and a target Common Data Model (CDM) that you define. While the agent can infer a mapping on its own to get your data into the right format, we highly recommend providing a reference pipeline to “teach” the agent how to transform your data. When you provide a reference pipeline, the agent compares the source schema to patterns in the reference pipeline to make informed transformation decisions. Once the agent has generated the harmonized pipeline, Prophecy encourages a human-in-the-loop approach. For each source-target mapping, you can review the confidence score and double-check the transformation. Additionally, you can review the data quality test outcomes for each mapping. After review and validation, the harmonized pipeline becomes a production-ready asset that can be scheduled and deployed.

Common Data Model

A Common Data Model (CDM) defines the target schema that all source data must conform to. It specifies standardized column names, data types, and data tests used across downstream pipelines. The CDM is typically created once and reused across multiple harmonization workflows.
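To make the idea concrete, here is a minimal sketch of a CDM expressed as a plain Python data structure. It is an illustration only, not Prophecy's actual CDM format: the names `TargetColumn`, `SURVEY_CDM`, and `validate_row`, along with the survey columns and their tests, are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class TargetColumn:
    """One column of a hypothetical CDM: a name, a type, and data tests."""
    name: str
    dtype: type
    tests: tuple[Callable[[object], bool], ...] = ()


# Hypothetical CDM for survey responses (not a real Prophecy schema).
SURVEY_CDM = (
    TargetColumn("respondent_id", str, (lambda v: v != "",)),
    TargetColumn("age", int, (lambda v: 0 <= v <= 120,)),
    TargetColumn("satisfaction", int, (lambda v: 1 <= v <= 5,)),
)


def validate_row(row: dict, cdm=SURVEY_CDM) -> list[str]:
    """Return a list of problems; an empty list means the row conforms."""
    problems = []
    for col in cdm:
        if col.name not in row:
            problems.append(f"missing column: {col.name}")
            continue
        value = row[col.name]
        if not isinstance(value, col.dtype):
            problems.append(
                f"{col.name}: expected {col.dtype.__name__}, "
                f"got {type(value).__name__}"
            )
            continue
        # Run each data test only once the type is known to be correct.
        for i, test in enumerate(col.tests):
            if not test(value):
                problems.append(f"{col.name}: failed test {i}")
    return problems
```

Because the definition is a single immutable object, the same `SURVEY_CDM` can be reused to check the output of every source that is harmonized against it, which mirrors the "create once, reuse" role of a CDM described above.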

Reference pipelines

Optionally, you can build a reference pipeline that transforms source data and maps it to your CDM. The Harmonization Agent can use this pipeline to learn how similar source data should be transformed. For example, the agent can reference patterns such as column name equivalencies, common transformation types, and data type conversions. When harmonizing a new source with similar characteristics, the agent applies these patterns to generate mappings and transformations automatically.
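The kinds of patterns mentioned above (column name equivalencies and data type conversions) can be sketched in a few lines of Python. This is a hand-written illustration under stated assumptions, not what the agent generates: the `MAPPINGS` table, the `harmonize` function, and the source column names for `system_a` and `system_b` are all hypothetical.

```python
# Hypothetical per-source mappings: target column -> (source column, converter).
# A reference pipeline might encode equivalencies and conversions like these.
MAPPINGS = {
    "system_a": {
        "respondent_id": ("RespID", str),
        "age": ("Age", int),
        "satisfaction": ("Q1_score", int),
    },
    "system_b": {
        "respondent_id": ("user", str),
        # This source is assumed to export ages as decimal strings such as "41.0".
        "age": ("age_years", lambda v: int(float(v))),
        "satisfaction": ("sat", int),
    },
}


def harmonize(rows, source):
    """Rename and convert each source row into the target column layout."""
    mapping = MAPPINGS[source]
    return [
        {
            target: convert(row[src_col])
            for target, (src_col, convert) in mapping.items()
        }
        for row in rows
    ]


# Two exports of the same survey, harmonized and merged into one dataset.
rows_a = [{"RespID": 101, "Age": "34", "Q1_score": "4"}]
rows_b = [{"user": "u7", "age_years": "41.0", "sat": 5}]
merged = harmonize(rows_a, "system_a") + harmonize(rows_b, "system_b")
```

After harmonization, both sources share the same column names and types, so the rows can be concatenated directly.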

Human-in-the-loop validation

The review process allows users to validate agent-generated mappings and transformations before finalization. During review, you can see how each source column was mapped to the target schema, the transformations applied, the agent’s confidence score for each mapping, and the data quality tests that were applied. The review step ensures that business logic and domain-specific requirements are correctly implemented. While the agent handles routine mappings efficiently, human oversight catches edge cases and ensures that transformations align with business rules.
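One way to think about the review step is as a triage queue: mappings with low confidence or failing data tests are looked at first. The sketch below illustrates that idea only. The `MappingResult` record, the `needs_review` function, the 0.0 to 1.0 confidence scale, and the 0.8 threshold are all assumptions made for this example, not part of the product.

```python
from dataclasses import dataclass


@dataclass
class MappingResult:
    """Hypothetical record of one agent-generated mapping."""
    source_column: str
    target_column: str
    confidence: float   # assumed to be on a 0.0-1.0 scale
    tests_passed: bool  # outcome of the data quality tests


def needs_review(results, threshold=0.8):
    """Return mappings a reviewer should check first, least confident first."""
    flagged = [
        r for r in results
        if r.confidence < threshold or not r.tests_passed
    ]
    return sorted(flagged, key=lambda r: r.confidence)


results = [
    MappingResult("RespID", "respondent_id", 0.97, True),
    MappingResult("Q1_score", "satisfaction", 0.62, True),
    MappingResult("Age", "age", 0.91, False),
]
queue = needs_review(results)
```

Here the `satisfaction` mapping is flagged for its low confidence and the `age` mapping for its failed test, while the `respondent_id` mapping would still be visible in a full review but is not prioritized.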

Enabling harmonization

The Harmonization Agent runs in a dedicated project mode and requires the Transform Agent to be disabled. Projects can operate in either Transform Agent mode or Harmonization/Documentation mode, but not both simultaneously. To use the Harmonization Agent:
  1. For the team that owns the project, open Metadata → Teams → Select team → Settings → Advanced.
  2. Click the Enable Transform Agent toggle to turn it off. This disables the v4 Agent.
  3. Return to your project.
Only one agent mode can be active per team at a time.