What are data tests? - Prophecy Documentation

There are two types of data tests in Prophecy: table tests and project tests. Table tests and project tests serve different purposes in your data quality strategy. Understanding their differences helps you select the right approach for each validation scenario. The bottom line is that you can use either type of test to validate the same data quality requirements. However, table tests require parameterized SQL queries, whereas project tests leverage visual pipeline building.

When to use table tests

Use table tests when you need reusable, parameterized tests that can be applied across multiple tables or models. Table tests are written as “test definitions,” which are essentially stored SQL queries with parameters. They are useful for:

Parameterization: Leverage parameters (like model and column_name).
Sharing: Share tests with other teams through Prophecy packages.
Standardization: Establish consistent data quality checks across your project.

The test logic must be expressible as a parameterized SQL query. Because of this, you need to know how to write SQL queries.

When to use project tests

Use project tests when your validation logic doesn’t need to be reused elsewhere. Project tests provide the following benefits:

No SQL required: Prepare data for testing using the visual pipeline builder.
Workflow-specific: Validate specific combinations of models or tables that are unique to a particular pipeline or workflow.

However, project tests are not parameterized, so you cannot reuse the same test with different inputs.

Example: Comparing row counts between two tables

Both table tests and project tests can validate the same data quality requirement. This example demonstrates how to verify that two tables have the same number of rows using each approach.

Scenario

You need to ensure that customers and customers_cleaned tables have matching row counts. If the counts differ, the test should fail.

Approach 1: Table test

Create a parameterized test definition that you can apply to any table.

Create a new test definition.
Define the parameters {{ model }} and {{ compare_model }} to accept any two tables.

Define the SQL query for the test. The query returns rows only when the counts differ (test fails).

 with a as (
     select
     count(*) as count_a
     from {{ model }}
 ),
 b as (
     select
     count(*) as count_b
     from {{ compare_model }}
 ),
 final as (
     select
         count_a,
         count_b,
         abs(count_a - count_b) as diff_count
     from a
     cross join b
 )
 select * from final
 where diff_count > 0

Add the test to a table in a pipeline.
- Prophecy sets the value of the model parameter to the current table.
- You specify the compare_model parameter to the table you want to compare.
Run the test.

Result: You can reuse this test definition for any table in any pipeline in your project.

Approach 2: Project test

Build a visual pipeline that performs the same validation using gems instead of SQL.

Add two Table gems to the pipeline to read customers and customers_cleaned.
Add two Aggregate gems to count rows in each table.
Add a Join gem to combine the counts into a single row.
Add a Reformat gem to calculate the difference between counts.
Add a Filter gem to return rows only when the difference is greater than 0.
Connect to the Data Test gem to evaluate the result: if any rows exist, the test fails.

Pipeline structure:

Table 1 → Aggregate (count rows) ┐
                                    → Join → Reformat → Filter → Data Test
Table 2 → Aggregate (count rows) ┘

Project test pipeline showing visual gems for comparing row counts

Result: This test validates the specific customers and customers_cleaned tables. To test different tables, you must create a new project test.

​When to use table tests

​When to use project tests

​Example: Comparing row counts between two tables

​Scenario

​Approach 1: Table test

​Approach 2: Project test