This gem runs in .
Overview
Use the Filter gem to keep rows that match a condition and remove all other rows from the pipeline. Common use cases include:

- filtering active customers
- removing null values
- keeping recent records
- filtering by date ranges
- selecting high-value transactions
Parameters
| Parameter | Description | Required |
|---|---|---|
| Model | Input dataset to filter. | True |
| Filter Condition | Boolean expression used to determine which rows are kept. | True |
Common filter examples
| Goal | Filter condition |
|---|---|
| Keep active customers | Status = 'ACTIVE' |
| Keep high-value orders | OrderAmount > 1000 |
| Filter recent orders | OrderDate >= '2025-01-01' |
| Remove null values | CustomerId IS NOT NULL |
| Keep specific countries | Country IN ('US', 'CA') |
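
To see how these conditions behave, the sketch below runs each one as the `WHERE` clause of a query against a small in-memory SQLite table. The `records` table and its rows are hypothetical, and SQLite only stands in for whatever SQL engine your pipeline actually uses. Dates are stored as ISO 8601 text (`YYYY-MM-DD`), which compares correctly as strings.

```python
import sqlite3

# Hypothetical sample table; column names mirror the conditions above.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE records ("
    "CustomerId INTEGER, Status TEXT, OrderAmount INTEGER, "
    "OrderDate TEXT, Country TEXT)"
)
conn.executemany(
    "INSERT INTO records VALUES (?, ?, ?, ?, ?)",
    [
        (1, "ACTIVE", 1500, "2025-02-10", "US"),
        (2, "INACTIVE", 300, "2024-11-05", "CA"),
        (None, "ACTIVE", 2500, "2024-10-01", "US"),
        (4, "ACTIVE", 800, "2025-03-01", "MX"),
    ],
)

conditions = {
    "Keep active customers": "Status = 'ACTIVE'",
    "Keep high-value orders": "OrderAmount > 1000",
    "Filter recent orders": "OrderDate >= '2025-01-01'",
    "Remove null values": "CustomerId IS NOT NULL",
    "Keep specific countries": "Country IN ('US', 'CA')",
}


def apply_filter(condition):
    """Run a filter condition as a WHERE clause; only rows where it is true are returned."""
    query = (
        "SELECT CustomerId, OrderAmount FROM records "
        f"WHERE {condition} ORDER BY rowid"
    )
    return conn.execute(query).fetchall()


for goal, condition in conditions.items():
    print(f"{goal}: {apply_filter(condition)}")
```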
Example
Assume you have the following weather prediction table.

| DatePrediction | TemperatureCelsius | HumidityPercent | WindSpeed | Condition |
|---|---|---|---|---|
| 2025-03-01 | 15 | 65 | 10 | Sunny |
| 2025-03-02 | 17 | 70 | 12 | Cloudy |
| 2025-03-03 | 16 | 68 | 11 | Rainy |
| 2025-03-04 | 14 | 72 | 9 | Sunny |

The filtered output keeps only the predictions dated on or after 2025-03-03:

| DatePrediction | TemperatureCelsius | HumidityPercent | WindSpeed | Condition |
|---|---|---|---|---|
| 2025-03-03 | 16 | 68 | 11 | Rainy |
| 2025-03-04 | 14 | 72 | 9 | Sunny |
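
A minimal sketch of this example in Python with SQLite, assuming a filter condition of `DatePrediction >= '2025-03-03'`. That is one condition that yields the output table above; the table name `weather` is also an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE weather (DatePrediction TEXT, TemperatureCelsius INTEGER, "
    "HumidityPercent INTEGER, WindSpeed INTEGER, Condition TEXT)"
)
conn.executemany(
    "INSERT INTO weather VALUES (?, ?, ?, ?, ?)",
    [
        ("2025-03-01", 15, 65, 10, "Sunny"),
        ("2025-03-02", 17, 70, 12, "Cloudy"),
        ("2025-03-03", 16, 68, 11, "Rainy"),
        ("2025-03-04", 14, 72, 9, "Sunny"),
    ],
)

# Assumed filter condition: keep predictions dated on or after 2025-03-03.
filter_condition = "DatePrediction >= '2025-03-03'"

result = conn.execute(
    f"SELECT * FROM weather WHERE {filter_condition} ORDER BY DatePrediction"
).fetchall()
for row in result:
    print(row)
```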
Using pipeline parameters in filter conditions
You can reference pipeline parameters in filter conditions to make filtering dynamic at runtime. In Visual mode, select Configuration Variables from the expression builder to insert a parameter directly. In Code mode, use Jinja syntax:

| Parameter type | Example filter condition |
|---|---|
| String | sensor_id = {{ var('sensor') }} |
| Date | from_utc_timestamp(timestamp_col, 'UTC') > {{ var('start_date') }} |
| Numeric (Int, Long, Float, Double) | total_usage_mb > {{ var('usage_cap_mb') }} |
| Array | Use array_contains in the visual expression builder. See Use parameters. |
| Boolean | archived = {{ var('include_archived') }} |
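
Conceptually, the parameter value is substituted into the condition before the query runs. The sketch below imitates that step with a hypothetical `render_condition` helper that handles only the `{{ var('name') }}` pattern with numeric values. It is not Prophecy's or Jinja's actual renderer, and the `usage` table and its rows are made up.

```python
import re
import sqlite3

# Matches placeholders written as {{ var('name') }}.
VAR_PATTERN = re.compile(r"\{\{\s*var\('([^']+)'\)\s*\}\}")


def render_condition(condition, params):
    """Substitute {{ var('name') }} placeholders with parameter values.

    Hypothetical stand-in for Jinja rendering: values are inserted with
    str(), so this sketch only covers numeric parameters.
    """
    return VAR_PATTERN.sub(lambda m: str(params[m.group(1)]), condition)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE usage (device TEXT, total_usage_mb REAL)")
conn.executemany(
    "INSERT INTO usage VALUES (?, ?)",
    [("a", 120.0), ("b", 950.5), ("c", 2048.0)],
)

condition = "total_usage_mb > {{ var('usage_cap_mb') }}"
rendered = render_condition(condition, {"usage_cap_mb": 500})
print(rendered)  # total_usage_mb > 500

rows = conn.execute(
    f"SELECT device FROM usage WHERE {rendered} ORDER BY device"
).fetchall()
print(rows)  # [('b',), ('c',)]
```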
Common issues
No rows returned
Verify that:

- the filter condition matches the column data type
- the values exist in the dataset
- string comparisons use the expected capitalization
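
Whether string equality is case-sensitive depends on the SQL engine and the column's collation. In SQLite, the engine used in this sketch, the default `BINARY` collation makes `=` case-sensitive, so normalizing with `UPPER()` is one way to match every capitalization. The `customers` table is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (CustomerId INTEGER, Status TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [(1, "ACTIVE"), (2, "Active"), (3, "active"), (4, "INACTIVE")],
)


def count(condition):
    """Return how many rows a filter condition keeps."""
    return conn.execute(f"SELECT COUNT(*) FROM customers WHERE {condition}").fetchone()[0]


# Binary collation: only the exact spelling 'Active' matches.
exact = count("Status = 'Active'")
# Normalizing the column first matches every capitalization of ACTIVE.
normalized = count("UPPER(Status) = 'ACTIVE'")
print(exact, normalized)  # 1 3
```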
Null values not matching
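Comparisons with `NULL` may return unexpected results. In SQL, `= NULL` and `!= NULL` evaluate to unknown rather than true for every row, so a filter that uses them keeps no rows.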
Test for nulls with these operators:

| Use | Instead of |
|---|---|
| `IS NULL` | `= NULL` |
| `IS NOT NULL` | `!= NULL` |
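
The following SQLite sketch, using a hypothetical `customers` table, shows why `= NULL` and `!= NULL` keep no rows while `IS NULL` and `IS NOT NULL` behave as expected.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (CustomerId INTEGER, Name TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [(1, "Ana"), (None, "Ben"), (3, "Chen")],
)


def names(condition):
    """Return the names of the rows a filter condition keeps."""
    query = f"SELECT Name FROM customers WHERE {condition} ORDER BY Name"
    return [row[0] for row in conn.execute(query)]


# "= NULL" evaluates to NULL (unknown) for every row, so nothing is kept.
print(names("CustomerId = NULL"))       # []
print(names("CustomerId IS NULL"))      # ['Ben']
# "!= NULL" is also unknown for every row, including the non-null ones.
print(names("CustomerId != NULL"))      # []
print(names("CustomerId IS NOT NULL"))  # ['Ana', 'Chen']
```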
Date comparisons not working
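Ensure date values use the correct format and data type. For example, dates stored as text in a format such as `MM/DD/YYYY` do not compare chronologically, while ISO 8601 strings (`YYYY-MM-DD`) and true date columns do.

Similar tools and concepts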
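The Filter gem works similarly to a SQL `WHERE` clause. For example, the filter condition `OrderAmount > 1000` keeps the same rows as a query with `WHERE OrderAmount > 1000`.

Comparable features in other tools include:

- the Alteryx Filter tool
- `df.filter()` in PySpark
- boolean indexing in Pandas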
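
As an illustration of that equivalence, this sketch applies `OrderAmount > 1000` once as a SQL `WHERE` clause (via SQLite) and once as a per-row Python predicate, then checks that both keep the same rows. The `orders` data is hypothetical, and it deliberately contains no nulls, because Python and SQL treat missing values differently.

```python
import sqlite3

# Hypothetical (OrderId, OrderAmount) rows.
orders = [(101, 250), (102, 1200), (103, 1000), (104, 5400)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (OrderId INTEGER, OrderAmount INTEGER)")
conn.executemany("INSERT INTO orders VALUES (?, ?)", orders)

# SQL: the filter condition becomes the WHERE clause.
sql_rows = conn.execute(
    "SELECT OrderId, OrderAmount FROM orders WHERE OrderAmount > 1000 ORDER BY OrderId"
).fetchall()

# Python: the same condition as a per-row predicate, in the spirit of
# df.filter() in PySpark or boolean indexing in Pandas.
py_rows = [row for row in orders if row[1] > 1000]

print(sql_rows)  # [(102, 1200), (104, 5400)]
assert sql_rows == py_rows
```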
Filter gem vs Conditional gem
The Filter gem and the Conditional gem both evaluate conditions on a dataset, but they serve different purposes.

Key differences
| | Filter gem | Conditional gem |
|---|---|---|
| Purpose | Reduce data | Route data |
| Outputs | One output | Two or more outputs |
| Behavior | Keeps rows that match the condition and removes the rest | Sends rows to different outputs based on conditions |
| Routing | No | Yes |
| Order matters | No | Yes (first matching output wins) |
Row-level vs dataset-level behavior
- The Filter gem always operates at the row level.
- The Conditional gem can operate at:
  - row level (for example, `OrderAmount > 1000`)
  - dataset level (for example, `Count < threshold`)
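
The difference can be sketched in plain Python. The data, the `threshold` value, and the choice to send the whole dataset to a single output for the dataset-level check are assumptions for illustration, not a description of the gem's internals.

```python
# Hypothetical orders and threshold, for illustration only.
orders = [
    {"OrderId": 1, "OrderAmount": 1500},
    {"OrderId": 2, "OrderAmount": 400},
    {"OrderId": 3, "OrderAmount": 2200},
]
threshold = 5

# Row level: the condition is evaluated once per row, so each row
# is kept or dropped based on its own values.
row_level_ids = [o["OrderId"] for o in orders if o["OrderAmount"] > 1000]

# Dataset level: the condition is evaluated once for the whole dataset
# (here, its row count); in this sketch every row follows that one result.
dataset_output = "out0" if len(orders) < threshold else "out1"

print(row_level_ids)   # [1, 3]
print(dataset_output)  # out0
```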
When to use the Filter gem
Use the Filter gem when you want to:

- keep only matching rows
- remove unwanted records
- reduce the size of a dataset
- apply filtering logic similar to a SQL `WHERE` clause
When to use the Conditional gem
Use the Conditional gem when you want to:

- route rows to different outputs
- create branching logic
- split data into multiple paths

For example:

- rows matching `OrderAmount > 1000` can be routed to `out0`
- all remaining rows can be routed to `out1`
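
A plain-Python sketch of the two behaviors, assuming rows are dictionaries and conditions are Python predicates. The helper names `filter_rows` and `route` are hypothetical. `route` sends each row to the first output whose condition matches, which is why order matters for the Conditional gem.

```python
def filter_rows(rows, predicate):
    """Filter gem behavior: one output; rows that do not match are dropped."""
    return [row for row in rows if predicate(row)]


def route(rows, conditions, default_output):
    """Conditional gem behavior (sketch): send each row to the first output
    whose predicate matches; rows matching nothing go to default_output."""
    outputs = {name: [] for name, _ in conditions}
    outputs[default_output] = []
    for row in rows:
        for name, predicate in conditions:
            if predicate(row):
                outputs[name].append(row)
                break  # first matching output wins
        else:
            outputs[default_output].append(row)
    return outputs


def is_high_value(order):
    return order["OrderAmount"] > 1000


def is_very_high_value(order):
    return order["OrderAmount"] > 2000


orders = [
    {"OrderId": 1, "OrderAmount": 1500},
    {"OrderId": 2, "OrderAmount": 400},
    {"OrderId": 3, "OrderAmount": 2200},
]

kept = [o["OrderId"] for o in filter_rows(orders, is_high_value)]
print(kept)  # [1, 3]

routed = route(orders, [("out0", is_high_value)], default_output="out1")
routed_ids = {name: [o["OrderId"] for o in rows] for name, rows in routed.items()}
print(routed_ids)  # {'out0': [1, 3], 'out1': [2]}

# Order 3 also satisfies is_very_high_value, but it matched out0 first.
tiered = route(
    orders,
    [("out0", is_high_value), ("out1", is_very_high_value)],
    default_output="out2",
)
tiered_ids = {name: [o["OrderId"] for o in rows] for name, rows in tiered.items()}
print(tiered_ids)  # {'out0': [1, 3], 'out1': [], 'out2': [2]}
```

Unlike `filter_rows`, `route` never discards a row; every row lands in exactly one output.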
Summary
- Filter gem → keeps matching rows
- Conditional gem → routes rows to different outputs

