This gem runs in .
You can use the Imputation gem to replace a specified value in one or more numeric fields with a replacement value before downstream analysis or modeling. For example, you can replace null values in sales fields with the average of non-null sales so those missing values do not distort later calculations.

Prerequisites

Add prophecy_basics package version 1.0.11 or higher to your project.

Parameters

Parameter | Description
Fields to impute | Select one or more numeric fields to update.
Incoming value to replace | Choose which value should be replaced in the selected fields.
  • Null
  • User-specified value
If you choose User-specified value, the Value to Replace field appears.
Value to Replace | Enter the value to replace when Incoming value to replace is set to User-specified value.
Replace with value | Choose the value used for replacement.
  • Average
  • Median
  • Mode
  • User-specified value
If you choose User-specified value, the Replacement value field appears.
Replacement value | Enter the replacement value when Replace with value is set to User-specified value.
Include imputed value indicator field | Add an indicator field for each imputed column that shows whether a value was imputed.
Output imputed values as a separate field | Keep the original field unchanged and write the imputed result to a new column. Values that were not imputed are also copied into the new column.

How it works

The Imputation gem scans the selected fields and looks for values that match the configured Incoming value to replace setting. Matching values are replaced using the selected method:
  • Average: Replaces with the mean of valid values in the field, excluding the value being replaced.
  • Median: Replaces with the middle value in the field, excluding the value being replaced.
  • Mode: Replaces with the most frequently occurring value in the field, excluding the value being replaced.
  • User-specified value: Replaces with the value you provide.
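
The replacement logic above can be sketched in plain Python. This is an illustrative approximation, not the gem's actual implementation: the function names, the method keywords ("average", "median", "mode", "user"), the choice to always skip nulls when computing a statistic, and the tie-breaking behavior of Python's statistics module are all assumptions.

```python
from statistics import mean, median, mode

def replacement_for(values, target=None, method="average", user_value=None):
    """Compute the replacement for one field (hypothetical helper)."""
    if method == "user":
        return user_value
    # Assumption: nulls and the value being replaced are both excluded
    # from the statistic.
    valid = [v for v in values if v is not None and v != target]
    if not valid:
        # Assumption: with no valid values there is nothing to compute.
        return None
    stat = {"average": mean, "median": median, "mode": mode}[method]
    return stat(valid)

def impute(values, target=None, method="average", user_value=None):
    """Return a copy of `values` with every match of `target` replaced."""
    fill = replacement_for(values, target, method, user_value)
    return [fill if v == target else v for v in values]

# Replace nulls with the average of the non-null values.
print(impute([20.0, None, 50.0, 30.0]))
# Replace a user-specified sentinel (-1) with the mode of the other values.
print(impute([1, 2, 2, -1, 9], target=-1, method="mode"))
```

Each call handles a single field, so applying it to several fields means calling it once per field with that field's values.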

Output

By default, the output contains the original data stream with imputed values written back into the selected fields.
When Include imputed value indicator field is enabled, an additional field is added for each imputed field to indicate whether the value was imputed. Naming pattern: <original_field>_Indicator
When Output imputed values as a separate field is enabled, the original field is preserved and a new field is added with the imputed result. Naming pattern: <original_field>_ImputedValue
If both options are selected, both additional fields are included.
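
As a rough illustration of how the two output options shape the result, the sketch below applies them to rows held as Python dictionaries. Only the _Indicator and _ImputedValue suffixes come from this page. The function name, its arguments, and the use of True/False as the indicator value are assumptions, since the indicator encoding is not specified here.

```python
def impute_rows(rows, field, fill, target=None, indicator=False, separate=False):
    """Apply an already-computed replacement `fill` to one field (hypothetical helper)."""
    # With the separate-field option, the result goes to a new column and the
    # original column is left untouched; otherwise the original is overwritten.
    out_field = f"{field}_ImputedValue" if separate else field
    result = []
    for row in rows:
        new_row = dict(row)  # copy so the input rows are not modified
        was_imputed = row[field] == target
        new_row[out_field] = fill if was_imputed else row[field]
        if indicator:
            # Assumption: the indicator is stored as a boolean.
            new_row[f"{field}_Indicator"] = was_imputed
        result.append(new_row)
    return result

rows = [{"Price": 20.0}, {"Price": None}]
for row in impute_rows(rows, "Price", 25.0, indicator=True, separate=True):
    print(row)
```

With both options off, the same call simply overwrites Price in the rows where it matched the value to replace.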

Notes

  • This gem works on numeric fields only.
  • Imputation is calculated separately for each selected field.
  • When using Average, Median, or Mode, the replacement statistic is calculated using valid values only, excluding the value being replaced.
  • If you do not select Output imputed values as a separate field, the original field is overwritten with the imputed result.

Example

Suppose you have the following dataset with missing values in numeric fields.
Product | Price | Total_Sale
Shirt | 20.0 | 200.0
Pants | null | 150.0
Jacket | 50.0 | null
Shoes | 30.0 | 300.0
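If you select both Price and Total_Sale and configure the gem to replace Null values using Average, each null is replaced with the average of the non-null values in its own field: the Price for “Pants” becomes 33.3 (the average of 20.0, 50.0, and 30.0), and the Total_Sale for “Jacket” becomes 216.7 (the average of 200.0, 150.0, and 300.0).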

Result

Product | Price | Total_Sale
Shirt | 20.0 | 200.0
Pants | 33.3 | 150.0
Jacket | 50.0 | 216.7
Shoes | 30.0 | 300.0
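
The imputed figures in the result can be checked by hand. The short Python check below recomputes both averages from the non-null values in the example. The helper name is hypothetical, and rounding to one decimal place is only done here to match how the table displays the values.

```python
from statistics import mean

# Column values from the example dataset; None stands in for null.
price = [20.0, None, 50.0, 30.0]
total_sale = [200.0, 150.0, None, 300.0]

def average_of_non_null(values):
    """Mean of the values that are not null (hypothetical helper)."""
    return mean([v for v in values if v is not None])

print(round(average_of_non_null(price), 1))       # 33.3
print(round(average_of_non_null(total_sale), 1))  # 216.7
```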