BinaryEncoder
Compatibility: boolean data
The BinaryEncoder transforms True and False values into the numerical values 1 and 0, respectively.
from rdt.transformers.boolean import BinaryEncoder
transformer = BinaryEncoder()
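Here is a minimal pure-Python sketch of the 0/1 mapping. It is not RDT's implementation. The helper names encode_booleans and decode_booleans are hypothetical. The 0.5 threshold used to map numbers back to booleans is an assumption made for illustration.

```python
# Hypothetical, simplified sketch of a binary encoding; NOT RDT's implementation.

def encode_booleans(values):
    """Map True -> 1.0 and False -> 0.0; missing values (None) pass through."""
    return [None if v is None else float(v) for v in values]


def decode_booleans(numbers):
    """Map numbers back to booleans using an ASSUMED 0.5 threshold."""
    return [None if x is None else x >= 0.5 for x in numbers]


print(encode_booleans([True, False, None, True]))  # [1.0, 0.0, None, 1.0]
print(decode_booleans([1.0, 0.0, 0.7, None]))      # [True, False, True, None]
```

In practice you would use the BinaryEncoder itself. The sketch only shows which direction the mapping goes.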
missing_value_replacement: Add this argument to replace missing values during the transform phase.
| Value | Behavior |
|---|---|
| 'mean' (default) | Replace all missing values with the average value. |
| 'mode' | Replace all missing values with the most frequently occurring value. |
| <number> | Replace all missing values with the specified number (0, -1, 0.5, etc.). |
| None | Do not replace missing values. The transformed data will continue to have missing values. |
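The replacement options above can be sketched in plain Python. The replace_missing helper is hypothetical. It only mirrors the documented behavior on a list of 0/1 values, with None standing in for a missing value. It is not how RDT implements missing_value_replacement.

```python
from statistics import mean, mode


def replace_missing(values, strategy="mean"):
    """Hypothetical helper mirroring the missing_value_replacement options."""
    if strategy is None:
        return list(values)  # keep missing values as they are
    present = [v for v in values if v is not None]
    if not present:
        return list(values)  # nothing to compute a fill value from
    if strategy == "mean":
        fill = mean(present)  # average of the 0/1 values
    elif strategy == "mode":
        fill = mode(present)  # most frequent value
    elif isinstance(strategy, (int, float)) and not isinstance(strategy, bool):
        fill = strategy  # a user-supplied number such as 0, -1 or 0.5
    else:
        raise ValueError(f"unknown strategy: {strategy!r}")
    return [fill if v is None else v for v in values]


data = [1.0, 1.0, 1.0, 0.0, None]
print(replace_missing(data, "mean"))  # [1.0, 1.0, 1.0, 0.0, 0.75]
print(replace_missing(data, "mode"))  # [1.0, 1.0, 1.0, 0.0, 1.0]
print(replace_missing(data, -1))      # [1.0, 1.0, 1.0, 0.0, -1]
```

With 0/1 data, 'mean' fills in the fraction of True values. That fraction is generally not itself 0 or 1. 'mode' always fills in one of the original values.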
(deprecated) model_missing_values: Use the missing_value_generation parameter instead.
missing_value_generation: Add this argument to determine how to recreate missing values during the reverse transform phase.
| Value | Behavior |
|---|---|
| 'random' (default) | Randomly assign missing values in roughly the same proportion as the original data. |
| 'from_column' | Create a new column to store whether the value should be missing, and use it to recreate missing values. Note: Adding extra columns uses more memory and increases the RDT processing time. |
| None | Do not recreate missing values. |
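Here is a rough sketch of the two recreation strategies. It assumes missing values are represented as None in a plain list, and all helper names are hypothetical. The 'random' variant blanks out each value independently at the original missing fraction. That is one simple way to get "roughly the same proportion". RDT's actual mechanism may differ.

```python
import random


def add_missing_indicator(values):
    """'from_column' idea: record missingness in a separate 0/1 column."""
    return [1.0 if v is None else 0.0 for v in values]


def restore_from_indicator(values, indicator):
    """Put missing values back wherever the indicator column is (close to) 1."""
    return [None if flag >= 0.5 else v for v, flag in zip(values, indicator)]


def restore_randomly(values, missing_fraction, rng):
    """'random' idea: blank out each value with probability missing_fraction."""
    return [None if rng.random() < missing_fraction else v for v in values]


original = [1.0, None, 0.0, None]
indicator = add_missing_indicator(original)  # [0.0, 1.0, 0.0, 1.0]
filled = [1.0, 1.0, 0.0, 1.0]                # e.g. after 'mode' replacement
print(restore_from_indicator(filled, indicator))  # [1.0, None, 0.0, None]
print(restore_randomly(filled, 0.5, random.Random(0)))
```

The indicator approach recreates missing values in exactly the right rows, at the cost of an extra column. The random approach needs no extra column but only matches the overall proportion.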
from rdt.transformers.boolean import BinaryEncoder
transformer = BinaryEncoder(missing_value_replacement='mode',
                            missing_value_generation='from_column')
When setting the missing_value_generation parameter, consider whether the "missingness" of the data is important. For example, a user may have deliberately opted out of supplying the information, or a missing value may be highly correlated with another column in your dataset. If missingness is something you want to account for, you should model missing values, for example with missing_value_generation='from_column'.
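One simple way to judge whether missingness carries information is to compare how often a value is missing across the categories of another column. The sketch below is a hypothetical helper, not part of RDT. It assumes missing values are None and that the other column is categorical.

```python
def missing_rate_by_group(values, other):
    """Hypothetical check: fraction of missing `values` for each value of `other`."""
    counts = {}  # key -> (rows seen, rows with a missing value)
    for v, key in zip(values, other):
        total, missing = counts.get(key, (0, 0))
        counts[key] = (total + 1, missing + (v is None))
    return {key: missing / total for key, (total, missing) in counts.items()}


# Example: "opted_in" is missing far more often for one plan type.
opted_in = [True, None, None, False, True, None]
plan = ["free", "free", "free", "paid", "paid", "free"]
print(missing_rate_by_group(opted_in, plan))  # {'free': 0.75, 'paid': 0.0}
```

A large gap between groups, as in this example, suggests the missingness is informative and worth modeling.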