BinaryEncoder
Compatibility: boolean data
The BinaryEncoder transforms True and False values into the numerical values 1 and 0, respectively.
from rdt.transformers.boolean import BinaryEncoder
be = BinaryEncoder()
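To make the mapping concrete, here is a minimal pure-Python sketch of the idea. It is not RDT's implementation: the helpers `encode_booleans` and `decode_booleans` are hypothetical names used only for illustration, and the 0.5 threshold on the way back is an assumption of this sketch.

```python
def encode_booleans(values):
    """Map True to 1 and False to 0 (illustrative helper, not part of RDT)."""
    return [1 if value else 0 for value in values]


def decode_booleans(numbers):
    """Map numbers back to booleans; the 0.5 threshold is an assumption of this sketch."""
    return [number >= 0.5 for number in numbers]


encoded = encode_booleans([True, False, True])
decoded = decode_booleans(encoded)
```

In this sketch the round trip is lossless because the input contains no missing values; the parameters described next cover the case where it does.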
missing_value_replacement: Add this argument to replace missing values during the transform phase.
(default) 'mode' | Replace all missing values with the most frequently occurring value |
<number> | Replace all missing values with the specified number (0 or 1) |
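The two replacement options can be sketched in plain Python as follows. This is an illustration of the behavior described above, not RDT's code: `replace_missing` is a hypothetical helper, missing values are represented as `None`, and the sketch assumes the column has at least one non-missing value.

```python
from collections import Counter


def replace_missing(values, missing_value_replacement="mode"):
    """Fill None entries with the mode or with a fixed number (hypothetical helper)."""
    if missing_value_replacement == "mode":
        present = [value for value in values if value is not None]
        # most_common(1) returns a one-item list holding the most frequent value
        # and its count; this raises IndexError if every value is missing.
        fill_value = Counter(present).most_common(1)[0][0]
    else:
        fill_value = missing_value_replacement
    return [fill_value if value is None else value for value in values]


filled_with_mode = replace_missing([1, 0, None, 1])
filled_with_zero = replace_missing([1, 0, None, 1], missing_value_replacement=0)
```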
model_missing_values: Add this argument to create another column describing whether the values are missing.
(default) False | Do not create a new column. During the reverse transform, missing values are added in again randomly. |
True | Create a new column (if there are missing values). This allows you to keep track of the missing values so you can recreate them on the reverse transform. |
Setting this value to True may add another column to your dataset. Adding extra columns uses more memory and increases the RDT processing time.
from rdt.transformers.boolean import BinaryEncoder
# replace all missing values with the most commonly occurring value
# and also record whether the values are missing
be = BinaryEncoder(missing_value_replacement='mode',
                   model_missing_values=True)
# replace all missing values with a new value
be = BinaryEncoder(missing_value_replacement=-1)
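What the extra column buys you can be shown with a small conceptual sketch. It does not call RDT and does not reproduce its internals: the helpers `transform_with_indicator` and `reverse_with_indicator` are hypothetical, and missing values are represented as `None`.

```python
def transform_with_indicator(values, fill_value):
    """Return (filled values, missing flags); hypothetical helper, not RDT code."""
    is_missing = [1 if value is None else 0 for value in values]
    filled = [fill_value if value is None else value for value in values]
    return filled, is_missing


def reverse_with_indicator(filled, is_missing):
    """Restore None exactly where the indicator column marks a missing value."""
    return [None if flag == 1 else value for value, flag in zip(filled, is_missing)]


filled, is_missing = transform_with_indicator([1, None, 0], fill_value=1)
restored = reverse_with_indicator(filled, is_missing)
```

With the indicator column, the missing value returns to its original position. Without it, only the fact that some values were missing is available, so they have to be placed at random on the way back.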
When setting the model_missing_values parameter, consider whether the "missingness" of the data is important. For example, the user may have opted out of supplying the information on purpose, or a missing value may be highly correlated with another column in your dataset. If "missingness" is something you want to account for, you should model missing values.
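One rough way to judge whether missingness carries information is to compare how often a value is missing across the groups of another column. The sketch below uses plain Python and made-up rows; the column names `plan` and `opted_in` and the helper `missing_rate_by_group` are hypothetical and not part of RDT.

```python
from collections import defaultdict


def missing_rate_by_group(rows, target, group):
    """Return the share of missing `target` values for each `group` value."""
    counts = defaultdict(lambda: [0, 0])  # group value -> [missing, total]
    for row in rows:
        counts[row[group]][1] += 1
        if row[target] is None:
            counts[row[group]][0] += 1
    return {key: missing / total for key, (missing, total) in counts.items()}


# Made-up example data, for illustration only.
rows = [
    {"plan": "free", "opted_in": None},
    {"plan": "free", "opted_in": None},
    {"plan": "free", "opted_in": True},
    {"plan": "paid", "opted_in": True},
    {"plan": "paid", "opted_in": False},
]
rates = missing_rate_by_group(rows, target="opted_in", group="plan")
```

If the rates differ sharply between groups, as they do in this made-up data, the missingness is probably worth modeling.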