Search…
⌃K
Links

BinaryEncoder

Compatibility: boolean data
The BinaryEncoder transforms True and False values into numerical values of 0 and 1.
from rdt.transformers.boolean import BinaryEncoder
be = BinaryEncoder()

Parameters

missing_value_replacement: Add this argument to replace missing values during the transform phase
(default) 'mode'
Replace all missing values with the most frequently occurring value
<number>
Replace all missing values with the specified number (0 or 1)
model_missing_values: Add this argument to create another column describing whether the values are missing
(default) False
Do not create a new column. During the reverse transform, missing values are added in again randomly.
True
Create a new column (if there are missing values). This allows you to keep track of the missing values so you can recreate them on the reverse transform.
Setting this value to True may add another column to your dataset. Adding extra columns uses more memory and increases the RDT processing time.

Examples

from transformers.boolean import BinaryEncoder
# replace all missing values with the most commonly occuring value
# and also record whether the values are missing
be = BinaryEncoder(missing_value_replacement='mode',
model_missing_values=True)
# replace all missing values with a new value
be = BinaryEncoder(missing_value_replacement=-1)

FAQs

The decision to replace missing values is based on how you plan to use your data. For example, you might be using RDT to clean your data for machine learning (ML). Check to see whether the ML techniques you plan to use allow missing values.
The method for replacing missing values is dependent on what they mean in your dataset. For example, if missing values are the equivalent of False, replace them with a 0.
When setting the model_missing_values parameter, consider whether the "missingness" of the data is something important. For example, maybe the user opted out of supplying the info on purpose, or maybe a missing value is highly correlated with another column your dataset. If "missingness" is something you want to account for, you should model missing values.