The FloatFormatter transforms numerical data. By default, it leaves the data unchanged because numerical data is already ready to use for data science. Optionally, it can handle missing values and learn rounding schemes and min/max bounds.
from rdt.transformers.numerical import FloatFormatter
transformer = FloatFormatter()
missing_value_replacement: Add this argument to replace missing values during the transform phase
model_missing_values: Use the
missing_value_generation: Add this argument to determine how to recreate missing values during the reverse transform phase
enforce_min_max_values: Add this argument to allow the transformer to learn the min and max allowed values from the data.
learn_rounding_scheme: Add this argument to allow the transformer to learn about rounded values in your dataset.
computer_representation: Add this argument when the original data has a specific representation, even if it's not loaded that way into Python. The transformer will make sure that any reverse transformed data is compatible with this representation.
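To make the idea of "learning" bounds and a rounding scheme concrete, here is a minimal plain-Python sketch. It is not RDT's implementation; the helper names `learn_bounds`, `learn_decimal_digits` and `clip` are hypothetical and exist only for this illustration.

```python
from decimal import Decimal

def learn_bounds(values):
    """Return (min, max) over the non-missing values."""
    present = [v for v in values if v is not None]
    return min(present), max(present)

def learn_decimal_digits(values):
    """Return the largest number of decimal digits seen among non-missing values."""
    digits = 0
    for v in values:
        if v is None:
            continue
        # repr() gives the shortest string that round-trips the float,
        # so Decimal(repr(v)) reflects the digits as they were written.
        exponent = Decimal(repr(v)).as_tuple().exponent
        digits = max(digits, -exponent if exponent < 0 else 0)
    return digits

def clip(value, low, high):
    """Force a value into the learned [low, high] range."""
    return max(low, min(high, value))

# Made-up example column with one missing entry.
prices = [12.5, 3.25, None, 40.0, 7.75]
low, high = learn_bounds(prices)
digits = learn_decimal_digits(prices)

# A new value outside the learned range is pulled back to the boundary,
# and any value can be rounded to the learned number of digits.
clipped = clip(55.123, low, high)
rounded = round(clip(9.8765, low, high), digits)
```

In this sketch the learned state is just three numbers (minimum, maximum, digit count), which is enough to constrain and round any value produced later.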
from rdt.transformers.numerical import FloatFormatter
ff = FloatFormatter(missing_value_replacement='mean',
                    missing_value_generation='from_column',
                    learn_rounding_scheme=True)
On the forward transform, this transformer uses the missing_value_generation strategy. In this case, it creates an extra column that records whether each value is missing.
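The forward step can be pictured with a small plain-Python sketch. It assumes a mean replacement and an indicator column; the function name `forward_transform` and the `is_null` list are hypothetical stand-ins, not RDT's internals.

```python
import math

def forward_transform(values):
    """Replace missing entries with the mean and record where they were."""
    present = [v for v in values if v is not None and not math.isnan(v)]
    mean = sum(present) / len(present)
    filled, is_null = [], []
    for v in values:
        missing = v is None or math.isnan(v)
        filled.append(mean if missing else v)
        # 1.0 marks a value that was missing in the original data.
        is_null.append(1.0 if missing else 0.0)
    return filled, is_null

# Made-up example column with two kinds of missing markers.
ages = [20.0, None, 40.0, float("nan"), 30.0]
filled, is_null = forward_transform(ages)
```

The output is fully numerical: one list with no gaps, plus a second list that preserves the information about which entries were missing.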
On the reverse transform, the learn_rounding_scheme setting is applied. In this case, the values are rounded to 2 decimal digits like the original data. Also, missing values are added back in.
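A minimal plain-Python sketch of the reverse step follows. It assumes that the number of decimal digits (2 here) was learned earlier and that an indicator value of 0.5 or more means "missing"; both the threshold and the name `reverse_transform` are assumptions made for illustration, not RDT's actual behavior.

```python
def reverse_transform(filled, is_null, digits):
    """Round values to the learned digits and put missing values back."""
    restored = []
    for value, flag in zip(filled, is_null):
        if flag >= 0.5:
            # The indicator says this entry was missing originally.
            restored.append(None)
        else:
            restored.append(round(value, digits))
    return restored

# Made-up numerical data with more decimal digits than the original had.
synthetic = [19.98712, 30.0, 41.23456]
flags = [0.0, 1.0, 0.0]
restored = reverse_transform(synthetic, flags, digits=2)
```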
The method for replacing missing values is dependent on what they mean in your dataset. For example:
- If missing values are the equivalent of 0, replace them with a 0.
- If missing values indicate that you don't know the value at all, you might replace them with the
When setting the model_missing_values parameter, consider whether the "missingness" of the data is something important. For example, maybe the user opted out of supplying the info on purpose, or maybe a missing value is highly correlated with another column in your dataset. If "missingness" is something you want to account for, you should model missing values.
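One quick way to judge whether "missingness" is related to another column is to compare the share of missing values across that column's groups. The sketch below uses plain Python and made-up data; the helper `missing_rate_by_group` is hypothetical and not part of RDT.

```python
from collections import defaultdict

def missing_rate_by_group(values, groups):
    """Return the fraction of missing values within each group label."""
    totals = defaultdict(int)
    missing = defaultdict(int)
    for value, group in zip(values, groups):
        totals[group] += 1
        if value is None:
            missing[group] += 1
    return {g: missing[g] / totals[g] for g in totals}

# Made-up data: income is missing only for students.
income = [52000.0, None, 61000.0, None, None, 48000.0]
status = ["employed", "student", "employed", "student", "student", "employed"]
rates = missing_rate_by_group(income, status)
```

If the rates differ sharply between groups, as they do in this toy data, the missing values carry information and are probably worth modeling.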