GaussianNormalizer
Compatibility: numerical data
To use this transformer, you must install the copulas module in addition to rdt. This is available in open source for all users.
The GaussianNormalizer performs a statistical transformation on numerical data. It approximates the shape of the overall column. Then, it converts the data to a different shape: a standard normal distribution (aka a bell curve with mean = 0 and standard deviation = 1).
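The two steps described above (approximate the column's shape, then map it onto a standard normal) can be sketched with the standard library alone. This is an illustration of the general idea, not RDT's implementation: it assumes the column is fitted with an exponential distribution, which is a stand-in chosen because its CDF has a closed form, and the helper names (fit_exponential, to_normal, from_normal) and the EPSILON clamp are hypothetical.

```python
import math
from statistics import NormalDist, fmean

STANDARD_NORMAL = NormalDist(mu=0.0, sigma=1.0)
EPSILON = 1e-9  # assumed clamp that keeps CDF values strictly inside (0, 1)


def fit_exponential(values):
    """Step 1: approximate the column's shape (here: an exponential scale)."""
    return fmean(values)


def to_normal(values, scale):
    """Step 2: fitted CDF, then inverse CDF of the standard normal."""
    normalized = []
    for x in values:
        u = 1.0 - math.exp(-x / scale)           # CDF of the fitted distribution
        u = min(max(u, EPSILON), 1.0 - EPSILON)  # avoid inv_cdf(0) and inv_cdf(1)
        normalized.append(STANDARD_NORMAL.inv_cdf(u))
    return normalized


def from_normal(z_values, scale):
    """Reverse: standard normal CDF, then inverse CDF of the fitted distribution."""
    restored = []
    for z in z_values:
        u = STANDARD_NORMAL.cdf(z)
        u = min(max(u, EPSILON), 1.0 - EPSILON)
        restored.append(-scale * math.log(1.0 - u))
    return restored


column = [0.3, 1.1, 0.7, 4.2, 2.5, 0.9]
scale = fit_exponential(column)
normalized = to_normal(column, scale)
restored = from_normal(normalized, scale)
```

Because both mappings are monotonic, the order of the values is preserved and the reverse step recovers the original column up to floating-point error.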
missing_value_replacement: Add this argument to replace missing values during the transform phase.
(default) 'mean'
Replace all missing values with the average value.
'random'
Replace missing values with a random value. The value is chosen uniformly at random from the min/max range.
'mode'
Replace all missing values with the most frequently occurring value.
<number>
Replace all missing values with the specified number (0, -1, 0.5, etc.).
None
Deprecated. Do not replace missing values. The transformed data will continue to have missing values.
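The replacement options listed above can be illustrated with a small standard-library sketch. It is not RDT's code: the function replace_missing, the use of None to mark a missing value, and the seed argument are all assumptions made for the example.

```python
import random
from statistics import fmean, mode


def replace_missing(values, replacement='mean', seed=0):
    """Fill None entries using one of the strategies described above."""
    present = [v for v in values if v is not None]
    rng = random.Random(seed)

    def fill():
        if replacement == 'mean':
            return fmean(present)
        if replacement == 'mode':
            return mode(present)
        if replacement == 'random':
            # Uniform draw from the observed min/max range.
            return rng.uniform(min(present), max(present))
        return replacement  # a specific number such as 0, -1 or 0.5

    return [fill() if v is None else v for v in values]


values = [1.0, None, 3.0, 3.0, None]
with_mean = replace_missing(values, 'mean')
with_mode = replace_missing(values, 'mode')
with_random = replace_missing(values, 'random')
with_number = replace_missing(values, -1)
```

Note that 'random' draws a fresh value for every missing entry, while the other strategies reuse a single value.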
(deprecated) model_missing_values: Use the missing_value_generation parameter instead.
missing_value_generation: Add this argument to determine how to recreate missing values during the reverse transform phase.
(default) 'random'
Randomly assign missing values in roughly the same proportion as the original data.
'from_column'
Create a new column to store whether the value should be missing. Use it to recreate missing values. Note: Adding extra columns uses more memory and increases the RDT processing time.
None
Do not recreate missing values.
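As a rough sketch of the difference between 'random' and 'from_column', the snippet below uses plain Python lists. The helper names and the 0.5 threshold applied to the indicator values are assumptions for illustration; RDT's internal bookkeeping may differ.

```python
import random


def missing_proportion(values):
    """Learned during fit: the share of missing (None) entries."""
    return sum(v is None for v in values) / len(values)


def regenerate_random(values, proportion, seed=0):
    """'random': blank out entries in roughly the learned proportion."""
    rng = random.Random(seed)
    return [None if rng.random() < proportion else v for v in values]


def regenerate_from_column(values, is_missing):
    """'from_column': use an extra indicator column to decide what is missing."""
    return [None if flag >= 0.5 else v for v, flag in zip(values, is_missing)]


original = [4.0, None, 2.5, None, 7.1, 3.3]
proportion = missing_proportion(original)

synthetic = [5.0, 1.2, 6.4, 2.2, 3.9, 4.8]
random_result = regenerate_random(synthetic, proportion)
column_result = regenerate_from_column(synthetic, [0, 1, 0, 1, 0, 0])
```

With 'random', only the overall rate of missing values is preserved. With 'from_column', the indicator travels alongside the data, so missingness can stay tied to specific rows at the cost of an extra column.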
distribution: In the first step of the normalization, the transformer approximates the shape (aka distribution) of the overall column after searching through multiple options. Use this parameter to limit the options it searches through.
(default) 'parametric'
Search through 1-dimensional distributions that have a set number of parameters. This includes: gaussian, gamma, beta, student_t and truncated_gaussian.
<name>
Only consider the distribution that is named. Possible names include 'norm', 'gamma', 'beta', 't', 'truncnorm', 'uniform' and 'gaussian_kde'.
Deprecated: 'gaussian', 'truncated_gaussian' and 'student_t'. Instead, please use the names 'norm', 'truncnorm' and 't' (respectively).
<copulas.univariate.Univariate>
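To make the idea of "searching through multiple options" concrete, here is a minimal sketch that fits two candidate shapes and keeps the one that matches the data best. It assumes the fit is scored with a Kolmogorov-Smirnov statistic, which is an illustrative choice rather than a confirmed detail of the library. Only 'norm' and 'uniform' are implemented because they need nothing beyond the standard library, and every function name here is hypothetical.

```python
from statistics import NormalDist


def ks_statistic(values, cdf):
    """Largest gap between the empirical CDF and a candidate CDF."""
    ordered = sorted(values)
    n = len(ordered)
    worst = 0.0
    for i, x in enumerate(ordered):
        f = cdf(x)
        worst = max(worst, (i + 1) / n - f, f - i / n)
    return worst


def fit_norm(values):
    """Fit a normal distribution and return its CDF."""
    return NormalDist.from_samples(values).cdf


def fit_uniform(values):
    """Fit a uniform distribution on [min, max] and return its CDF."""
    low, high = min(values), max(values)
    return lambda x: min(max((x - low) / (high - low), 0.0), 1.0)


def select_distribution(values, candidates):
    """Fit every candidate and return the best name plus all scores."""
    scores = {name: ks_statistic(values, fit(values))
              for name, fit in candidates.items()}
    return min(scores, key=scores.get), scores


column = [1, 4, 5, 5.5, 6, 6.5, 7, 8, 11]
best, scores = select_distribution(
    column, {'norm': fit_norm, 'uniform': fit_uniform})
```

Passing a single name for the distribution parameter corresponds to calling select_distribution with a one-entry candidates dictionary, which skips the search.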
enforce_min_max_values: Add this argument to allow the transformer to learn the min and max allowed values from the data.
(default) False
Do not learn any min or max values from the dataset. When reverse transforming the data, the values may be above or below what was originally present.
True
Learn the min and max values from the input data. When reverse transforming the data, any out-of-bounds values will be clipped to the min or max value.
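The clipping behaviour for True can be sketched in a few lines. The helper names below are hypothetical and the snippet only mirrors the description above.

```python
def learn_bounds(values):
    """Learned during fit: the smallest and largest observed values."""
    return min(values), max(values)


def clip_to_bounds(values, low, high):
    """Applied during reverse transform: pull out-of-bounds values back in."""
    return [min(max(v, low), high) for v in values]


low, high = learn_bounds([18, 25, 42, 67])
clipped = clip_to_bounds([12.4, 30.0, 71.9], low, high)
```

Here 12.4 is raised to 18 and 71.9 is lowered to 67, while 30.0 is left unchanged.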
learn_rounding_scheme: Add this argument to allow the transformer to learn about rounded values in your dataset.
(default) False
Do not learn or enforce any rounding scheme. When reverse transforming the data, there may be many decimal places present.
True
Learn the rounding rules from the input data. When reverse transforming the data, round the number of digits to match the original.
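One simple way to picture a rounding scheme is "the largest number of decimal places seen in the input". The sketch below makes that assumption explicit; the helper names are hypothetical and the library's actual rules may be more involved.

```python
from decimal import Decimal


def learn_decimal_places(values):
    """Largest number of decimal digits needed by any input value."""
    places = 0
    for v in values:
        # normalize() drops trailing zeros, so 10.0 counts as 0 decimals.
        exponent = Decimal(str(v)).normalize().as_tuple().exponent
        places = max(places, -exponent)
    return places


def apply_rounding(values, places):
    """Applied during reverse transform: match the learned precision."""
    return [round(v, places) for v in values]


places = learn_decimal_places([1.25, 3.5, 10.0])
rounded = apply_rounding([1.23456, 2.0, 7.891], places)
```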
For the <copulas.univariate.Univariate> option of the distribution parameter: use a Univariate object created from the Copulas library. See the Copulas documentation for more information.