OptimizedTimestampEncoder
Compatibility: datetime data
The OptimizedTimestampEncoder transforms data that represents dates and times into numerical values. Each transformed value is a number that represents the datetime. The encoding is optimized to use as little memory as possible for your specific dataset, so the resulting numbers can only be interpreted by the transformer that produced them.

from rdt.transformers.datetime import OptimizedTimestampEncoder
transformer = OptimizedTimestampEncoder()
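To give a feel for what a dataset-specific numerical encoding can look like, the sketch below converts datetimes to integer Unix seconds and divides them by their greatest common divisor. This is an illustrative assumption, not a description of RDT's internals; `encode` and `decode` are hypothetical helpers. It also shows why the encoded numbers are only meaningful together with the state learned from the data.

```python
from datetime import datetime, timezone
from functools import reduce
from math import gcd


def encode(datetimes):
    """Hypothetical helper: return (encoded_values, divider) for aware datetimes."""
    seconds = [int(dt.timestamp()) for dt in datetimes]
    # The divider is learned from this specific dataset (assumed strategy).
    divider = reduce(gcd, seconds) or 1
    return [s // divider for s in seconds], divider


def decode(encoded, divider):
    """Hypothetical helper: undo encode() using the learned divider."""
    return [datetime.fromtimestamp(v * divider, tz=timezone.utc) for v in encoded]


dates = [
    datetime(2022, 2, 15, tzinfo=timezone.utc),
    datetime(2022, 2, 16, tzinfo=timezone.utc),
    datetime(2022, 3, 1, tzinfo=timezone.utc),
]
encoded, divider = encode(dates)
restored = decode(encoded, divider)
```

Because every date here falls on midnight UTC, the learned divider is one day in seconds, and the stored values shrink from ten-digit timestamps to five-digit day counts. Without the divider, those day counts cannot be mapped back to datetimes.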
Parameters
missing_value_replacement: Add this argument to replace missing values during the transform phase.
(default) 'mean': Replace all missing values with the average value.
'random': Replace missing values with a random value. The value is chosen uniformly at random from the min/max range.
'mode': Replace all missing values with the most frequently occurring value.
None: Do not replace missing values. The transformed data will continue to have missing values.
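The replacement strategies above can be sketched in plain Python as follows. This is a minimal illustration operating on already-numerical values; `replace_missing` is a hypothetical helper, not part of the RDT API, and the real transformer may differ in details.

```python
import random
from statistics import mean, mode


def replace_missing(values, strategy="mean", rng=None):
    """Hypothetical helper: fill None entries according to the chosen strategy."""
    if strategy is None:
        return list(values)  # leave missing values in place
    present = [v for v in values if v is not None]
    if strategy == "mean":
        fill_values = [mean(present)] * len(values)
    elif strategy == "mode":
        fill_values = [mode(present)] * len(values)
    elif strategy == "random":
        rng = rng or random.Random()
        low, high = min(present), max(present)
        # Draw a separate uniform value from the min/max range for each row.
        fill_values = [rng.uniform(low, high) for _ in values]
    else:
        raise ValueError(f"Unknown strategy: {strategy!r}")
    return [f if v is None else v for v, f in zip(values, fill_values)]


column = [10.0, None, 20.0, 20.0, None]
with_mean = replace_missing(column, "mean")
with_mode = replace_missing(column, "mode")
with_random = replace_missing(column, "random", rng=random.Random(0))
unchanged = replace_missing(column, None)
```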
(deprecated) model_missing_values: Use the missing_value_generation parameter instead.
missing_value_generation: Add this argument to determine how to recreate missing values during the reverse transform phase.
(default) 'random': Randomly assign missing values in roughly the same proportion as the original data.
'from_column': Create a new column to store whether the value should be missing. Use it to recreate missing values. Note: Adding extra columns uses more memory and increases the RDT processing time.
None: Do not recreate missing values.
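The difference between 'random' and 'from_column' can be sketched like this. All function names are hypothetical and the logic is an assumption about the general idea, not RDT's actual implementation.

```python
import random


def transform_with_flag(values, fill):
    """'from_column' idea: keep an extra is-missing column beside the filled values."""
    filled = [fill if v is None else v for v in values]
    is_missing = [v is None for v in values]
    return filled, is_missing


def reverse_from_column(filled, is_missing):
    """Recreate missing values exactly where the flag column says they were."""
    return [None if flag else v for v, flag in zip(filled, is_missing)]


def reverse_random(filled, missing_ratio, rng):
    """'random' idea: blank out rows at random, at roughly the learned ratio."""
    return [None if rng.random() < missing_ratio else v for v in filled]


original = [100.0, None, 300.0, None]
filled, is_missing = transform_with_flag(original, fill=200.0)
restored = reverse_from_column(filled, is_missing)

missing_ratio = sum(is_missing) / len(is_missing)  # proportion learned from the data
randomized = reverse_random(filled, missing_ratio, random.Random(0))
```

The flag column restores missing values in their original positions at the cost of an extra column, while the random approach only preserves their approximate proportion.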
enforce_min_max_values: Add this argument to allow the transformer to learn the min and max allowed values from the data.
(default) False: Do not learn any min or max values from the dataset. When reverse transforming the data, the values may be above or below what was originally present.
True: Learn the min and max values from the input data. When reverse transforming the data, any out-of-bounds values will be clipped to the min or max value.
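As a rough illustration of the clipping behavior, the sketch below learns bounds from the input and applies them to out-of-range values. `learn_bounds` and `clip` are hypothetical helpers; the numbers stand in for already-transformed datetime values.

```python
def learn_bounds(values):
    """Hypothetical helper: learn the min and max from the fitted data."""
    return min(values), max(values)


def clip(values, low, high):
    """Hypothetical helper: pull out-of-bounds values back to the nearest bound."""
    return [min(max(v, low), high) for v in values]


fitted = [1_640_995_200, 1_644_883_200, 1_646_092_800]
low, high = learn_bounds(fitted)

# Values produced later (for example by a model) may fall outside the bounds.
generated = [1_600_000_000, 1_645_000_000, 1_700_000_000]
clipped = clip(generated, low, high)
```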
datetime_format: Add this argument to tell the transformer how to read your datetime column if it's stored as strings.
(default) None: Format detection isn't needed, for example because your data is already represented by pandas datetime objects. If your data is stored as strings, please provide a format.
<string>: Read the datetime values according to the format given in the <string>. For example, to represent a datetime like "Feb 15, 2022 10:23:45 AM", you can use the format string "%b %d, %Y %I:%M:%S %p".
For more information, see Python's strftime documentation.
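To check a format string before passing it to the transformer, you can try it with Python's standard `datetime` module. The sketch below uses the example value and format from the text above; it assumes an English locale for the month name and AM/PM marker.

```python
from datetime import datetime

fmt = "%b %d, %Y %I:%M:%S %p"
text = "Feb 15, 2022 10:23:45 AM"

# strptime raises ValueError if the string does not match the format.
parsed = datetime.strptime(text, fmt)

# Formatting the parsed value again should reproduce the original string.
round_trip = parsed.strftime(fmt)
```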
* extract_timezone: Add this argument if your datetime column has timezone information and you'd like to extract the timezone into a new column to consider as a separate feature.
(default) False: Do not extract the timezone. Your datetime values will be converted to numerical values based on their timezones, but the timezones themselves will not be extracted into a new column.
True: Extract the timezones into a new column. Your datetime values will be converted into numerical values based on their timezones, and the timezone values themselves will be extracted into a new column.
Examples
Basic case: The transformer parses your datetime format and converts each value into a numerical Unix time.
from rdt.transformers.datetime import OptimizedTimestampEncoder
transformer = OptimizedTimestampEncoder(missing_value_replacement='mean',
                                        datetime_format='%d %b %Y')

Converting based on timezones: If your data contains timezones, the transformer takes them into account during the conversion. For example, 3pm in New York is the same moment as 8pm in London, so both datetime values are converted to the same Unix time.
from rdt.transformers.datetime import OptimizedTimestampEncoder
transformer = OptimizedTimestampEncoder(datetime_format='%b %d, %Y %I:%M%p (%z)')
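The New York and London example can be verified with the standard library alone. The sketch below assumes a February date, when New York is at UTC-05:00 and London at UTC+00:00, and uses the same format string as above.

```python
from datetime import datetime

fmt = "%b %d, %Y %I:%M%p (%z)"

new_york = datetime.strptime("Feb 15, 2022 03:00PM (-0500)", fmt)
london = datetime.strptime("Feb 15, 2022 08:00PM (+0000)", fmt)

# Both strings describe the same instant, so their Unix times match.
same_instant = new_york.timestamp() == london.timestamp()
```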

Extracting timezone values: In addition to the Unix time conversion, SDV Enterprise users can extract the timezones into a new, categorical column to consider as a separate feature. This is particularly useful if your data contains multiple timezones. When reverse transforming back to the original data, this allows you to preserve the same mix of timezones.
from rdt.transformers.datetime import OptimizedTimestampEncoder
transformer = OptimizedTimestampEncoder(
    datetime_format='%b %d, %Y %I:%M:%S %p (%z)',
    extract_timezone=True)
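Conceptually, extracting the timezone splits each value into a Unix time and a separate offset, which are recombined on the way back. The sketch below illustrates that idea with the standard library; `split_timezone` and `join_timezone` are hypothetical helpers and do not reflect how the SDV Enterprise feature is implemented.

```python
from datetime import datetime

fmt = "%b %d, %Y %I:%M:%S %p (%z)"


def split_timezone(text):
    """Hypothetical helper: return (unix_time, utc_offset) for one datetime string."""
    dt = datetime.strptime(text, fmt)
    return dt.timestamp(), dt.strftime("%z")


def join_timezone(unix_time, offset):
    """Hypothetical helper: rebuild the original string from its two parts."""
    tz = datetime.strptime(offset, "%z").tzinfo
    return datetime.fromtimestamp(unix_time, tz=tz).strftime(fmt)


rows = [
    "Feb 15, 2022 03:00:00 PM (-0500)",
    "Feb 15, 2022 08:00:00 PM (+0000)",
]
parts = [split_timezone(row) for row in rows]
rebuilt = [join_timezone(unix_time, offset) for unix_time, offset in parts]
```

Both rows share the same Unix time, so the offset column is the only thing that lets the reverse step restore each row in its original timezone.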

For more information about timezones, see the FAQ.
FAQs