
UnixTimestampEncoder

Compatibility: datetime data
The UnixTimestampEncoder transforms data that represents dates and times into numerical values using Unix time (also known as Epoch time). The transformed value is the number of nanoseconds that have passed since Jan 1, 1970 00:00:00.000000 UTC.
from rdt.transformers.datetime import UnixTimestampEncoder
ute = UnixTimestampEncoder()
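To make the transformed value concrete, here is a minimal sketch in plain Python of the arithmetic described above. It is not RDT's implementation: the helper name to_unix_nanoseconds is hypothetical, and treating timezone-naive values as UTC is an assumption made for the illustration.

```python
from datetime import datetime, timezone

# The Unix epoch: Jan 1, 1970 00:00:00 UTC
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_unix_nanoseconds(dt: datetime) -> int:
    """Return the whole nanoseconds elapsed between the Unix epoch and dt.

    Hypothetical helper for illustration only. Timezone-naive values are
    assumed to be in UTC.
    """
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=timezone.utc)
    delta = dt - EPOCH
    # Use integer arithmetic so that no precision is lost to float rounding.
    whole_seconds = delta.days * 86_400 + delta.seconds
    return whole_seconds * 1_000_000_000 + delta.microseconds * 1_000


print(to_unix_nanoseconds(datetime(2022, 2, 15, 22, 30)))
```

Python's datetime only has microsecond resolution, so the last three digits of the result are always zero in this sketch.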

Parameters

datetime_format: Add this argument to tell the transformer how to read your datetime column if it's in a specific format that isn't easy to identify.
(default) None
Automatically detect the format. The transformer is able to detect common formats such as "02/15/22", "15/02/22 22:30" and "02-15-2022 10:30PM".
<string>
Read the format according to the instructions in the <string>. For example, to represent a datetime like "Feb 15, 2022 10:23:45 AM", you can use the format string "%b %d, %Y %I:%M:%S %p". For more info, see Python's documentation on strftime format codes.
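One way to check a format string before handing it to the transformer is to try it on a sample value with Python's standard library. This sketch assumes the transformer interprets the string with the same directives as datetime.strptime; the variable names are illustrative only.

```python
from datetime import datetime

sample = "Feb 15, 2022 10:23:45 AM"
datetime_format = "%b %d, %Y %I:%M:%S %p"

# strptime raises ValueError if the sample does not match the format.
parsed = datetime.strptime(sample, datetime_format)

# Formatting the parsed value with the same string should reproduce the sample.
round_trip = parsed.strftime(datetime_format)

print(parsed.isoformat())
print(round_trip == sample)
```

If strptime raises a ValueError, the format string does not describe the sample and needs adjusting. Note that %b and %p depend on the active locale.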
missing_value_replacement: Add this argument to replace missing values during the transform phase.
(default) 'mean'
Replace all missing values with the average value.
'mode'
Replace all missing values with the most frequently occurring value.
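The difference between the two options can be sketched on a small list of already-transformed numeric values. This is an illustration in plain Python, not RDT's code: the function fill_missing is hypothetical, and None stands in for a missing value.

```python
from statistics import mean, mode


def fill_missing(values, strategy="mean"):
    """Replace None entries with the mean or mode of the observed values.

    Hypothetical helper for illustration only.
    """
    observed = [v for v in values if v is not None]
    if strategy == "mean":
        fill = mean(observed)
    elif strategy == "mode":
        fill = mode(observed)
    else:
        raise ValueError(f"Unknown strategy: {strategy!r}")
    return [fill if v is None else v for v in values]


timestamps = [10, 20, 20, None, 50]

filled_with_mean = fill_missing(timestamps, strategy="mean")
filled_with_mode = fill_missing(timestamps, strategy="mode")

print(filled_with_mean)
print(filled_with_mode)
```

With 'mean', the gap is filled with the average of the observed values (25 here). With 'mode', it is filled with the value that occurs most often (20 here).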
model_missing_values: Add this argument to create another column describing whether the values are missing
(default) False
Do not create a new column. During the reverse transform, missing values are added in again randomly.
True
Create a new column (if there are missing values). This allows you to keep track of the missing values so you can recreate them on the reverse transform.
Setting this value to True may add another column to your dataset. Adding extra columns uses more memory and increases the RDT processing time.
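The idea behind the extra column can be sketched as follows. This is a simplified illustration under stated assumptions, not RDT's implementation: the helpers split_missing and restore_missing are hypothetical, None marks a missing value, and the indicator uses 1.0 for missing and 0.0 for present.

```python
def split_missing(values, fill):
    """Return (filled_values, is_missing) for a column that uses None for gaps.

    Hypothetical helper: the second list plays the role of the extra column.
    """
    filled = [fill if v is None else v for v in values]
    is_missing = [1.0 if v is None else 0.0 for v in values]
    return filled, is_missing


def restore_missing(filled, is_missing):
    """Put the gaps back exactly where the indicator column says they were."""
    return [None if flag == 1.0 else v for v, flag in zip(filled, is_missing)]


column = [100, None, 300, None]

filled, is_missing = split_missing(column, fill=200)
restored = restore_missing(filled, is_missing)

print(filled)
print(is_missing)
print(restored)
```

Because the indicator records where each gap was, the reverse step can restore the gaps in their original positions rather than placing them at random.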

Examples

from rdt.transformers.datetime import UnixTimestampEncoder

ute = UnixTimestampEncoder(
    missing_value_replacement='mean',
    datetime_format='%b %d, %Y %I:%M:%S %p'
)

FAQs

The transformer should be able to automatically detect the most common datetime formats. If you are not sure whether your format can be detected, we recommend trying it without the format string first. If you see an error, supply the format.
Particular confusion might arise if your datetime values have uncommon formats. For example:
  • You do not have leading zeros in your months or days, such as "1/1/21" instead of "01/01/21".
  • You are using something other than hyphens, slashes or colons to separate the date and time components, such as "[Jan][1][2021][12:34]".
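Both of the uncommon formats listed above can still be described with an explicit format string. The sketch below uses Python's datetime.strptime to try candidate format strings on the two sample values; it assumes the transformer accepts the same directives, and the format strings shown are suggestions rather than values taken from the RDT documentation.

```python
from datetime import datetime

# Month and day without leading zeros: %m and %d accept one-digit values when parsing.
no_leading_zeros = datetime.strptime("1/1/21", "%m/%d/%y")

# Unusual separators: any character that is not a directive is matched literally.
bracketed = datetime.strptime("[Jan][1][2021][12:34]", "[%b][%d][%Y][%H:%M]")

print(no_leading_zeros.isoformat())
print(bracketed.isoformat())
```

Once a format string parses your sample values, you can pass it as the datetime_format parameter.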
The decision to replace missing values is based on how you plan to use your data. For example, you might be using RDT to clean your data for machine learning (ML). Check to see whether the ML techniques you plan to use allow missing values.
When setting the model_missing_values parameter, consider whether the "missingness" of the data is something important. For example, maybe the user opted out of supplying the info on purpose, or maybe a missing value is highly correlated with another column in your dataset. If "missingness" is something you want to account for, you should model missing values.