UnixTimestampEncoder


Last updated 16 days ago

Compatibility: datetime data

The UnixTimestampEncoder transforms data that represents dates and times into numerical values using Unix time (also known as Epoch time). The transformed value is the number of nanoseconds that have passed since Jan 1, 1970 00:00:00.000000 UTC.

from rdt.transformers.datetime import UnixTimestampEncoder
transformer = UnixTimestampEncoder()
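
To make the encoding concrete, here is a minimal sketch of the nanoseconds-since-epoch calculation using only the Python standard library. It illustrates the arithmetic and is not RDT's internal implementation; the helper names to_unix_nanoseconds and from_unix_nanoseconds are hypothetical and exist only for this example.

```python
from datetime import datetime, timedelta, timezone

# Reference point for Unix time: Jan 1, 1970 00:00:00 UTC.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_unix_nanoseconds(moment):
    """Return whole nanoseconds between the epoch and a timezone-aware datetime."""
    delta = moment - EPOCH
    whole_seconds = delta.days * 86_400 + delta.seconds
    return whole_seconds * 1_000_000_000 + delta.microseconds * 1_000


def from_unix_nanoseconds(nanoseconds):
    """Invert to_unix_nanoseconds (Python datetimes keep microsecond precision)."""
    return EPOCH + timedelta(microseconds=nanoseconds // 1_000)


moment = datetime(2021, 1, 1, tzinfo=timezone.utc)
encoded = to_unix_nanoseconds(moment)
print(encoded)  # 1609459200000000000
print(from_unix_nanoseconds(encoded) == moment)  # True
```

Because the encoded value is a plain number, it can be inverted exactly, which is what makes the transform reversible.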

Parameters

missing_value_replacement: Add this argument to replace missing values during the transform phase

(default) 'mean'

Replace all missing values with the average value.

'random'

Replace missing values with a random value. The value is chosen uniformly at random from the min/max range.

'mode'

Replace all missing values with the most frequently occurring value.

<number>

Replace all missing values with the specified number (0, -1, 0.5, etc.)

None

Do not replace missing values. The transformed data will continue to have missing values.
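
The options above can be sketched in plain Python on a list of already-encoded numbers, with None standing in for a missing value. This is only an illustration of what each option means, not RDT's implementation; the replace_missing helper and its seed argument are hypothetical.

```python
import random
from statistics import mean, mode


def replace_missing(values, missing_value_replacement='mean', seed=None):
    """Fill None entries in a list of numbers according to the chosen option."""
    if missing_value_replacement is None:
        return list(values)  # leave missing values in place

    present = [v for v in values if v is not None]
    rng = random.Random(seed)

    def fill_value():
        if missing_value_replacement == 'mean':
            return mean(present)
        if missing_value_replacement == 'mode':
            return mode(present)
        if missing_value_replacement == 'random':
            # Drawn uniformly from the observed min/max range.
            return rng.uniform(min(present), max(present))
        return missing_value_replacement  # a user-specified number

    return [fill_value() if v is None else v for v in values]


encoded = [10, None, 20, 20, 50]
for option in ('mean', 'mode', 'random', -1, None):
    print(option, replace_missing(encoded, option, seed=0))
```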

(deprecated) model_missing_values: Use the missing_value_generation parameter instead.

missing_value_generation: Add this argument to determine how to recreate missing values during the reverse transform phase

(default) 'random'

Randomly assign missing values in roughly the same proportion as the original data.

'from_column'

Create a new column to store whether the value should be missing. Use it to recreate missing values. Note: Adding extra columns uses more memory and increases the RDT processing time.

None

Do not recreate missing values.
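
The difference between 'random' and 'from_column' can be sketched as follows. This is an illustrative approximation under simple assumptions (a list of values, None for missing), not RDT's implementation; both helper functions are hypothetical.

```python
import random


def recreate_missing_random(values, missing_fraction, seed=None):
    """'random': blank out values at roughly the original missing proportion."""
    rng = random.Random(seed)
    return [None if rng.random() < missing_fraction else v for v in values]


def recreate_missing_from_column(values, is_missing):
    """'from_column': use a stored indicator column to decide which rows are missing."""
    return [None if flag else v for v, flag in zip(values, is_missing)]


values = [100, 200, 300, 400]

# Positions of missing values vary from run to run unless a seed is fixed.
print(recreate_missing_random(values, missing_fraction=0.25, seed=1))

# Positions of missing values follow the extra indicator column exactly.
print(recreate_missing_from_column(values, [False, True, False, False]))
```

The indicator column is the extra column mentioned in the note above: it preserves which rows were missing, at the cost of additional memory and processing time.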

enforce_min_max_values: Add this argument to allow the transformer to learn the min and max allowed values from the data.

(default) False

Do not learn any min or max values from the dataset. When reverse transforming the data, the values may be above or below what was originally present.

True

Learn the min and max values from the input data. When reverse transforming the data, any out-of-bounds values will be clipped to the min or max value.
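
The clipping behavior described for True can be sketched in a few lines. This is a simplified illustration, not RDT's implementation; the clip_to_learned_range helper is hypothetical.

```python
def clip_to_learned_range(values, learned_min, learned_max):
    """Clip each value so it falls inside the range seen during fitting."""
    return [min(max(v, learned_min), learned_max) for v in values]


# Bounds learned from the original data during fitting.
original = [1_000, 2_500, 4_000]
learned_min, learned_max = min(original), max(original)

# Values arriving at reverse transform may fall outside that range.
incoming = [500, 3_000, 9_999]
print(clip_to_learned_range(incoming, learned_min, learned_max))  # [1000, 3000, 4000]
```

With the default of False, the values 500 and 9_999 would be passed through unchanged.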

datetime_format: Add this argument to tell the transformer how to read your datetime column if it's in a specific format that isn't easy to identify.

(default) None

Format detection isn't needed. This may be because your data is already represented by pandas datetime objects (for example, a datetime64 column). If your data is stored as strings, please provide a format.

<string>

Examples

from rdt.transformers.datetime import UnixTimestampEncoder

transformer = UnixTimestampEncoder(missing_value_replacement='mean',
                                   datetime_format='%b %d, %Y %I:%M:%S %p')

FAQs

When do I need to supply a format string?

The transformer should be able to automatically detect the most common datetime formats. If you are not sure whether your format can be detected, we recommend trying it without the format string first. If you see an error, supply the format.

Particular confusion might arise if your datetime values have uncommon formats. For example:

  • You do not have leading zeros in your months or days, such as "1/1/21" instead of "01/01/21"

  • You are using something other than hyphens, dashes or colons to separate the date and time components, such as "[Jan][1][2021][12:34]"

Should I replace missing values?

The decision to replace missing values is based on how you plan to use your data. For example, you might be using RDT to clean your data for machine learning (ML). Check to see whether the ML techniques you plan to use allow missing values.

When is it necessary to model missing values?

When setting the missing_value_generation parameter, consider whether the "missingness" of the data is something important. For example, maybe the user opted out of supplying the info on purpose, or maybe a missing value is highly correlated with another column in your dataset. If "missingness" is something you want to account for, you should model missing values.

When you pass a <string> for the datetime_format parameter, the transformer reads the format according to the instructions in that string. For example, to represent a datetime like "Feb 15, 2022 10:23:45 AM", you can use the format string "%b %d, %Y %I:%M:%S %p". For more info, see the format codes in Python's strftime documentation.
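
As a quick check before passing a format string to the transformer, you can try it with Python's standard datetime module, which uses the same format codes. This sketch only verifies that the format string matches the example value; it does not call RDT.

```python
from datetime import datetime

datetime_format = '%b %d, %Y %I:%M:%S %p'

# Parse the example string with the format, then write it back out.
parsed = datetime.strptime('Feb 15, 2022 10:23:45 AM', datetime_format)
print(parsed.isoformat())                # 2022-02-15T10:23:45
print(parsed.strftime(datetime_format))  # Feb 15, 2022 10:23:45 AM
```

If strptime raises a ValueError, the format string does not match your data and should be adjusted. Note that %b and %p depend on the locale; the output shown assumes an English or default locale.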