Patrick's blog

Posted Mon 15 April 2019

Redshift COPY gotcha with epoch seconds

Redshift has a COPY statement that loads CSV or JSON files from S3 into a table. I was using this feature the other day to load CSV data from several different sources in S3 into a single Redshift table.
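
To give a rough idea, a minimal COPY of that shape looks something like this (the table, bucket, and role names here are made up):

COPY events
FROM 's3://my-bucket/exports/events/'
IAM_ROLE 'arn:aws:iam::123456789012:role/my-redshift-role'
FORMAT AS CSV
IGNOREHEADER 1;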

This was part of an effort to streamline some of our analysts' workflows: instead of multiple bespoke, boto3-powered download scripts, they would just need a connection string.

It took a few iterations to determine the data types I needed for each column, but most of the datetime fields are written as epoch seconds, which can safely be loaded into an INTEGER column.
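
Concretely, the target table for the COPY sketch above ends up looking something like this, with the datetime columns typed as INTEGER (names hypothetical):

CREATE TABLE events (
    event_id   BIGINT,
    payload    VARCHAR(1024),
    created_at INTEGER  -- epoch seconds, straight from the source files
);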

I did the math. INTEGER supports values up to 2147483647, which as epoch seconds works out to 2038-01-19T03:14:07+00:00. Plenty of time.

I knew that we would want to leverage the timestamp functions that Redshift's SQL dialect offers, so I went looking for a way to convert those epoch-seconds columns into TIMESTAMP, and found this in the documentation:

Automatic recognition does not support epochsecs and epochmillisecs.

Kind of a bummer, but the workaround isn't so bad:

SELECT (TIMESTAMP 'epoch' + 1555734000 * INTERVAL '1 second') AS my_timestamp;
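
For the curious, that particular value comes out to 2019-04-20 04:20:00.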

And since Redshift supports SQL user-defined functions (UDFs), we can wrap this up into something that's less visually noisy when you have to convert a lot of these columns in your ELT/ETL scripts:

CREATE OR REPLACE FUNCTION f_epochsecs_to_timestamp (INTEGER)
    RETURNS TIMESTAMP
STABLE AS $$
    SELECT (TIMESTAMP 'epoch' + $1 * INTERVAL '1 second')
$$ LANGUAGE SQL;
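
Since the function's output depends only on its input, IMMUTABLE would arguably be a tighter fit than STABLE here, but STABLE works fine.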

Which turns our previous conversion into:

SELECT f_epochsecs_to_timestamp(1555734000) AS my_timestamp;
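
In practice this gets applied during the load from staging into the final table. A sketch, reusing the hypothetical events table from earlier:

INSERT INTO events_final
SELECT
    event_id,
    payload,
    f_epochsecs_to_timestamp(created_at) AS created_at
FROM events;

And as a sanity check on that 2038 ceiling from earlier:

SELECT f_epochsecs_to_timestamp(2147483647);
-- returns 2038-01-19 03:14:07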


Category: testing
Tags: redshift sql