Periscope Data
The world’s fastest visualization platform for data analysts.

The High-Performance SQL Blog

Converting MySQL Syntax and Functions into Redshift

Converting MySQL Syntax and Functions into Redshift
May 26, 2016

Because our cache utilizes Redshift clusters, we often receive questions on how to convert a particular query from MySQL syntax to Redshift syntax. While Google turns up a treasure trove of results for syncing a MySQL database into Redshift, it comes up a bit short for showing how to actually translate the syntax of specific queries.

In this post, we will cover some of the more common MySQL functions that differ in syntax from Redshift, as well as various rules and tricks to keep in mind!

General Differences Between MySQL And Redshift

Grouping

One of the most common pitfalls when converting MySQL syntax to Redshift involves the group by requirements. Redshift is more stringent, and requires that all non-aggregate functions in the select statement must be included in the group by clause. As an example, MySQL syntax allows for a query like so:

select
   [created_at:week],
   country,
   count(1)
from
   orders
group by
   1

This same query would give an error in Redshift, as it would require the country column to be included in the group by clause.

Using Variables

Unlike MySQL, Redshift does not allow you to define variables within a query. To compensate for this, Redshift includes Window Functions that allow you to iterate over your data in a similar manner to how a variable would be used in MySQL. Let’s take a look at a query that calculates the cumulative sum of a column.

-- MySQL: Add Cumulative Column
set @iterating_variable := 0;
select
   created_at
   , number_of_orders
   , (@iterating_variable := 
       @iterating_variable + number_of_orders)
from 
   orders
order by
   created_at

To accomplish the same task in Redshift, we can use the sum() window function:

-- Redshift: Add Cumulative Column
select
   created_at
   , number_of_orders
   , sum(number_of_orders) 
       over (order by id rows unbounded preceding)
from
   orders
order by
   Created_at

Now both of these queries display the number of orders by day as well as the running cumulative total of orders!

Subqueries

One downside to MySQL is its reliance upon subqueries. If you’re looking to use multiple subqueries/CTE’s in your main query, it can quickly get overwhelming. Redshift allows you to use With Clauses to build temporary tables that only exist within the query.

While MySQL also has temporary tables, they can only be called once within a query. Redshift’s temporary tables, created through these with clauses, can be referenced multiple times in the query!

Generating Series

Neither MySQL nor Redshift has a built-in function to generate a series of dates or values. However, they each have a couple of clever options to imitate this behavior. In fact, we already have a great blog post that details these methods in greater detail!

Calling Names with Spaces or Reserved Words

In terms of differences between the two SQL Types, this is actually one of the more straightforward cases. MySQL makes use of backticks to “wrap” these names, while Redshift double-quotes them. An example can be seen below where we call two tables: one with a space in its name, and one that is a reserved word.

-- MySQL
select *
from
   `Table One`,
    `Order`
-- Redshift
select *
from
   Table One,
   Order

Concatenating Strings

MySQL’s concat() function lets you pass in multiple strings to concatenate together. Redshift’s concat() function only allows you to pass in two strings, so you would have to nest this function in order to concatenate more than two values.

-- MySQL
select concat(`Periscope ’, ‘is ’, ‘great’)
-- Redshift
select concat(‘Periscope ’, concat(‘is ’, ‘great’))

Redshift also has a shortcut for concatenation, using double-pipe notation in place of a function call:

select Periscope || is || great

Date/Time Specific Functions

Now()

In some of the later versions of Redshift, now() is a deprecated function. You would want to use getdate() or sysdate() to return the current time based on the timezone of your database.

From_unixtime() and Unix_timestamp()

Redshift does not have a default function equivalent to from_unixtime() that converts unix timestamps into date timestamps. Instead, we can employ some clever math shown in this Valkrysa blog post:

select timestamp epoch + your_time_colum * interval 1 second

The first part of this select statement grabs the date and timestamp that acts as the threshold for unix time (1970-01-01 00:00:00). Since unix time measures the amount of seconds that have elapsed since this date, we just need to convert it into ‘second’ literals. Now, we have our timestamp of 1970-01-01 00:00:00, and we can add the total number of seconds to get the appropriate date.

Converting from a timestamp value into unix time is simpler, and actually has two functions to support this. You can do:

select extract(epoch from your_time_column)

or

select date_part(epoch, time_column)

In each of the two functions, Redshift is essentially calculating the number of seconds that have elapsed from 1970-01-01 00:00:00 and your timestamp column.

Interval Literals

In many ways, Redshift and MySQL are on the same page in terms of using interval literals with timestamps. The major difference surrounds the pickiness of syntax, as well as whether or not the Redshift instance is running on a lead node. In Redshift, best practice recommends single quoting your literal value. This would look similar to below:

select getdate() - interval 1 day

For queries that do not run on the lead node, Redshift does not allow for interval values higher than a week. So attempting to add or subtract a ‘month’ or ‘year’ interval value can throw an error if the timestamp column is evaluated across multiple nodes.

Date_sub() and Date_add()

These functions act similarly to the interval literals discussed above. Redshift combines both of these functions into a single dateadd() function. You’ll notice that Redshift’s version calls for three parameters instead of two.

In order to get the same behavior of MySQL’s date_sub(), you would want to pass in a negative interval to the dateadd() function. That means your queries may look something like this:

select dateadd(day, 1, getdate())  -- Returns Same Time Tomorrowselect
dateadd(day, -1, getdate()) -- Returns Same Time Yesterday

Date_format()

This is an important function in MySQL for getting timestamps to display in the exact way that you would like. Redshift has two similar functions depending on your use-case. to_char() takes a timestamp, and allows you to pass in a parameter that control the formatting. The to_date() function takes a string or numeric value, and uses the same format options as to_char() to control the end result.

Let’s take a look at some of the different formats we can return! Running the following query:

   select
      getdate() as format_one,
      to_char(getdate(), MON-DD-YYYY) as month_day_year,
      to_char(getdate(), Day, Month DD YYYY) as day_month_year,
      to_char(getdate(), YYYY-Q') as year_quarter

Returns the following values:

Datediff()

Redshift’s datediff() function is more robust and flexible than MySQL’s in terms of the level of specificity. MySQL’s datediff() is limited to returning a whole number value of days between two dates. Redshift’s datediff() allows you to choose what format to calculate the difference in (e.g. minutes, hours, days, weeks).

One other potential misstep to watch out for is the order of parameters. MySQL’s version of the function takes two parameters, where a positive value is returned if the first parameter is larger than the second. Redshift reverses the order of the parameters if you want to return a positive value. That means your queries would run like this:

-- MySQL: returns 1
select datediff(now(), now() - interval 1 day)
-- Redshift: returns 1
select datediff(day, getdate() - interval 1 day, getdate()) 

Day(), Week(), Month(), etc.

Redshift uses the extract() function to pull out the desired numeric value of your timestamps. Rather than having a specific function for each date interval, extract() allows you to pass in the exact interval you want to find.

-- MySQL
select day(now()), week(now()), month(now())
-- Redshift
select 
  extract(day from getdate())
  , extract(week from getdate())
  , extract(month from getdate())

Conclusion

While it’s not quite as similar as Postgres, you can see that MySQL does share many similarities with Redshift! For the majority of these functions, the key difference is something as small an additional parameter, the ordering of parameters, or even just a different name.

Again, these are just a few of the common translation questions we’ve received. Please feel free to contact us with any additional functions that you think would make a great addition to this post!

Haven't tried Periscope Data yet?
Start a trial and we’ll send you one of our famous coffee mugs.
Read More
Haven’t tried Periscope Data yet?