Periscope Data
The world’s fastest visualization platform for data analysts.

The High-Performance SQL Blog

Concatenating Rows in Redshift, Postgres, & MySQL

Concatenating Rows in Redshift, Postgres, & MySQL
November 12, 2015

Sometimes it’s helpful to look at an aggregated overview of many rows. With numeric columns it’s easy to sum or average many values, but for string columns we need something different. We can concatenate strings from multiple rows with concatenating aggregations.

Imagine we have a table of salespeople and the customers they manage:

We can compact this and make it more human-readable by flattening the many rows per salesperson into a single row:

This can be done with any concatenating aggregation. There’s group_concat in MySQL, string_agg in Postgres, and listagg in redshift.

Let’s first look at how to use each of these functions. Then, we’ll cover a few tricks with concatenating aggregations. ​ 

Redshift listagg

Redshift recently added listagg, and a corresponding window function version. The basic usage is:

select 
  salesperson,
  listagg(customer, ', '
    within group (order by contract_size desc)
from customers
group by 1

The within group clause is an uncommon bit of syntax, but all it does is allow for ordering the rows used in an aggregation. Here, it orders all of the customers by contract size, then listagg concatenates their names with a comma separator.

Postgres string_agg

select 
  salesperson,
  string_agg(customer , ', ' order by contract_size desc)
from customers
group by 1

Postgres has a robust set of aggregation functions, and you can similarly aggregate multiple rows with array_agg and json_agg.

By combining array_to_string with array_agg, you can duplicate the behavior of string_agg.

MySQL group_concat

select 
  salesperson,
  group_concat(customer 
               order by contract_size desc
               separator ',')
from customers
group by 1

MySQL supports an optional distinct operator in group_concat:

select group_concat(distinct customer)
from customers

MySQL has a relatively low cap of 1024 characters of the length of a group_concat result, but it can be increased by changing the group_concat_max_len system variable.

SQL Server workarounds

There’s no concatenating aggregation in SQL Server, but there are workarounds based on XML path.

Compound Concatenation

While the customer names are nice, it would be better if we could include contract size in our collapsed table:

For the remaining examples, we’ll use Redshift’s dialect.

Because these concatenating aggregations concatenate expressions, and not just column names, we can easily accomplish this by pre-concatenating the customer name and contract size:

select 
  salesperson,
  listagg(customer || ' $' || contract_size, ', '
    within group (order by contract_size desc)
from customers
group by 1

Concatenating a limited number of values

In a real database, concatenating all the rows for a given group will often create a string of unwieldy length. If we use a subquery to limit the number of rows in a given group first, then our final concatenation will be more readable:

with numbered_customers as (
  select 
    *
    row_number() over (partition by salesperson 
                        order by contract_size desc)
  from customers
)
select 
  salesperson,
  listagg(customer || ' $' || contract_size, ', '
    within group (order by contract_size desc)
from numbered_customers
where row_number <= 3
group by 1

We first declare numbered_customers, which numbers every customer per salesperson, starting with the largest contracts. Then we concatenate only the first three rows for each salesperson. Nice and tidy.

Happy Concatenating!

We hope you learned a few tricks for formatting SQL tables for human reading.

Haven't tried Periscope Data yet?
Start a trial and we’ll send you one of our famous coffee mugs.
Read More
Haven’t tried Periscope Data yet?