- Relational Databases
- SQL LISTAGG concatenates values of multiple rows into an delimited string such as CSV.
Feed: Planet PostgreSQL.
listagg function transforms values from a group of rows into a list of values that are delimited by a configurable separator.
Listagg is typically used to denormalize rows into a string of comma-separated values (CSV) or other comparable formats suitable for human reading.
Listagg does not apply any escaping: it is not generally possible to tell whether an occurrence of the separator in the result is an actual separator, or just part of a value. The safe use of
listagg for electronic data interfaces is therefore limited to cases in which an unambiguous separator can be selected, e.g. when aggregating numbers, dates, or strings that are known to not contain the separator.
When implementing electronic data interfaces, arrays and document types (JSON, XML) are advantageous as they offer type safety, or at least proper escaping.
Listagg is an ordered set function, which require the
within group clause to specify an order. The minimal syntax is:
, ) WITHIN GROUP(ORDER BY …)
must not contain window functions, aggregate functions or subqueries. The standard only allows character literals in
—i.e. no expression and no bind parameter. Bind parameters are nevertheless well supported in practice.
null values before aggregation like most other aggregate functions. If no
not null value remains, the result of
null. If needed,
coalesce can be used to replace
null values before aggregation.
on overflow clause
The return type of
listagg is either
clob with an implementation defined length limit. In practice, it is a
Listagg accepts the optional
on overflow clause to define the behavior if the result exceeds the length limit of the return type:
, ON OVERFLOW …)
The default is
on overflow error. In this case, the standard requires an exception with SQLSTATE 22001 to be raised—in practice, this requirement is not fulfilled.
on overflow truncate clause prevents the overflow by only concatenating as many values as the result type can accommodate. Furthermore, the
on overflow truncate clause allows one to specify how the result is terminated:
ON OVERFLOW TRUNCATE [
] WITH[OUT] COUNT
defaults to three periods (
...) and will be added as last element if truncation happens.
with count is specified and truncation happens, the number of omitted values is put in brackets and appended to the result.
The SQL standard does not require a warning to be issued on truncation. To know whether the result is complete or not, users can parse the result or compare the actual length of the result to the calculated length for a result containing all values.
listagg function accepts the optional set quantifiers
, …) …
If neither is specified,
all is default. If
distinct is specified, duplicate values are removed before aggregation. Note that the elimination of duplicates is subject to the collation in effect.
The standard does not specify which of the duplicate elements is removed.
order by clause places one occurrence at the beginning and the other at the end, it is unspecified at which place the value appears in the result.
Distinct can be implemented manually by removing duplicates before aggregation—e.g. in a subquery. This works for databases not supporting
listagg, and also allows to keep a particular occurrence if duplicates exist.
The following example demonstrates this approach. The columns
o represent the
group by and
order by keys respectively. The example uses
min(o) to keep the first occurrence in case one value appears multiple times.
SELECT g , LISTAGG(value, ',') WITHIN GROUP (ORDER BY o) list FROM (SELECT g, min(o) o, value FROM dist_listagg GROUP BY g, value ) dt GROUP BY g
LISTAGG(…) WITHIN GROUP(…) [FILTER(WHERE …)] [OVER(…)]
over clause must not contain an
order by clause because the mandatory
within group clause must contain an
order by clause anyway. It is not possible to narrow the window frame: the set of aggregated rows is always the full partition.
Listagg was introduced with SQL:2016 as optional feature T625. Even though
listagg is not yet widely supported, most databases offer similar functionality using a proprietary syntax.
If the query does not strictly require the return of a delimited string, arrays can be used to return an array of values. An array can be constructed using the
array_agg aggregate function or via a subquery.
ORDER BY …)
In the second form,
fetch first to remove duplicates and limit the array length.
Neither of the two approaches performs an implicit
cast: the array elements have the same type as
. That means that the retrieving application can fetch the values in a type-safe manner and apply formatting if required.
The type-safe nature of arrays allows them to also carry
null values in an unambiguous way.
Array_agg does therefore not remove
null values like other aggregate functions do (including
filter clause can be used to remove
null values before aggregation with
array_agg. If the
filter clause removes all rows,
null—not an empty array.
The subquery syntax allows removing
null values in the
where clause of the
and returns an empty array if the subquery doesn’t return any rows.
If the order of elements is irrelevant, multisets and
collect can also be used to pass a type-safe list to an application.
array_agg, the SQL standard defines aggregate functions that return JSON or XML fragments: i.e.
xmlagg. The main benefit compared to
listagg is that they apply the respective escape rules.
ORDER BY … [NULL ON NULL])
, ) ORDER BY …)
Some articles show how to use SQL string manipulation functions to transform such documents into a delimited string. These examples often neglect the fact that the serialized document might contain escape sequences that need to be unescaped (e.g.,
< in XML or
" in JSON).
The following special case can be implemented using only
with recursive and intermediate SQL-92:
…) WITHIN GROUP(ORDER BY )
distinct and that
has to be the exact same expression in both cases.
The following example uses
group by key,
', ' as
WITH RECURSIVE list_agg(g, val, list) AS ( SELECT g, min(val), CAST(null AS VARCHAR) FROM listagg_demo GROUP BY g UNION ALL SELECT prev.g , (SELECT min(val) FROM listagg_demo this WHERE this.g = prev.g AND this.val > prev.val ) val , COALESCE(list || ', ', '') || val FROM list_agg prev WHERE prev.val IS NOT NULL ) SELECT g, list FROM list_agg WHERE val IS NULL ORDER BY g
This particular implementation uses the “loose index scan” technique as explained on the PostgreSQL Wiki. The performance will remain at a rather low level even with an index on
(g, val). The
distinct behavior is a side effect of this technique.
The correct handling of
val is an important special case: although
null is generally ignored in aggregations, a group that consists of
null values only must still be present in the result. This means that
null must not be removed if there is no
not null value in the group. The implementation above uses
min(val) in the non-recursive expression to get this behavior.
A more generic implementation that supports
all semantics and arbitrary
order by clauses is possible using
with recursive and window functions. Aaron Bertrand’s post “Grouped Concatenation in SQL Server” presents an example of this approach.
In both cases, arbitrary
on overflow behavior can be implemented.
The only useful extension that is commonly available is the support of bind parameters and constant expressions in
The standard neither allows omitting the
nor omitting the
within group clause. Yet some databases treat them as optional and apply implementation defined defaults or expose undefined behavior if
within group is omitted.
There are two widely available proprietary alternatives to
string_agg. Even though some databases use the same proprietary function name, they still use a different syntax.
The good news is that the proprietary functions have the same default semantic as
listagg: they filter
null values before aggregation but don’t remove duplicates (
string_agg — PostgreSQL Syntax
, [ORDER BY …])
PostgreSQL also offers a proprietary function to turn arrays into delimited strings:
string_agg — SQL Server Syntax (vNext/14.0)
string_agg doesn’t support
distinct and uses the
within group clause to specify an order:
, ) [WITHIN GROUP (ORDER BY …)]
Aaron Bertrand’s post “Grouped Concatenation in SQL Server” demonstrates many other ways to implement
listagg in older SQL Server releases—including the
FOR XML PATH approach.
group_concat — MySQL Syntax
distinct and expects the
order by clause before the optional separator (default: comma):
ORDER BY … [SEPARATOR ] )
group_concat truncates the result if it exceeds a configurable maximum length and doesn’t honor element boundaries when doing so. In other words, it might truncate in the middle of an element.
group_concat — SQLite Syntax
group_concat neither supports