Feed: Planet MySQL.
Author: Guilhem Bichot.
Here is the second in a series of posts about CTEs, a new feature of MySQL 8.0, available in this Labs release. The first post ended with:
Inside the recursive CTE definition (the part in AS (…)), some syntax constraints must be respected […]
 a recursive SELECT mustn’t contain GROUP BY, aggregate functions
(like SUM), ORDER BY, LIMIT, DISTINCT (this rule doesn’t apply to the nonrecursive/anchor/seed SELECT)  a recursive SELECT must reference the CTE only once and only in its
FROM clause, not in any subquery. Of course, it can additionally reference other tables than the CTE and join them with the CTE, which can be very useful to build hierarchies (for example, if we have a table of bosses and employees and want to answer the question “who reports directly or indirectly to Mrs. X?”). If used in a JOIN like this, the CTE must not be on the right side of a LEFT JOIN.
So it is time to give some justifications for these constraints, which come from the SQL standard (except the exclusion of ORDER BY, LIMIT and DISTINCT, which is MySQLspecific, though also present in several wellknown RDBMSs). Consider this example, producing numbers from 1 to 10:

WITH RECURSIVE my_cte AS
SELECT 1 AS n
UNION ALL
SELECT 1+n FROM my_cte WHERE n<10 # < recursive SELECT
SELECT * FROM my_cte;

The recursive SELECT looks like it’s operating on the set of rows of my_cte, as it selects from my_cte. But in fact, it must act as if it operated on each row, one by one, in turn, in isolation from any other row of my_cte. You have to view the recursive SELECT as a process which, from one row of iteration N, without knowledge of other rows of iteration N, produces some new rows for iteration N+1, and repeats the job for another row of iteration N, until it has processed all rows of iteration N one by one, after which it processes one row of iteration N+1, and so on.
With that rule in mind, let’s realize that:
 GROUP BY needs all rows to put them into groups,
 So does an aggregate function like SUM,
 ORDER BY needs all rows to sort them,
 LIMIT needs to eliminate rows based on the count of other rows,
 DISTINCT needs to eliminate rows based on the existence of other rows,
 Referencing my_cte twice in the FROM clause needs a row from my_cte and another row from my_cte, to join them,
 Referencing my_cte in the FROM clause and in a subquery needs a row from my_cte for FROM and other rows of my_cte to evaluate the subquery,
 Referencing my_cte in the right side of a LEFT JOIN needs all rows of my_cte, to know if at least one joins with the left side or if a NULL complement must be made.
Given that the recursive SELECT has access to only one row at a time, it is clear why all cases above cannot be allowed in a recursive SELECT.
This SQL rule has two advantages: first, it gives database systems the freedom to generate rows in any order of steps; second, it forbids quite a few unreasonable queries.
To see how this influences the patterns of queries, let’s now build a recursive query to generate Fibonacci numbers, which are 1, then 1, then 1+1 (=2), then 1+2 (=3), then 2+3 (=5), then 3+5 (=8) – the (N+2)th number is the sum of the (N+1)th and the (N)th.
A first approach would be: put the two first numbers, 1 and 1, in a seed SELECT:

SELECT 1 as f # first number; “f” stands for Fibonacci number
SELECT 1 # second number

then generate all other numbers with

SELECT cte_n.f + cte_n_plus_1.f
FROM my_cte AS cte_n JOIN my_cte AS cte_n_plus_1
ON ... which condition to put here?

My intention is: grab the (N)th and (N+1)th numbers, which are already in my_cte, and sum them; so: grab a row from my_cte, containing the (N)th number, and a row from my_cte, containing the (N+1)th number, and sum them; to grab the two rows at the same time I use a SQL JOIN, but I wonder how to make sure the JOIN pairs the right numbers. I need to add, as ON condition, something like “the row of cte_n_plus_1 must be the number right after the row of cte_n”. Otherwise I’ll also be summing the 5th and the 9th numbers, or whatever! So I need to add ordinal numbers to the table. Ok, so let’s add ordinal numbers to the seed SELECT:

SELECT 1 as f, 1 as n UNION ALL SELECT 1, 2
# “f” : Fibonacci number; “n”: ordinal number,
# first row: has value 1 and is 1st number
# second row: has value 1 and is 2nd number

and now we can write this recursive SELECT:

SELECT cte_n.f + cte_n_plus_1.f
FROM my_cte AS cte_n JOIN my_cte AS cte_n_plus_1
ON cte_n.n + 1 = cte_n_plus_1.n # Pair (N)th with (N+1)th to add them

Alas, this won’t work, as it’s referencing my_cte twice in the FROM clause, which is forbidden as it violates the rule which says: “handle one row at a time”. Thus, I have to change my query so that all necessary information to calculate a new number is found in one row and doesn’t require two rows; for that, I’ll just store the Nth and (N+1)th number in one row:

SELECT 1 as f, 1 as next_f
# “f” : Fibonacci number of this row,
# “next_f”: next Fibonacci number after “f”,
# no ordinal number needed anymore

and use this recursive SELECT:

SELECT next_f, f+next_f FROM my_cte

Putting it all together, and adding a condition to stop around 500:

WITH RECURSIVE my_cte AS
SELECT 1 as f, 1 as next_f
UNION ALL
SELECT next_f, f+next_f FROM my_cte WHERE f < 500
SELECT * FROM my_cte;
+———+————+
 f  next_f 
+———+————+
 1  1 
 1  2 
 2  3 
 3  5 
 5  8 
 8  13 
 13  21 
 21  34 
 34  55 
 55  89 
 89  144 
 144  233 
 233  377 
 377  610 
 610  987 
+———+————+

That’s it; the f column contains the desired Fibonacci sequence.
While we saw above that a recursive CTE can’t be joined with itself, it can perfectly be joined with other tables. Let’s use that to produce all 4digit bit strings:

WITH RECURSIVE
SELECT ‘0’ AS d UNION ALL SELECT ‘1’
strings AS
SELECT ” AS s
UNION ALL
SELECT CONCAT(strings.s, digits.d)
FROM strings, digits
WHERE LENGTH(strings.s) < 4
SELECT s FROM strings WHERE LENGTH(s)=4;

A first, nonrecursive CTE, digits, holds the two building blocks 0 and 1. Then the recursive CTE, strings, starts with an empty string, then concatenates the result with the 0 and 1 of digits, and so on, so the expected result is: empty string, 0, 1, 00, 01, 10, 11, 000, etc, up to 1111.
The real result is ERROR 1406 (22001): Data too long for column ‘s’ at row 2.
At this point you may want to read towards the end of the previous blog post (search for “longer and longer”) which mentions that the type of column strings.s is determined from the nonrecursive SELECT only i.e. SELECT ” AS s : and the type of the empty string (”) is CHAR(0)… Inserting ‘0’ into CHAR(0) would cause a character truncation, so: error!
A detail though: it fails because I’m using strict mode, which is the default starting from MySQL 5.7, and because strict mode hates truncation. My statement is a SELECT, and strict mode usually sends only warnings if a truncation happens during SELECT (whereas it sends errors during INSERT/UPDATE/DELETE), but we think it makes sense that SELECT become as strict as other statements in the future, and WITH RECURSIVE is one such case, so: error!
Back to our problem, I just have to make the column wider with a CAST and now it works:

WITH RECURSIVE
SELECT ‘0’ AS d UNION ALL SELECT ‘1’
strings AS
SELECT CAST(” AS CHAR(4)) AS s
UNION ALL
SELECT CONCAT(strings.s, digits.d)
FROM strings, digits
WHERE LENGTH(strings.s) < 4
SELECT * FROM strings WHERE LENGTH(s)=4;
 s 
16 rows in set (0,00 sec)

With the examples above, you now have material to “program” many numbergenerating or stringgenerating queries. In the next post, we will see how to use recursive queries to explore hierarchical data (XisachildofY).
And, as always:

select from_base64(‘VGhhbmsgeW91IGZvciB0cnlpbmcgcmVjdXJzaXZlIENURXMh’) as final_words;
+———————————————————+
 final_words 
+———————————————————+
 Thank you for trying recursive CTEs! 
+———————————————————+

Recent Comments