← Back to team overview

maria-developers team mailing list archive

On Recursive CTE and CONNECT BY

 

Hi Vicentiu,

This is a followup to the Wednesday discussion.

Background: see attached file for two queries. One uses a Recursive-CTE,
another uses CONNECT-BY syntax. They produce different results for some
reason. Initial guess about the cause was depth-first vs breadth-first 
traversal.

Short answer: No it's not. The reason is that Recursive-CTE query is not 
equivalent to the CONNECT-BY query. 

Results from the recursive-CTE query:

        ID N P      CYCLE PATH
         1 A            0 /1A
         2 B A          0 /1A/2B
         3 C A          0 /1A/3C
         4 D B          0 /1A/2B/4D
         5 D C          0 /1A/3C/5D 
         6 E D          0 /1A/3C/5D/6E 
         6 E D          0 /1A/2B/4D/6E 
         7 F E          0 /1A/2B/4D/6E/7F 
         7 F E          0 /1A/3C/5D/6E/7F
         8 C F          0 /1A/3C/5D/6E/7F/8C
         8 C F          0 /1A/2B/4D/6E/7F/8C
         5 D C          0 /1A/2B/4D/6E/7F/8C/5D
12 rows selected.

Results from CONNECT-BY query:

        ID N P      CYCLE Path
         1 A            0 /1A
         2 B A          0 /1A/2B 
         4 D B          0 /1A/2B/4D 
         6 E D          0 /1A/2B/4D/6E
         7 F E          0 /1A/2B/4D/6E/7F
         8 C F          1 /1A/2B/4D/6E/7F/8C
         3 C A          0 /1A/3C
         5 D C          0 /1A/3C/5D
         6 E D          0 /1A/3C/5D/6E
         7 F E          1 /1A/3C/5D/6E/7F
10 rows selected.

The rows that are missing are:

        ID N P      CYCLE PATH
         8 C F          0 /1A/3C/5D/6E/7F/8C
         5 D C          0 /1A/2B/4D/6E/7F/8C/5D
(denote this as MISSING-ROWS)

Another thing to note is that the CTE query has returned CYCLE=0 for
all rows, while the CONNECT BY one has returned two rows with CYCLE=1:

         8 C F          1 /1A/2B/4D/6E/7F/8C
         7 F E          1 /1A/3C/5D/6E/7F

Note that these rows are parents of rows in MISSING-ROWS.

I don't think the issue is caused by breadth-first vs depth-first traversal
differences.

The issue is caused by us not understanding how Oracle's NOCYCLE clause works.

Take a look at the original-dataset.jpg, and PATH columns of MISSING-ROWS 
We see that the path forms the entire loop (or "cycle").

Oracle's NOCYCLE seems to be about *not* constructing the entire loop.

To rule out breadth-first vs depth-first differences, let's make an example
where there's just one loop:

create table g3(a int, b int);

insert into g3 values (1, 2);
insert into g3 values (2, 3);
insert into g3 values (3, 4);
insert into g3 values (4, 5);
insert into g3 values (5, 6);
insert into g3 values (6, 7);
insert into g3 values (7, 4);

select 
  a, CONNECT_BY_ISCYCLE as cycle, 
  SYS_CONNECT_BY_PATH(a, '/') "Path" 
from g3
  start with a=1 
  connect by NOCYCLE prior b= a;

         A      CYCLE Path
         1          0 /1 
         2          0 /1/2 
         3          0 /1/2/3 
         4          0 /1/2/3/4 
         5          0 /1/2/3/4/5
         6          1 /1/2/3/4/5/6

The row with a=7 is not listed. If it did, the path would have had a complete
cycle.

It is a bit counter intuitive that /1/2/3/4/5/6 has CYCLE=1, while actually it
does not form a cycle. It's one step short of doing so.

The documentation is vague about this, but here's somebody thinking the same
on the internet:

http://www.sql.ru/forum/255418/zadachka-anagrammy-na-sql?hl=%e0%ed%e0%e3%f0%e0%ec%ec%e0#2291948
:
While posting messages on MetaLink, I think I finally got the logic of NOCYCLE.
Clue was right there in the name itself. NOCYCLE means do not cycle. So if you
have a hierarchical query without NOCYCLE and a cycle is detected Oracle does
not know how to handle it and throws ORA-01436. If NOCYCLE is used Oracle
simply cuts of part of a branch right at loop point without raising an error
(another word it simply does not follow the loop). And that explains why
CONNECT_BY_ISCYCLE is set to 1 for the last row in first iteration of the loop.
Otherwise, we would not get 1 at all. Now next logical step would be to
implement CYCLE.

BR
 Sergei
-- 
Sergei Petrunia, Software Developer
MariaDB Corporation | Skype: sergefp | Blog: http://s.petrunia.net/blog


Attachment: orig-queries.sql
Description: application/sql

Attachment: original-dataset.jpg
Description: JPEG image