| Thread Previous • Date Previous • Date Next • Thread Next |
Hi Vicentiu,
This is a followup to the Wednesday discussion.
Background: see attached file for two queries. One uses a Recursive-CTE,
another uses CONNECT-BY syntax. They produce different results for some
reason. Initial guess about the cause was depth-first vs breadth-first
traversal.
Short answer: No it's not. The reason is that Recursive-CTE query is not
equivalent to the CONNECT-BY query.
Results from the recursive-CTE query:
ID N P CYCLE PATH
1 A 0 /1A
2 B A 0 /1A/2B
3 C A 0 /1A/3C
4 D B 0 /1A/2B/4D
5 D C 0 /1A/3C/5D
6 E D 0 /1A/3C/5D/6E
6 E D 0 /1A/2B/4D/6E
7 F E 0 /1A/2B/4D/6E/7F
7 F E 0 /1A/3C/5D/6E/7F
8 C F 0 /1A/3C/5D/6E/7F/8C
8 C F 0 /1A/2B/4D/6E/7F/8C
5 D C 0 /1A/2B/4D/6E/7F/8C/5D
12 rows selected.
Results from CONNECT-BY query:
ID N P CYCLE Path
1 A 0 /1A
2 B A 0 /1A/2B
4 D B 0 /1A/2B/4D
6 E D 0 /1A/2B/4D/6E
7 F E 0 /1A/2B/4D/6E/7F
8 C F 1 /1A/2B/4D/6E/7F/8C
3 C A 0 /1A/3C
5 D C 0 /1A/3C/5D
6 E D 0 /1A/3C/5D/6E
7 F E 1 /1A/3C/5D/6E/7F
10 rows selected.
The rows that are missing are:
ID N P CYCLE PATH
8 C F 0 /1A/3C/5D/6E/7F/8C
5 D C 0 /1A/2B/4D/6E/7F/8C/5D
(denote this as MISSING-ROWS)
Another thing to note is that the CTE query has returned CYCLE=0 for
all rows, while the CONNECT BY one has returned two rows with CYCLE=1:
8 C F 1 /1A/2B/4D/6E/7F/8C
7 F E 1 /1A/3C/5D/6E/7F
Note that these rows are parents of rows in MISSING-ROWS.
I don't think the issue is caused by breadth-first vs depth-first traversal
differences.
The issue is caused by us not understanding how Oracle's NOCYCLE clause works.
Take a look at the original-dataset.jpg, and PATH columns of MISSING-ROWS
We see that the path forms the entire loop (or "cycle").
Oracle's NOCYCLE seems to be about *not* constructing the entire loop.
To rule out breadth-first vs depth-first differences, let's make an example
where there's just one loop:
create table g3(a int, b int);
insert into g3 values (1, 2);
insert into g3 values (2, 3);
insert into g3 values (3, 4);
insert into g3 values (4, 5);
insert into g3 values (5, 6);
insert into g3 values (6, 7);
insert into g3 values (7, 4);
select
a, CONNECT_BY_ISCYCLE as cycle,
SYS_CONNECT_BY_PATH(a, '/') "Path"
from g3
start with a=1
connect by NOCYCLE prior b= a;
A CYCLE Path
1 0 /1
2 0 /1/2
3 0 /1/2/3
4 0 /1/2/3/4
5 0 /1/2/3/4/5
6 1 /1/2/3/4/5/6
The row with a=7 is not listed. If it did, the path would have had a complete
cycle.
It is a bit counter intuitive that /1/2/3/4/5/6 has CYCLE=1, while actually it
does not form a cycle. It's one step short of doing so.
The documentation is vague about this, but here's somebody thinking the same
on the internet:
http://www.sql.ru/forum/255418/zadachka-anagrammy-na-sql?hl=%e0%ed%e0%e3%f0%e0%ec%ec%e0#2291948
:
While posting messages on MetaLink, I think I finally got the logic of NOCYCLE.
Clue was right there in the name itself. NOCYCLE means do not cycle. So if you
have a hierarchical query without NOCYCLE and a cycle is detected Oracle does
not know how to handle it and throws ORA-01436. If NOCYCLE is used Oracle
simply cuts of part of a branch right at loop point without raising an error
(another word it simply does not follow the loop). And that explains why
CONNECT_BY_ISCYCLE is set to 1 for the last row in first iteration of the loop.
Otherwise, we would not get 1 at all. Now next logical step would be to
implement CYCLE.
BR
Sergei
--
Sergei Petrunia, Software Developer
MariaDB Corporation | Skype: sergefp | Blog: http://s.petrunia.net/blog
Attachment:
orig-queries.sql
Description: application/sql
Attachment:
original-dataset.jpg
Description: JPEG image
| Thread Previous • Date Previous • Date Next • Thread Next |