← Back to team overview

maria-developers team mailing list archive

WL#11 Updated (by Sergei): Connect by

 

-----------------------------------------------------------------------
                              WORKLOG TASK
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
TASK...........: Connect by
CREATION DATE..: Thu, 26 Mar 2009, 00:30
SUPERVISOR.....: Monty
IMPLEMENTOR....: 
COPIES TO......: 
CATEGORY.......: Server-BackLog
TASK ID........: 11 (http://askmonty.org/worklog/?tid=11)
VERSION........: Server-9.x
STATUS.........: Un-Assigned
PRIORITY.......: 60
WORKED HOURS...: 0
ESTIMATE.......: 220 (hours remain)
ORIG. ESTIMATE.: 220

PROGRESS NOTES:

-=-=(Sergei - Tue, 29 Jun 2010, 14:03)=-=-
Category updated.
--- /tmp/wklog.11.old.32063     2010-06-29 14:03:35.000000000 +0000
+++ /tmp/wklog.11.new.32063     2010-06-29 14:03:35.000000000 +0000
@@ -1 +1 @@
-Server-Sprint
+Server-BackLog

-=-=(Guest - Tue, 19 May 2009, 18:27)=-=-
High Level Description modified.
--- /tmp/wklog.11.old.21953     2009-05-19 18:27:14.000000000 +0300
+++ /tmp/wklog.11.new.21953     2009-05-19 18:27:14.000000000 +0300
@@ -1 +1,360 @@
-Add CONNECT BY syntax
+<contents>
+1. Background information
+2. CONNECT BY semantics, properties and limitations
+2.1 Additional CONNECT BY features
+2.2 Limitations
+3. Our implementation
+3.1 Scope Questions
+3.2 CONNECT BY execution
+3.2.1 Straightforward (recursive) evaluation algorithm
+3.2.2 Transitive-closure evaluation algorithms 
+3.2.3 Other algorithms 
+3.2.4 Loop detection
+3.2.4.1 The upper bound of produced records
+3.2.4.1 Straightforward approach: track chains
+3.2.3 Improvements for straightforward execution strategy
+3.3. Optimization
+4. Use-cases dump
+</contents>
+
+1. Background information
+-------------------------
+* CONNECT BY is a non-standard, Oracle's syntax. It is also supported by 
+  EnterpriseDB (Q: any other implementations?)
+
+* PostgreSQL 8.4 (now beta) has support for SQL-standard compliant WITH
+  RECURSIVE (aka Common Table Expressions, CTE) query syntax:
+  http://www.postgresql.org/docs/8.4/static/queries-with.html
+  http://www.postgresql.org/about/news.1074
+  http://archives.postgresql.org/pgsql-hackers/2008-02/msg00642.php
+  http://archives.postgresql.org/pgsql-patches/2008-05/msg00362.php
+
+* Evgen's attempt:
+  http://lists.mysql.com/internals/15569
+
+DB2 and MS SQL support SQL standard's WITH RECURSIVE clause.
+
+2. CONNECT BY semantics, properties and limitations
+---------------------------------------------------
+From Oracle's manual:
+
+<almost-quote>
+
+  SELECT ... 
+  FROM ...
+  WHERE ...
+  START WITH cond
+  CONNECT BY connect_cond 
+  ORDER [SIBLINGS] BY  
+
+In oracle, one expression in connect_cond must be 
+  
+  PRIOR expr = expr
+ 
+ or
+
+  expr = PRIOR expr
+
+The manner in which Oracle processes a WHERE clause (if any) in a hierarchical
+query depends on whether the WHERE clause contains a join:
+
+ * If the WHERE predicate contains a join, Oracle applies the join predicates
+   before doing the CONNECT BY processing.
+ * If the WHERE clause does not contain a join, Oracle applies all predicates 
+   other than the CONNECT BY predicates after doing the CONNECT BY processing
+   without affecting the other rows of the hierarchy.
+</almost-quote>
+
+See http://www.adp-gmbh.ch/ora/sql/connect_by.html
+http://download-uk.oracle.com/docs/cd/B10501_01/server.920/a96540/queries4a.htm
+
+
+2.1 Additional CONNECT BY features
+----------------------------------
+
+LEVEL pseudocolumn
+  indicates ancestry depth of the record (inital row has level=1, its children
+  have level=2 and so forth).  Can be used in CONNECT BY clause to limit
+  traversal depth.
+
+SYS_CONNECT_BY_PATH(column, 'char')
+  returns path from root to the node.
+
+NOCYCLE and CONNECT_BY_ISCYCLE
+  "With the 10g keyword NOCYCLE, hierarchical queries detect loops and do not
+  generate errors. CONNECT_BY_ISCYCLE pseudo-column is a flag that can be used
+  to detect which row is cycling"
+  http://www.dba-oracle.com/t_advanced_sql_connect_by_loop.htm
+
+ORDER SIBLINGS BY
+  CONNECT BY produces records in "children follow parents" order, with order
+  of the siblings unspecified. ORDER SIBLINGS BY orders siblings within each
+  "generation".
+
+2.2 Limitations
+---------------
+Other limitations (which we might or might not want to replicate)
+
+* There is this error:
+  ORA-01437:   cannot have join with CONNECT BY
+  Cause:       A join operation was specified with a CONNECT BY clause. If a 
+               CONNECT BY clause is used in a SELECT statement for a tree-
+               structured query, only one table may be referenced in the query.
+  Action:      Remove either the CONNECT BY clause or the join operation from 
+               the SQL statement.
+  It seems oracle had this limitation before version 10G 
+
+* LEVEL cannot be used on the left side of IN-comparison if the right side is a
+  subquery
+http://download.oracle.com/docs/cd/B10501_01/server.920/a96540/sql_elements6a.htm#9547
+  This seems to have been lifted in version 10?
+
+3. Our implementation
+---------------------
+
+3.1 Scope Questions
+-------------------
+* Are we sure we want CONNECT BY syntax and not SQL standard' one? (I'm not
+  suggesting one or the other, just want to make sure we've made a conscious 
+  decision)
+
+* Any use-cases we need to make sure to handle well?
+
+Will we implement any of these features:
+
+* Output is ordered (children follow parents)
+* "ORDER SIBLINGS BY" variant of ORDER BY
+* NOCYCLE/CONNECT_BY_ISCYCLE
+ - It seems any checking for cycles will cause overhead. Do we implement a
+   mode for those who know what they are doing, where the server doesn't
+   actually check cycles but only reports error if it happened to enumerate, 
+   say MAX(1M, #records_in_table * 10) records? (This doesn't guarantee that
+   there are no cycles, but this is just beyond what one could logically want)
+
+* Oracle's treatment of WHERE (if there's a join - the WHERE is applied after
+  connect by, otherwise before) [Yes]
+* Can one use SYS_CONNECT_BY_PATH in the CONNECT BY expression?
+
+
+3.2 CONNECT BY execution
+------------------------
+
+3.2.1 Straightforward (recursive) evaluation algorithm
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+As specified in CONNECT BY definition, breadth-first, parent-to-children 
+traversal:
+
+  start with a scan that retrieves records using the START WITH condition;
+  pass rows to ouptut and also record them (i.e. needed columns) in 
+    some sort of growable, overflow-to-disk buffer in_buf;
+  
+  while(in_buf is not empty)
+  {
+    for each record in the buffer
+    {
+      do a scan based on CONNECT BY condition;
+      pass rows to output and also record them (i.e. needed columns) in 
+        a growable, overflow-to-disk buffer out_buf; 
+    }
+    in_buf= out_buf;
+  }
+
+This algorithm will produce rows in the required order.
+
+3.2.2 Transitive-closure evaluation algorithms 
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+When CONNECT BY clause refers only to current and PRIOR records (and doesn't
+refer to connect path using LEVEL or SYS_CONNECT_BY_PATH functions), then
+evaluation of CONNECT BY operation is equivalent to building a transitive
+closure of a certain relation.
+
+TODO: can we use LEVEL/SYS_CONNECT_BY_PATH in select list with these
+      algorithms? looks like no?
+
+There are special algorithms to build transitive closure of relation that is
+represented as a table of edges, e.g. Blocked Warshall Algorithm.
+
+Q: Do we investigate further in this direction?
+
+3.2.3 Other algorithms 
+----------------------
+To be resolved: Do we always start from the first clause and go to children?
+Does it make sense to proceed in other direction, from children to parents?
+Looks like no? TODO need definite answer.
+
+3.2.4 Loop detection
+~~~~~~~~~~~~~~~~~~~~
+Transitive-closure algorithms can detect loops (it seems some of them can also
+handle loop avoidance but that needs to be verified).
+
+Straightforward-evaluation algorithm will work forever if there is a loop,
+hence will need assistance in loop detection/avoidance. 
+
+3.2.4.1 The upper bound of produced records
+-------------------------------------------
+There is an upper bound of the amount of records CONNECT BY runtime can
+generate without generating a loop.
+
+The worst case is when
+ * every record in a source table was in the parent generation (and thus has
+   started a parent->child->child->...  chain)
+ * every chain is of #table-records length.
+
+example of such case:
+
+  SELECT * FROM employees 
+  START WITH true
+  CONNECT BY 
+    PRIOR emp_id = (emp_id + 1) MOD $n_employees  AND 
+     length(SYS_CONNECT_BY_PATH('-')) = $n_employees -- guard againist 
+                                                     -- forming loops
+
+this gives that we can at most generate O(#table_records^2) records. This
+limitation can be used as a primitive way to stop evaluation.
+
+
+3.2.4.1 Straightforward approach: track chains
+----------------------------------------------
+In general case, we will have to track which records we have seen across each
+of the parent-child chains. The same record can show up in different chains 
+at different times and this won't form a loop:
+
+   parent  generation1  generation2 
+
+   row1- --+---row2----  ---row3--   (chain1)
+           |
+            \--row3-+--  ---row2--   (chain2)
+                    |
+                     \-  ---row4--   (chain3)
+   row4-  ...
+
+Tracking can be done by 
+- Numbering the chains and using one structure (e.g temptable) to store 
+  (rowid, chain#) pairs and check them for uniqueness.
+
+- Using per-chain data structure which we could serialize/deserialize. This
+  could be 
+   - serializable hashtable
+   - ordered rowid list
+   - serializable sparse bitmap
+
+One can expect a lot of chains to have common starts (eg. look at chain2 and 
+chain3). I don't see how one could take advantage of that, though.
+
+3.2.3 Improvements for straightforward execution strategy
+---------------------------------------------------------
+
+* If the query is a join, it may make sense to materialize it join result
+  (including creation of appropriate index) so we're able to make
+  parent-to-child transitions faster. 
+  This seems to be connected to Evgen's work on FROM subqueries.
+
+* If there is a suitable index, we can employ a variant of BatchedKeyAccess.
+
+* Part of CONNECT BY expression that places restrictions on subsequent
+  generation can be moved to the WHERE. If we do that, we get two recordsets:
+
+1. Initial START WITH recordset
+
+2. A recordset to be used to advance to subsequent generation
+
+
+3.3. Optimization
+-----------------
+It seems it is nearly impossible to estimate how many iterations we'll have
+to make and how many records we will end up producing.
+
+TODO: some bad estimates.. assume a fixed number of generations, reuse ref
+accces estimations for fanount, which gives
+
+  access_method_estimate ^ number_of_generations  
+
+estimate? 
+
+4. Use-cases dump
+=================
+
+http://www.orafaq.com/usenet/comp.databases.oracle.server/2007/01/05/0264.htm:
+   select mak_xx,nr_porz,level lvl from spacer_strona
+       where nvl(dervlvl,0)<3
+       start with mak_xx=125414 and nr_porz=0
+       connect by mak_xx = prior derv_mak_xx and nr_porz = prior derv_nr_porz
+   and prior dervlvl=3
+
+
+http://www.orafaq.com/usenet/comp.databases.oracle.server/2007/01/04/0196.htm:
+  SELECT OPM_N_ID FROM RAFC_ADM.AFC_T_OPERATION_METIER START WITH OPM_N_ID IN
+                  (
+                           SELECT OPM_N_ID FROM RAFC_ADM.AFC_T_OPERATION_METIER x
+                           START WITH x.OPM_N_ID IN (4846)
+                          CONNECT BY ((PRIOR x.OPM_MERE_OPM_N_ID = x.OPM_N_ID)
+                          OR (PRIOR x.OPM_ANNULEE_OPM_N_ID = x.OPM_N_ID))
+                  )
+  CONNECT BY ((PRIOR OPM_N_ID = OPM_MERE_OPM_N_ID) OR (PRIOR OPM_N_ID =
+OPM_ANNULEE_OPM_N_ID))
+
+http://forums.enterprisedb.com/posts/list/737.page:
+  select lpad(' ',2*(level-1)) || to_char(child) s
+  from x
+  start with parent is null
+  connect by prior child = parent;
+
+  select *
+  from emp, dept
+  where dept.deptno = emp.deptno
+  start with mgr is null
+  connect by mgr = prior empno
+
+http://forums.oracle.com/forums/thread.jspa?threadID=623173: 
+  SELECT cust_number
+  FROM customer
+  START WITH cust_number = '5568677999'
+  CONNECT BY PRIOR cust_number = cust_group_code.
+
+http://www.orafaq.com/forum/t/118879/0/
+  SELECT COUNT(a.dataid), c.name 
+  FROM dauditnew a, dtree b, kuaf c 
+  WHERE a.auditdate > SYSDATE-10 AND a.auditstr IN ('Create', 'AddVersion') 
+  AND a.dataid = b.dataid AND c.id = a.performerid  
+  AND a.SUBTYPE = 0 
+  START WITH b.dataid = 6132086 CONNECT BY PRIOR a.dataid = b.parentid GROUP BY
+c.name
+
+
+http://www.postgresql-support.de/blog/blog_hans.html
+  SELECT METIER_ID||'|'||ORGANISATION_ID AS JOBORG
+  FROM INTRA_METIER,INTRA_ORGANISATION
+  WHERE METIER_ID IN(
+         SELECT METIER_ID
+         FROM INTRA_METIER
+         START WITH METIER_ID= '99533220-e8b2-4121-998c-808ea8ca2da7'
+         CONNECT BY METIER_ID= PRIOR PARENT_METIER_ID
+       ) AND ORGANISATION_ID IN (
+         SELECT ORGANISATION_ID
+         FROM INTRA_ORGANISATION
+         START WITH ORGANISATION_ID='025ee58f-35a3-4183-8679-01472838f753'
+         CONNECT BY ORGANISATION_ID= PRIOR PARENT_ORGANISATION_ID
+      );
+
+http://oracle.com
+  Oracle database uses CONNECT BY to generate EXPLAINs.
+
+http://practical-sql-tuning.blogspot.com/2009/01/use-of-statistically-incorrect.html
+
+  select sum(human_cnt) from facts
+   where territory_id in (select territory_id
+                            from dic$territory
+                           start with territory_code = :code
+                         connect by prior territory_id = territory_parent);
+
+http://www.dbasupport.com/forums/archive/index.php/t-30008.html
+
+
+  SELECT LEVEL,LPAD(' ',8*(LEVEL-1))||T_COM_OBJ.OBJ_NAME, T_COM_OBJ.OBJ_PARENT,
+T_COM_OBJ.OBJ_ID
+  FROM VDR.T_COM_OBJ
+  START WITH T_COM_OBJ.OBJ_ID in (select obj_id obj_main from vdr.t_com_obj
+where obj_id=obj_parent)
+  CONNECT BY PRIOR T_COM_OBJ.OBJ_ID = T_COM_OBJ.OBJ_PARENT
+
+



DESCRIPTION:

<contents>
1. Background information
2. CONNECT BY semantics, properties and limitations
2.1 Additional CONNECT BY features
2.2 Limitations
3. Our implementation
3.1 Scope Questions
3.2 CONNECT BY execution
3.2.1 Straightforward (recursive) evaluation algorithm
3.2.2 Transitive-closure evaluation algorithms 
3.2.3 Other algorithms 
3.2.4 Loop detection
3.2.4.1 The upper bound of produced records
3.2.4.1 Straightforward approach: track chains
3.2.3 Improvements for straightforward execution strategy
3.3. Optimization
4. Use-cases dump
</contents>

1. Background information
-------------------------
* CONNECT BY is a non-standard, Oracle's syntax. It is also supported by 
  EnterpriseDB (Q: any other implementations?)

* PostgreSQL 8.4 (now beta) has support for SQL-standard compliant WITH
  RECURSIVE (aka Common Table Expressions, CTE) query syntax:
  http://www.postgresql.org/docs/8.4/static/queries-with.html
  http://www.postgresql.org/about/news.1074
  http://archives.postgresql.org/pgsql-hackers/2008-02/msg00642.php
  http://archives.postgresql.org/pgsql-patches/2008-05/msg00362.php

* Evgen's attempt:
  http://lists.mysql.com/internals/15569

DB2 and MS SQL support SQL standard's WITH RECURSIVE clause.

2. CONNECT BY semantics, properties and limitations
---------------------------------------------------
>From Oracle's manual:

<almost-quote>

  SELECT ... 
  FROM ...
  WHERE ...
  START WITH cond
  CONNECT BY connect_cond 
  ORDER [SIBLINGS] BY  

In oracle, one expression in connect_cond must be 
  
  PRIOR expr = expr
 
 or

  expr = PRIOR expr

The manner in which Oracle processes a WHERE clause (if any) in a hierarchical
query depends on whether the WHERE clause contains a join:

 * If the WHERE predicate contains a join, Oracle applies the join predicates
   before doing the CONNECT BY processing.
 * If the WHERE clause does not contain a join, Oracle applies all predicates 
   other than the CONNECT BY predicates after doing the CONNECT BY processing
   without affecting the other rows of the hierarchy.
</almost-quote>

See http://www.adp-gmbh.ch/ora/sql/connect_by.html
http://download-uk.oracle.com/docs/cd/B10501_01/server.920/a96540/queries4a.htm


2.1 Additional CONNECT BY features
----------------------------------

LEVEL pseudocolumn
  indicates ancestry depth of the record (inital row has level=1, its children
  have level=2 and so forth).  Can be used in CONNECT BY clause to limit
  traversal depth.

SYS_CONNECT_BY_PATH(column, 'char')
  returns path from root to the node.

NOCYCLE and CONNECT_BY_ISCYCLE
  "With the 10g keyword NOCYCLE, hierarchical queries detect loops and do not
  generate errors. CONNECT_BY_ISCYCLE pseudo-column is a flag that can be used
  to detect which row is cycling"
  http://www.dba-oracle.com/t_advanced_sql_connect_by_loop.htm

ORDER SIBLINGS BY
  CONNECT BY produces records in "children follow parents" order, with order
  of the siblings unspecified. ORDER SIBLINGS BY orders siblings within each
  "generation".

2.2 Limitations
---------------
Other limitations (which we might or might not want to replicate)

* There is this error:
  ORA-01437:   cannot have join with CONNECT BY
  Cause:       A join operation was specified with a CONNECT BY clause. If a 
               CONNECT BY clause is used in a SELECT statement for a tree-
               structured query, only one table may be referenced in the query.
  Action:      Remove either the CONNECT BY clause or the join operation from 
               the SQL statement.
  It seems oracle had this limitation before version 10G 

* LEVEL cannot be used on the left side of IN-comparison if the right side is a
  subquery
http://download.oracle.com/docs/cd/B10501_01/server.920/a96540/sql_elements6a.htm#9547
  This seems to have been lifted in version 10?

3. Our implementation
---------------------

3.1 Scope Questions
-------------------
* Are we sure we want CONNECT BY syntax and not SQL standard' one? (I'm not
  suggesting one or the other, just want to make sure we've made a conscious 
  decision)

* Any use-cases we need to make sure to handle well?

Will we implement any of these features:

* Output is ordered (children follow parents)
* "ORDER SIBLINGS BY" variant of ORDER BY
* NOCYCLE/CONNECT_BY_ISCYCLE
 - It seems any checking for cycles will cause overhead. Do we implement a
   mode for those who know what they are doing, where the server doesn't
   actually check cycles but only reports error if it happened to enumerate, 
   say MAX(1M, #records_in_table * 10) records? (This doesn't guarantee that
   there are no cycles, but this is just beyond what one could logically want)

* Oracle's treatment of WHERE (if there's a join - the WHERE is applied after
  connect by, otherwise before) [Yes]
* Can one use SYS_CONNECT_BY_PATH in the CONNECT BY expression?


3.2 CONNECT BY execution
------------------------

3.2.1 Straightforward (recursive) evaluation algorithm
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
As specified in CONNECT BY definition, breadth-first, parent-to-children 
traversal:

  start with a scan that retrieves records using the START WITH condition;
  pass rows to ouptut and also record them (i.e. needed columns) in 
    some sort of growable, overflow-to-disk buffer in_buf;
  
  while(in_buf is not empty)
  {
    for each record in the buffer
    {
      do a scan based on CONNECT BY condition;
      pass rows to output and also record them (i.e. needed columns) in 
        a growable, overflow-to-disk buffer out_buf; 
    }
    in_buf= out_buf;
  }

This algorithm will produce rows in the required order.

3.2.2 Transitive-closure evaluation algorithms 
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
When CONNECT BY clause refers only to current and PRIOR records (and doesn't
refer to connect path using LEVEL or SYS_CONNECT_BY_PATH functions), then
evaluation of CONNECT BY operation is equivalent to building a transitive
closure of a certain relation.

TODO: can we use LEVEL/SYS_CONNECT_BY_PATH in select list with these
      algorithms? looks like no?

There are special algorithms to build transitive closure of relation that is
represented as a table of edges, e.g. Blocked Warshall Algorithm.

Q: Do we investigate further in this direction?

3.2.3 Other algorithms 
----------------------
To be resolved: Do we always start from the first clause and go to children?
Does it make sense to proceed in other direction, from children to parents?
Looks like no? TODO need definite answer.

3.2.4 Loop detection
~~~~~~~~~~~~~~~~~~~~
Transitive-closure algorithms can detect loops (it seems some of them can also
handle loop avoidance but that needs to be verified).

Straightforward-evaluation algorithm will work forever if there is a loop,
hence will need assistance in loop detection/avoidance. 

3.2.4.1 The upper bound of produced records
-------------------------------------------
There is an upper bound of the amount of records CONNECT BY runtime can
generate without generating a loop.

The worst case is when
 * every record in a source table was in the parent generation (and thus has
   started a parent->child->child->...  chain)
 * every chain is of #table-records length.

example of such case:

  SELECT * FROM employees 
  START WITH true
  CONNECT BY 
    PRIOR emp_id = (emp_id + 1) MOD $n_employees  AND 
     length(SYS_CONNECT_BY_PATH('-')) = $n_employees -- guard againist 
                                                     -- forming loops

this gives that we can at most generate O(#table_records^2) records. This
limitation can be used as a primitive way to stop evaluation.


3.2.4.1 Straightforward approach: track chains
----------------------------------------------
In general case, we will have to track which records we have seen across each
of the parent-child chains. The same record can show up in different chains 
at different times and this won't form a loop:

   parent  generation1  generation2 

   row1- --+---row2----  ---row3--   (chain1)
           |
            \--row3-+--  ---row2--   (chain2)
                    |
                     \-  ---row4--   (chain3)
   row4-  ...

Tracking can be done by 
- Numbering the chains and using one structure (e.g temptable) to store 
  (rowid, chain#) pairs and check them for uniqueness.

- Using per-chain data structure which we could serialize/deserialize. This
  could be 
   - serializable hashtable
   - ordered rowid list
   - serializable sparse bitmap

One can expect a lot of chains to have common starts (eg. look at chain2 and 
chain3). I don't see how one could take advantage of that, though.

3.2.3 Improvements for straightforward execution strategy
---------------------------------------------------------

* If the query is a join, it may make sense to materialize it join result
  (including creation of appropriate index) so we're able to make
  parent-to-child transitions faster. 
  This seems to be connected to Evgen's work on FROM subqueries.

* If there is a suitable index, we can employ a variant of BatchedKeyAccess.

* Part of CONNECT BY expression that places restrictions on subsequent
  generation can be moved to the WHERE. If we do that, we get two recordsets:

1. Initial START WITH recordset

2. A recordset to be used to advance to subsequent generation


3.3. Optimization
-----------------
It seems it is nearly impossible to estimate how many iterations we'll have
to make and how many records we will end up producing.

TODO: some bad estimates.. assume a fixed number of generations, reuse ref
accces estimations for fanount, which gives

  access_method_estimate ^ number_of_generations  

estimate? 

4. Use-cases dump
=================

http://www.orafaq.com/usenet/comp.databases.oracle.server/2007/01/05/0264.htm:
   select mak_xx,nr_porz,level lvl from spacer_strona
       where nvl(dervlvl,0)<3
       start with mak_xx=125414 and nr_porz=0
       connect by mak_xx = prior derv_mak_xx and nr_porz = prior derv_nr_porz
   and prior dervlvl=3


http://www.orafaq.com/usenet/comp.databases.oracle.server/2007/01/04/0196.htm:
  SELECT OPM_N_ID FROM RAFC_ADM.AFC_T_OPERATION_METIER START WITH OPM_N_ID IN
                  (
                           SELECT OPM_N_ID FROM RAFC_ADM.AFC_T_OPERATION_METIER x
                           START WITH x.OPM_N_ID IN (4846)
                          CONNECT BY ((PRIOR x.OPM_MERE_OPM_N_ID = x.OPM_N_ID)
                          OR (PRIOR x.OPM_ANNULEE_OPM_N_ID = x.OPM_N_ID))
                  )
  CONNECT BY ((PRIOR OPM_N_ID = OPM_MERE_OPM_N_ID) OR (PRIOR OPM_N_ID =
OPM_ANNULEE_OPM_N_ID))

http://forums.enterprisedb.com/posts/list/737.page:
  select lpad(' ',2*(level-1)) || to_char(child) s
  from x
  start with parent is null
  connect by prior child = parent;

  select *
  from emp, dept
  where dept.deptno = emp.deptno
  start with mgr is null
  connect by mgr = prior empno

http://forums.oracle.com/forums/thread.jspa?threadID=623173: 
  SELECT cust_number
  FROM customer
  START WITH cust_number = '5568677999'
  CONNECT BY PRIOR cust_number = cust_group_code.

http://www.orafaq.com/forum/t/118879/0/
  SELECT COUNT(a.dataid), c.name 
  FROM dauditnew a, dtree b, kuaf c 
  WHERE a.auditdate > SYSDATE-10 AND a.auditstr IN ('Create', 'AddVersion') 
  AND a.dataid = b.dataid AND c.id = a.performerid  
  AND a.SUBTYPE = 0 
  START WITH b.dataid = 6132086 CONNECT BY PRIOR a.dataid = b.parentid GROUP BY
c.name


http://www.postgresql-support.de/blog/blog_hans.html
  SELECT METIER_ID||'|'||ORGANISATION_ID AS JOBORG
  FROM INTRA_METIER,INTRA_ORGANISATION
  WHERE METIER_ID IN(
         SELECT METIER_ID
         FROM INTRA_METIER
         START WITH METIER_ID= '99533220-e8b2-4121-998c-808ea8ca2da7'
         CONNECT BY METIER_ID= PRIOR PARENT_METIER_ID
       ) AND ORGANISATION_ID IN (
         SELECT ORGANISATION_ID
         FROM INTRA_ORGANISATION
         START WITH ORGANISATION_ID='025ee58f-35a3-4183-8679-01472838f753'
         CONNECT BY ORGANISATION_ID= PRIOR PARENT_ORGANISATION_ID
      );

http://oracle.com
  Oracle database uses CONNECT BY to generate EXPLAINs.

http://practical-sql-tuning.blogspot.com/2009/01/use-of-statistically-incorrect.html

  select sum(human_cnt) from facts
   where territory_id in (select territory_id
                            from dic$territory
                           start with territory_code = :code
                         connect by prior territory_id = territory_parent);

http://www.dbasupport.com/forums/archive/index.php/t-30008.html


  SELECT LEVEL,LPAD(' ',8*(LEVEL-1))||T_COM_OBJ.OBJ_NAME, T_COM_OBJ.OBJ_PARENT,
T_COM_OBJ.OBJ_ID
  FROM VDR.T_COM_OBJ
  START WITH T_COM_OBJ.OBJ_ID in (select obj_id obj_main from vdr.t_com_obj
where obj_id=obj_parent)
  CONNECT BY PRIOR T_COM_OBJ.OBJ_ID = T_COM_OBJ.OBJ_PARENT



ESTIMATED WORK TIME

ESTIMATED COMPLETION DATE
-----------------------------------------------------------------------
WorkLog (v4.0.0)