Oracle's hierarchical query and new 10g enhancements
By John Adolph Palinski
The Select statement has some additional clauses that are used to place a result set into a hierarchy. A good example of a hierarchy is the Windows My Computer dialog window. When the window is opened, the various desktop directories and drives are displayed. Under each drive is an offset of directories that reside on the drive. Further, expanding the directory causes the display of child files and other directories. This is a hierarchy, each object node has its children, grandchildren, and great-grandchildren displayed underneath. Oracle’s Select statement has a clause that allows you to identify the level of a row (parent, child, and grandchild) and display the related row under the proper parent. This clause is the Start With … Connect By clause. The Start With section identifies the first parent in the hierarchy. The Connect By section tells Oracle how to link each parent to the child or next level record (if it exists).
Listing 1 illustrates a hierarchical query. The listing uses a default Oracle table called EMP that contains a unary relationship. The EMP table contains employee records. Each EMP record has an MGR field that contains another employee ID as a value. The MGR value is a foreign key to the EMPNO column on the same table and identifies the current employee’s boss.
Listing 1 is depiction found in many reference manuals and was not developed by the author. The listing displays employees and the employees that report to the respective employee. The employee records are displayed in a hierarchy. At the top of the hierarchy is employee KING. KING’s record is the first level under the hierarchy. At the second level are employees JONES, BLAKE, and CLARK. Under each of the second level employees are employees that report to these individuals. This is a hierarchy in that the proper employee is placed under the proper boss.
The start of the hierarchy is identified in the Start With clause section. Records with a null MGR value are placed in the first level. Oracle then evaluates other records looking for MGR values that match the prior or parent rows EMP. In the case of KING, the MGR value is null. KING’s employee number value is the PRIOR EMPNO value. Employees JONES, BLAKE, and CLARK are the employees that had KING’s EMPNO value in their MGR column. Oracle then moves through the data looking for employees that have MGR equal to JONE, BLAKE, or CLARK’s EMPNO value.
Listing 1 – Listing employees in a hierarchy using the Start With … Connect By clause
SQL> column employee format a25
SQL> Select level, lpad(' ',2*(level - 1)) || ename "EMPLOYEE",
2 empno, mgr
3 from emp
4 start with mgr is null
5 connect by prior empno = mgr
6*
LEVEL EMPLOYEE EMPNO MGR
---------- ------------------------- ---------- ----------
1 KING 7839
2 JONES 7566 7839
3 SCOTT 7788 7566
4 ADAMS 7876 7788
3 FORD 7902 7566
4 SMITH 7369 7902
2 BLAKE 7698 7839
3 ALLEN 7499 7698
3 WARD 7521 7698
3 MARTIN 7654 7698
3 TURNER 7844 7698
3 JAMES 7900 7698
2 CLARK 7782 7839
3 MILLER 7934 7782
14 rows selected.
SQL>
Notice the Listing 1 Level expression. It is a pseudo-column used to identify the particular hierarchy level. The Prior keyword indicates how the current row is related to the parent row. The previous query is representive of the pre-Oracle10g hierarchical query functionality. The functionality of the Hierarchical query is enhanced in Oracle10g. Oracle has added some additional operators, pseudo-columns, functions, and keywords. These are:
• Connect_by_root - Operator that returns a value from the current column’s root row.
• Connect_by_iscycle - Pseudo-column that identifies cycles. When used in conjunction to the Nocycle keyword, the column returns a value of 1 if the current row has a child that is also a parent. Connect_by_iscycle returns a value of 0 if the current row does not have a child that is a parent.
• Connect_by_isleaf - Pseudo-column that identifies a leaf node. A leaf node is the lowest record in the hierarchy. It is a record that does not have a child.
• Sys_connect_by_path - Oracle9i function that returns a concatenated value that has each of the related row values. The values can be separated by a character string.
• Nocycle - Keyword placed in the Connect By section that disables the error generated when the target data has a cycle. A cycle occurs when a parent record is also a child record.
• Siblings - Oracle9i keyword used in the Order By clause to properly order the rows. The keyword causes rows to be sorted within their particular hierarchy. This is something that cannot be done using the normal Order by clause.
Listing 2 illustrates the Sys_connect_by_path function. The listing is a modification of Listing 1. The Listing 1 expressions are replaced by the Sys_connect_by_path() function. The function has two parameters: the employee names (ENAME) and the character (-) separating the employee names. Notice that each row now displays the full ancestral hierarchy rather than an individual employee name. Each of the employee’s names is separated by the function character.
Listing 2 – Using the Sys_connect_by_path() function to display the full ancestral hierarchy on each row
SQL> Select sys_connect_by_path(ename, '-') hierarchy
2 from emp
3 start with mgr is null
4* connect by prior empno = mgr
HIERARCHY
--------------------------------
-KING
-KING-JONES
-KING-JONES-SCOTT
-KING-JONES-SCOTT-ADAMS
-KING-JONES-FORD
-KING-JONES-FORD-SMITH
-KING-BLAKE
-KING-BLAKE-ALLEN
-KING-BLAKE-WARD
-KING-BLAKE-MARTIN
-KING-BLAKE-TURNER
-KING-BLAKE-JAMES
-KING-CLARK
-KING-CLARK-MILLER
14 rows selected.
SQL>
Listing 3 is also a revision of Listing 1. This listing illustrates the Sibling Order By clause option. Return to Listing 1 and notice that the child records are listed under the proper parent, but not order in any particular order. Adding the Order By clause to the Select statement and sorting the result set by one of the expressions causes the result set to lose its hierarchy. Ordering by Level causes all records to be sorted by Level value. You will no longer know which hierarchy the particular row belongs to. This same effect occurs if Employee, Empno, or Mgr is used. As Listing 3 illustrates, using the Siblings keyword in the Order By clause causes each hierarchy to be properly sorted within the hierarchy. The Siblings keyword replacing the By keyword. Notice, the clause must still identify the result set value that determines the sorting within the hierarchy levels Compare the result set to Listing 1.
Listing 3 – Using the Siblings keyword to sort rows within each hierarchy
SQL> Select level, lpad(' ',2*(level - 1)) || ename "EMPLOYEE",
2 empno, mgr
3 from emp
4 start with mgr is null
5 connect by prior empno = mgr
6* order siblings by ename
LEVEL EMPLOYEE EMPNO MGR
---------- ------------------------- ---------- ----------
1 KING 7839
2 BLAKE 7698 7839
3 ALLEN 7499 7698
3 JAMES 7900 7698
3 MARTIN 7654 7698
3 TURNER 7844 7698
3 WARD 7521 7698
2 CLARK 7782 7839
3 MILLER 7934 7782
2 JONES 7566 7839
3 FORD 7902 7566
4 SMITH 7369 7902
3 SCOTT 7788 7566
4 ADAMS 7876 7788
14 rows selected.
SQL>
Listing 4 uses the Connect_by_root operator and the Connect_by_isleaf pseudo-column. Connect_by_root identifies a value from the root row. In the case of Listing 4, the row is KING’s and the retained expression is Ename with a value of KING. Notice that the Connect_by_root operator displays KING on each row since it is the root value for all hierarchies. The Connect_by_isleaf pseudo-column is used in the fourth expression. It displays a value of 0 if the current row has children and a value of 1 if the row does not have children.
Listing 4 – Using the Connect_by_root operator and the Connect_by_isleaf pseudo-column
SQL> Select level, lpad(' ',2*(level - 1)) || ename "EMPLOYEE",
2 Connect_by_root ename "Head Person",
3 Connect_by_isleaf "End of Hier"
4 from emp
5 start with mgr is null
6 connect by prior empno = mgr
7* order siblings by ename
LEVEL EMPLOYEE Head Perso End of Hier
---------- ------------------------- ---------- -----------
1 KING KING 0
2 BLAKE KING 0
3 ALLEN KING 1
3 JAMES KING 1
3 MARTIN KING 1
3 TURNER KING 1
3 WARD KING 1
2 CLARK KING 0
3 MILLER KING 1
2 JONES KING 0
3 FORD KING 0
4 SMITH KING 1
3 SCOTT KING 0
4 ADAMS KING 1
14 rows selected.
SQL>
Hierarchical data sets generally do not have cycles. A cycle is a record that has an ancestor record that is also a child. However, this condition may occur as the result of an error or for some other unknown reason. If Oracle encounters a cycle, it issues an error message. In order to display the error message Oracle has created the Nocycle keyword. Oracle has also created the Connect_by_iscycle pseudo-column. This column displays a value of 0 if the current record has children records that are not parents. It displays a value of 1 if the record has a child record that is also a parent record. In essence, the returning a value of 1 indicates a cycle existing.
The default Oracle EMP table does not have a cycle, so it is necessary to modify the table creating a cycle. Employee KING is the root record. KING has a null MGR value since KING does not report to anyone else. If KING’s MGR value is modified to 7900, a cycle is. 7900 is JAMES’ Empno value and KING will now report to JAMES. Listing 5 contains a Select statement that is executed against the EMP table that contains a cycle. Notice the error message.
Listing 5 – Hierarchical query against a table containing a cycle
SQL> Select level, lpad(' ',2*(level - 1)) || ename "EMPLOYEE",
2 empno, mgr
3 from emp
4 start with empno = 7839
5 connect by prior empno = mgr
6 order siblings by ename
7 ;
ERROR:
ORA-01436: CONNECT BY loop in user data
no rows selected
SQL>
Listing 6 is the same Select statement except the Nocycle keyword is added to the Connect By section. In addition, the Select clause now contains the Connect_by_iscycle pseudo-column. The Select statement now executes successfully. JAMES’ record is identified by the Connect_by_iscycle column at the beginning of a cycle. This is determined by the value of 1 for JAMES. The Nocycle option terminates the cycle at that point and continues the query.
Listing 6 – Using the Connect_by_iscycle pseudo-column and Nocycle keyword to identify a cycle and eliminate the error message
SQL> Select level, lpad(' ',2*(level - 1)) || ename "EMPLOYEE",
2 empno, mgr, connect_by_iscycle
3 from emp
4 start with empno = 7839
5 connect by nocycle prior empno = mgr
6* order siblings by ename
LEVEL EMPLOYEE EMPNO MGR CONNECT_BY_ISCYCLE
---------- ------------------- ---------- ---------- ------------------
1 KING 7839 7900 0
2 BLAKE 7698 7839 0
3 ALLEN 7499 7698 0
3 JAMES 7900 7698 1
3 MARTIN 7654 7698 0
3 TURNER 7844 7698 0
3 WARD 7521 7698 0
2 CLARK 7782 7839 0
3 MILLER 7934 7782 0
2 JONES 7566 7839 0
3 FORD 7902 7566 0
4 SMITH 7369 7902 0
3 SCOTT 7788 7566 0
4 ADAMS 7876 7788 0
I believe that the new Oracle10g enhancements will make the hierarchical a much more powerful tool for you.
John Adolph Palinski is a former adjunct faculty member at the University of Nebraska at Omaha and Iowa Western Community College. He is the author of the "Oracle Database Construction Kit", Que, 1997, the "Oracle SQL and PL/SQL Handbook: A Guide for Data Administrators, Developers, and Business Analysts", Addison Wesley Longman, 2002, and "Oracle 9i Developer: Developing Web Applications With Forms Builder", Course Technology, 2003. He has also written numerous articles for periodicals such as Oracle Internals and Oracle Professional. He provides Oracle training and consulting services. One of his passions is to help customers set up logical business models to improve information efficiency and effectiveness. He may be contacted at www.oracle-trainer.com.