Oracle's Enhanced Group By Clause Part 1
By John Adolph Palinski
Recent database versions enhanced Oracle's Group By clause. The Group By clause allows the developer to perform a large variety of aggregations with a single pass of the target data. These enhancements increase the performance of deriving data warehouse facts by reducing the number of data passes. This is the first of two articles that explore these enhancements. In this article you will see the Rollup and Cube list options along with the Grouping() function. The next article covers Grouping Sets, Concatenated Groupings, and Composite Columns.
Historically, Oracle has had a Group By clause available in the Select statement. The Group By clause uses Oracle Grouping functions (i.e. sum, avg, variance, min, max, count) and determines the granularity of an aggregation. Select statements that use the Group By clause create rows that have the same granularity. For example, a Select statement using the below Group By clause creates rows aggregated to the Sales_area, Gender, and State grain.
Group by sales_area, gender, state
With the advent of data warehouses and dimensional models, it has become necessary to create databases of aggregated or derivative facts. The derivative facts are contained in fact tables linked to dimension tables. Fact tables contain numeric values such as commissions or costs and foreign keys to the linked dimension table(s). The fact records relate to dimension table records to produce business information for analysis. Examples of derivative facts are:
- Commissions by Sales_area
- Commissions by Sales_area, Gender
- Commissions by Sales_area, State
- Commissions by Sales_area, Gender, State
- Commissions by Gender
- Commissions by Gender, State
- Commissions by State
Using the historical Group By clause to create the various aggregations requires seven Select statements and seven separate passes through the source data. Reducing the number of source data passes and the resulting performance enhancement is the reason for the Group By clause enhancements.
Rollup
The Rollup list option is an extension of the Group By clause. Rollup allows the developer to aggregate values based upon the Group By clause expressions and to create additional, coarser levels of aggregation with the same data pass. The additional aggregations result from expressions in the Rollup list. The following is a template of the Group By clause and Rollup list option.
Group by rollup (expressions), expressions
Listing 1 depicts a typical aggregation based upon the Group By clause. The Select statement sums commissions by Sales_area, Gender, and State. The rows in this result set are typical of the historic aggregation using the Group By clause. All rows in the result set have the same granularity.
Listing 1 - Using the Group By clause to sum commissions by Sales_area, Gender, and State
SQL> Select sales_area, gender, state, sum(commission)
2 From salesmen
3 Natural join salesman_commissions
4* Group by sales_area, gender, state
S G ST SUM(COMMISSION)
- - -- ---------------
E M MA 18800
E M NJ 1450
E M NY 3750
N F MI 750
...
W M AR 3200
W M CA 250
W M OR 1100
W M WA 250
16 rows selected.
SQL>
In Listing 2, the Group By columns Sales_area, Gender, and State are placed inside a Rollup list. This causes extra aggregations to occur with coarser levels of granularity than Sales_area, Gender, and State. The Select statement now computes the following:
- Sum of commissions by Sales_area, Gender, and State (same as in Listing 1)
- Sum of commissions by Sales_area and Gender
- Sum of commissions by Sales_area
- Sum of commissions
Listing 2 - Using the Rollup list to add commission aggregations for Sales_area/Gender, Sales_area, and the result set
SQL> Select sales_area, gender, state, sum(commission)
2 From salesmen
3 Natural join salesman_commissions
4* Group by rollup(sales_area, gender, state)
S G ST SUM(COMMISSION)
- - -- ---------------
E M MA 18800
E M NJ 1450
E M NY 3750
E M 24000
E 24000
N F MI 750
N F WI 4300
N F 5050
...
W M 4800
W 5050
60300
27 rows selected.
SQL>
The Rollup list creates an aggregation based upon all of the values in the list. If the Rollup list contains three expressions such as the Listing 2 example, Oracle performs three additional aggregations. One aggregation is based upon the all columns in the Group By clause. This is the finest aggregation grain. The additional aggregations consist of aggregation of combined expressions from right to left (Sales_area/Gender, Sales_area, and Result set).
If the order of the expressions is changed, the number of aggregations remains the same. However, the aggregation types change. Listing 3 contains the same basic Select statement. However, the expressions in the Rollup list are rotated. Sales_area moves from the first expression to the third expression. State moves from the third expression to the first. Notice that the same low-level aggregation occurs. This aggregations sums commissions by Sales_area, Gender, and State. However, the State/Gender and State aggregations replace the Sales_area/Gender and Sales_area aggregations.
Listing 3 - Using the Rollup list to subtotal commissions by State/Gender, by State, and entire result set
SQL> Select sales_area, gender, state, sum(commission)
2 From salesmen
3 Natural join salesman_commissions
4* Group by rollup(state, gender, sales_area)
S G ST SUM(COMMISSION)
- - -- ---------------
S M AL 4000
M AL 4000
AL 4000
W M AR 3200
M AR 3200
AR 3200
W F CA 250
F CA 250
W M CA 250
M CA 250
CA 500
...
W M WA 250
M WA 250
WA 250
N F WI 4300
F WI 4300
WI 4300
60300
47 rows selected.
SQL>
It is also possible to place some of the Group By expressions outside the Rollup list. Placing an expression(s) outside Rollup list change the granularity of the extra aggregations. The aggregation will be based upon the outside expression(s). Notice Listing 4, the State expression was placed outside the Rollup list. Each of the aggregations is now based upon a State value. The coarsest aggregation is the commissions for a particular state. Compare the results to those in Listing 2, which include all the expressions in the Rollup list. One of the first things you should notice is that Listing 2 has an entire result set aggregation while Listing 4 does not. Since all expressions were included in the Listing 2 Rollup list, the base aggregation is the entire result set. The Listing 4 Select statement cannot aggregate broader than the commissions per state. Reviewing Listing 4 you can see that the below aggregations occur. The only one that is the same is the coarsest aggregation: Sales_area, Gender, and State.
- Sum of commissions by State, Sales_area, and Gender
- Sum of commissions by State and Sales_area
- Sum of commissions by State
Listing 4 - Placing the State expression outside the Rollup list
SQL> Select sales_area, gender, state, sum(commission)
2 From salesmen
3 Natural join salesman_commissions
4* Group by rollup(sales_area, gender), state
S G ST SUM(COMMISSION)
- - -- ---------------
S M AL 4000
S AL 4000
AL 4000
W M AR 3200
W AR 3200
AR 3200
...
N M SD 4250
N SD 4250
SD 4250
S M TX 6500
S TX 6500
TX 6500
W M WA 250
W WA 250
WA 250
N F WI 4300
N WI 4300
WI 4300
44 rows selected.
SQL>
Cube
The Rollup list seen in the previous section only performs a specific set of aggregations. The Rollup list decreases aggregation granularity from right-to-left. For example, in Listing 2 the Rollup list contained three expressions: Sales_area, Gender, and State. Four different aggregations occur. Other potential aggregations such as commissions by Sales_area/State, Gender/State, and Gender were omitted. If you would like to have all available possible granularity combinations use the Cube list. This is done in Listing 5, which is a modified version of Listing 2. The Rollup list option is replaced by the Cube list option. Notice the increased number of aggregations.
Listing 5 - Using the Cube list to produce all aggregation combinations of Sales_area, Gender, and State
SQL> Select sales_area, gender, state, sum(commission)
2 From salesmen
3 Natural join salesman_commissions
4 Group by cube(sales_area, gender, state)
5*
S G ST SUM(COMMISSION)
- - -- ---------------
60300
AL 4000
AR 3200
CA 500
FL 250
...
TX 6500
WA 250
WI 4300
F 5300
F CA 250
F MI 750
F WI 4300
M 55000
...
M SD 4250
M TX 6500
M WA 250
E 24000
E MA 18800
...
N F WI 4300
N M 15450
N M MI 2550
N M ND 8650
N M SD 4250
S 10750
S AL 4000
S FL 250
S TX 6500
...
W M 4800
W M AR 3200
W M CA 250
W M OR 1100
W M WA 250
73 rows selected.
SQL>
The Grouping() function
The Rollup and Cube functions are used as data warehourse extraction, transformation, and loading tools. The purpose of the lists is to derive many different aggregations with a single data source pass. The derived data likely exists in different data warehouse tables since the rows identify different levels of aggregation. Rollup and Cube are often used in PL/SQL cursors or the Insert..All statement where conditional logic is used to identify the table receiving the rows. Since the rows go into different tables, it is important to identify the granularity of the aggregation in order to determine the table to place the derived fact. This is where the Grouping() function is used. The Grouping() function returns a value of 1 if the derived value is the coarsest grained value derivative. The function returns a value of 0 if the row is not the coarsest grained value. Knowledge about the row's granularity allows the developer to add the result set row to the proper table. If the record sums commission to the Sales_area, the row is sent to the Sales_area_commissions table. If the record sums commissions to the Gender, the row is sent to the Gender_commissions table. The granularity knowledge makes the Rollup and Cube lists effective ETL tools.
The function contains a single parameter. The parameter is one of the columns identified in the Rollup or Cube list. The following is a template of the function.
Grouping(column_name)
Listing 6 illustrates the Grouping function. The Select statement has a Rollup list that aggregates the Gender values. Notice that the granular aggregations of commissions by Sales_area/Gender/State and Sales_area/Gender receive a value of 0 from the Grouping function. Gender aggregations by Gender and the result set total receive a value of 1.
Listing 9.6 - Using the Grouping() function to identify high granular Gender aggregations
SQL> Select sales_area, gender, state, sum(commission),
2 grouping(gender)
3 From salesmen
4 Natural join salesman_commissions
5 Group by rollup(sales_area, gender, state)
6*
S G ST SUM(COMMISSION) GROUPING(GENDER)
- - -- --------------- ----------------
E M MA 18800 0
E M NJ 1450 0
E M NY 3750 0
E M 24000 0
E 24000 1
...
S M 10750 0
S 10750 1
W F CA 250 0
W F 250 0
W M AR 3200 0
W M CA 250 0
W M OR 1100 0
W M WA 250 0
W M 4800 0
W 5050 1
60300 1
27 rows selected.
SQL>
As you can see the Rollup and Cube lists allow you to easily create additional aggregations using the same data pass. In part 2 of this article, you will see how to fine tune the number of aggregations using Grouping Sets, Concatenated Group Sets, and Composite Columns.
John Adolph Palinski is a former adjunct faculty member at the University of Nebraska at Omaha and Iowa Western Community College. He is the author of the "Oracle Database Construction Kit", Que, 1997, the "Oracle SQL and PL/SQL Handbook: A Guide for Data Administrators, Developers, and Business Analysts", Addison Wesley Longman, 2002, and "Oracle 9i Developer: Developing Web Applications With Forms Builder", Course Technology, 2003. He has also written numerous articles for periodicals such as Oracle Internals and Oracle Professional. He provides Oracle training and consulting services. One of his passions is to help customers set up logical business models to improve information efficiency and effectiveness. He may be contacted at www.oracle-trainer.com.