ARTICLES

Home  > Articles  >  Using the Insert All statement to easily create multiple summary tables
Using the Insert All statement to easily create multiple summary tables

By John Adolph Palinski

Data warehouses based upon star schemas often need summary tables that aggregate facts to a higher level grain.  The summary tables are used to enhance query performance by avoiding the need to aggregate the fact table at runtime.  This article discusses how to create multiple summary tables with a single pass of the fact data using the Insert All statement.

A key data warehouse performance enhancement tool are summary tables.  Summary tables allow the user to avoid having to aggregate massive amounts of records at runtime.  A typical well designed data warehouse models data in star schemas.  A star schema consists of a series of dimension tables that are linked to a common fact table.  The fact table contains foreign keys that reference the various dimension tables that are the points of the star.  The center part of the star is the fact table. 

A fact table is similar to an associate entity. Fact tables have foreign keys to each of the related dimension tables along with a measurement value such as a transaction cost.  A dimension table is really a series of attributes that describe a particular object such as a project, customer, county, or state.  Typically, the fact table contains many records with a small number of attributes.  On the other hand, the dimension table contains many attributes but a smaller number of records.  The beauty of a star schema is the ability to slice-and-dice the measurement facts into virtually endless series of combinations. 

However, many of the star schema slices-and-dices consist of joining the dimensions through the fact table and aggregating rows.  There are tools such as star transformations and special indexing that can make this slicing-and-dicing efficient, but each time a user slices-and-dices, aggregation often occurs.  Depending on the extent of aggregation, a query against even the best tuned star can cause the user to wait for the results.  For this reason, developers often create summary tables of common aggregations.  The aggregation occurs off hours after the star schema fact table is refreshed.  Using projects, customers, counties, states, and costs as the dimension and facts, typical summary tables may consist of the following aggregations: Total Project costs for each county, Total project costs for each state, Total project costs for each customer, or Total project costs for each customer in a county. 

Summary tables of the previous facts allow the user to search a smaller table at runtime.  This enhances performance and the user experience.  Unless the user is interested in esoteric analysis, it is a good idea to avoid massive aggregation at runtime.  Busy management personnel are not likely to wait 10 minutes more than once for a query to return.  In order to keep users returning to the star, it is sometimes best if the user accesses aggregated or summary tables that contain smaller sets of records and drill into the fact table for specific information as much as possible.  This will enhance the user experience.

The summary tables are created off-hours.  In order to maintain batch windows, it is best to create the summary tables efficiently.  The Insert All and Grouping clause options offer a tool to create multiple summary tables with a single pass of the fact table.  This eliminates the need to requery the table for each aggregation, saving precious time. 

To illustrate the Insert All and Grouping Set functionality, assume that the data warehouse has a cost star schema containing a project, customer, county, and state dimension.  At the center of the star is a fact table of cost transactions.  Each fact record has a foreign key to each of the dimensions.  Here is the basic schema.  Note that only a select few dimension attributes are listed.

 Project  Customer  County  State  Costs
 Project_key  Customer_key  County_key  State_key  Fk_project_key
 Project_name  Customer_name  County_name  State_name  Fk_customer_key
         Fk_county_key
         Fk_state_key
         Cost

Continuing the example, a developer has been charged to create the following summary tables:

Project costs by Customer
Project costs by Customer, State
Project costs by Customer, County

Typically the developer would need to create three different Insert/Select statements to perform the aggregation and create the summary table.  The following are the statements:

Insert into customer_costs
  (fk_customer_key, customer_cost)
select fk_customer_key, sum(cost)
from costs
group by fk_customer_key;

Insert into customer_state_costs
  (fk_customer_key, fk_state_key, customer_state_cost)
select fk_customer_key, fk_state_key, sum(cost)
from costs
group by fk_customer_key, fk_state_key;

Insert into customer_count_costs
  (fk_customer_key, fk_county, customer_county_cost)
select fk_customer_key, fk_county, sum(cost)
from costs
group by fk_customer_key, fk_county;

The above statements require three different aggregations and more importantly require three separate passes of the source data, the Costs table.  Depending on the size of the Costs table this can be a lengthly time period.  The Insert All and new Grouping Set options allow you to do the same aggregations with a single pass of the data.  The below Insert All statement using the Grouping Set options perform the same functionality as the previous three Insert statements.  The Insert All statement is somewhat similar to the Insert statement in that it has an Into and a Values clause.  The difference is the Insert All statement can have multiple Into and Values clauses.  Additional functionality includes a When condition which can precede each set of clauses limiting the tables refreshed. 

The following statement contains three sets of Into/Values clauses.  Each set adds records to a different summary table.  When condition statements precede each Into/Values clause.  The When condition determines the table to insert records.  You might also review the Select clause.  It has a Grouping Sets option that causes different aggregations to be computed with a single pass of the Costs table.  Grouping Sets and other Group By options are more fully discussed in a previous SSWUG article (Oracle's Enhanced Group By Clause Part 2, July 22, 2004).

Insert all
  when (fk_state_key is null
             and fk_county_key is null)
      into customer_costs
      values (fk_customer_key, cost)
  when (fk_state_key is not null
             and fk_county_key is null)
      into customer_state_costs
      values (fk_customer_key, fk_state_key, cost)
  when (fk_state_key is null
             and fk_county_key is not null)
      into customer_county_costs
      values (fk_customer_key, fk_county_key, cost)
select fk_customer_key, fk_state, fk_county, sum(cost) cost
from costs
group by grouping sets
               ((fk_customer_key),
                (fk_customer_key, fk_state_key),
                (fk_customer_key, fk_county_key))

The following is a syntax template of the Insert All statement:

   Insert [all|first]
      when (condition)                      
-optional clause
         into table_name (target values)

         values (select clause expressions)
      when (
condition)   
          ...
      
else                                            
-optional clause
         into table_name (target values)
         values (select clause expressions)
      Select statement

The statement has a number of features.  These are:

 .  All|First The All option evaluates each clause.  If the condition is met the action will occur.  The First option causes evaluation to stop when the first positive condition is met.
 .  When  This is an optional condition clause.  If the clause is omitted the action will be performed.
 Else  This is an optional clause that is executed if none of the When conditions are not met.
 .  Select  The source of the Insert All statement must be a Select statement

I believe that you will find the Insert All statement a valuable tool for efficiently creating summary tables.

John Adolph Palinski is a former adjunct faculty member at the University of Nebraska at Omaha and Iowa Western Community College. He is the author of the "Oracle Database Construction Kit", Que, 1997, the "Oracle SQL and PL/SQL Handbook: A Guide for Data Administrators, Developers, and Business Analysts", Addison Wesley Longman, 2002, and "Oracle 9i Developer: Developing Web Applications With Forms Builder", Course Technology, 2003. He has also written numerous articles for periodicals such as Oracle Internals and Oracle Professional. He provides Oracle training and consulting services. One of his passions is to help customers set up logical business models to improve information efficiency and effectiveness. He may be contacted at www.oracle-trainer.com.