Oracle's Merge Statement as an ETL Tool
By John Adolph Palinski
Oracle added an extremely useful tool to its 9i+ database versions. This tool is the Merge statement. The statement allows the developer to Insert and Update a target table using the same result set without having to use a Where clause statement to locate the Updated record. This tool greatly simplifies the refresh of data warehouse tables. The statement is also much simpler and enhances performance.
Prior to this statement, refreshes that required Inserts and Updates require a PL/SQL code block with various cursors, loops, and If-Then structures. Listing 1 depicts a typical PL/SQL code block that refreshes a table. The script has the following components:
- A For Loop that retrieves the fresh data into the PL/SQL code block
- A Cursor opened for each row fectched by the For Loop. This cursor tests the target data warehouse table for record existence
- Cursor Open, Fetch, and Close statements.
- An If-Then-Else construct that tests the success of the cursor fetch
- An Update statement
- An Insert statement
The result of the script is that records that exist in the data warehouse table are updated and the records that do not exist are inserted.
Listing 1 - PL/SQL code block that refreshes a data warehouse table with Inserts and updates
begin
for new_data in (select * from emp) -- Brings in the current records
loop
declare
cursor test is select *
from emp_data_warehouse
where emp_data_warehouse.empno = new_data_empno;
test_var test@rowtype;
begin
open test; -- Executes a Select statement against the target data warehouse
fetch test into test_var;
if test%found then - Determines whether the new record exists in the data warehouse
update emp_data_warehouse -- Update the record if it exists
set emp_data_warehouse = new_data
where emp_data_warehouse.empno = new_data.empno;
else
insert into emp_data_warehouse -- Insert the record it it doesn't exist
(emp_data_warehouse)
values
(new_data);
end if;
close test;
end;
end loop;
end;
/
The Merge statement simplifies the code needed to perform the insert/update. The statement increases performance by:
- Eliminating the need to open a cursor for each row checking for existence
- Having to locate and lock each updated record
The Merge statement eliminates the tasks since its first step is to join the new data set records with the target data warehouse data set records. Oracle uses the power of a join to identify whether a record exists or not. The Merge statement allows the developer to perform different actions when a match or non-match occurs. Joining and matching result sets eliminates much of the incremental database hits caused by opening the cursor and issuing the update statement.
Listing 2 illustrates a merge statement that performs the same insert/update as the Listing 1 PL/SQL code block.
Listing 2 - Using the Merge statement to refresh a data warehouse table
merge into emp_data_warehouse -- Identifies the target table
using emp -- Identifies the source table, view, or in-line view
on (emp_data_warehouse.empno = emp.empno) -- Join condition
when matched then -- Reviews the joined result set and tests for a match
update -- Notice the Update statement doesn't have a data source name
set emp_data_warehouse.ename = emp.ename -- No Where clause is needed
when not matched then -- Review the joined result set and tests for a match
insert
(emp_data_warehouse.empno, emp_data_warehouse.ename)
values
(emp.empno, emp.ename)
Notice that there are far fewer lines in the Merge statement. It is also much simpler. The following is a template of the statement:
merge into refreshed_data_source
using (refreshed_from_data_source)
on (matching conditions)
when matched then
statement
when not matched then
statement
Some points to consider are:
| - |
The Merge statement is not a PL/SQL code block. Do not end the Insert or Update statement with a semi-colon |
| - |
The matching conditions clause can only contain join conditions. Filtering clauses will terminate with internal error codes. If the refreshed_from_data_source is to be filtered, create and in-line view as the data source and add the filtering condition to the view |
| - |
The statement can have a When Matched clause without having a When Not Matched clause. The statement can also be valid with a When Not Matched clause without a When Matched clause |
| - |
The Insert statement canly only placed in the When Not Matched clause |
| - |
The Update statement can only be placed in the When Matched clause |
| - |
Do not update any join column arguments in the matching conditions clause |
The Merge statement is a very effective tool. However, the 9i version issues incorrect error messages. For example, if you should try to update one of the matching clause join columns, Oracle issues an "Invalid Identifier" error. The error does not correspond to the actual error. Oracle corrected the problems in the 10g version. Remember the above points and being aware that the issued error message may not match the actual syntax problem will help you be successful with the Merge statement.
John Adolph Palinski is a former adjunct faculty member at the University of Nebraska at Omaha and Iowa Western Community College. He is the author of the "Oracle Database Construction Kit", Que, 1997, the "Oracle SQL and PL/SQL Handbook: A Guide for Data Administrators, Developers, and Business Analysts", Addison Wesley Longman, 2002, and "Oracle 9i Developer: Developing Web Applications With Forms Builder", Course Technology, 2003. He has also written numerous articles for periodicals such as Oracle Internals and Oracle Professional. He provides Oracle training and consulting services. One of his passions is to help customers set up logical business models to improve information efficiency and effectiveness. He may be contacted at www.oracle-trainer.com.