DB2 and the Procedural DBA, part 1
Craig S Mullins, Director, DB2 Technology Planning, BMC Software (USA)
In 1995 I coined the term Procedural DBA to describe a new type of database administration required by modern database management. The concept is a simple one - a new type of DBA is required to manage the procedural logic that is increasingly being stored in relational database management systems.
Initially, the sole purpose of a DBMS was to store, manage, and access data. Over time, database management systems have evolved by integrating procedural logic in the form of complex triggers, stored procedures, and user-defined functions. This code is tightly coupled to the DBMS. As these features are exploited, management tasks such as administration, design, and tuning, are typically assigned to the current DBA staff by default. But this is not always the best approach. What is required is an expansion of the role of database administration.
THE CLASSIC ROLE OF THE DBA
When a DBMS is implemented appropriately, its use spans the enterprise. Multiple applications, consisting of multiple programs, access and manipulate data stored in databases that are managed by the DBMS. A scenario such as this is good because it reduces data redundancy and increases data integrity. However, this same situation also effectively places the DBA on call for all the applications of the organization. If the database portion of any application fails, the DBA must be able to fix the problem, bringing the database back on-line so the application can execute.
To make matters more difficult, the role of the DBA has expanded over the years. In pre-relational days, both database design and data access was complex. Programmers were required to explicitly code program logic to navigate through the database structure to access data. Usually the pre-relational DBA was assigned the task of designing the hierarchic or network database design. Almost always, this process consisted of both logical and physical database design, although it was not always recognized as such at the time. Once the database was planned, designed, and generated, and the DBA created back-up and recovery jobs, little more than space management and reorganizations were required. Of course, this sounds easier than it actually was. Pre-relational DBMS products (such as IMS) require a complex series of utility programs to be run in order to perform back-up, recovery, and reorganization, consuming a large amount of time and effort.
Today, of course, DBAs still design databases and perform tasks such as back-up, recovery, and reorganization. But increasingly databases are generated from logical data models created by data administration staff using data modelling and database design tools. Additionally, the utilities for performing back-up, recovery, and reorganization are simpler to build in the relational world.
Although the up-front effort required to design a relational database is reduced, it is not eliminated. Relational design still requires physical implementation decisions such as table design, partitioning, indexing, normalization, and denormalization. But instead of just performing physical implementation and administration, DBAs are more intimately involved with procedural data access too. The nature of relational technology requires additional involvement during the design of data access routines. This is true because relational optimizer technology embedded in the RDBMS is used to choose the best access paths to the data. The optimization choices must be reviewed by the DBA. Therefore, application program and SQL design reviews are a vital component of the DBA's job.
Furthermore, DBAs perform most monitoring and tuning responsibilities. DBAs use tools like EXPLAIN, performance monitors, and SQL analysis tools to proactively administer RDBMS applications.
Often, DBAs are not adequately trained in these areas. It is a distinctly different skill to program than it is to create well-designed relational databases. Yet, DBAs quickly learn that they have to be able to understand application programming techniques to succeed.
DBMS-COUPLED APPLICATION LOGIC
Although DB2 was one of the last major RDBMS products to gain a full complement of tools for storing procedural logic in the database, its current support as of Version 6 is very robust. DB2 provides support for stored procedures, triggers, user-defined functions, a procedural version of SQL based on SQL/PSM, and user-defined data types.
A procedural SQL language adds features such as looping, branching, and flow of control statements to make SQL a more functionally complete and useful programming language. Using DB2's version of SQL/PSM, developers can create complex functional stored procedures and triggers without the need to code a 3GL program.
Let's define the different types of logic that can be stored, accessed, and managed in DB2 databases.
Stored procedures are procedural logic that is maintained, administered, and executed through the RDBMS. The primary reason for using stored procedures is to move application code off the client and on to the database server. This can result in reduced overhead because one client can invoke a stored procedure consisting of multiple SQL statements. Invoking one procedure to execute multiple SQL statements is preferable to the client executing multiple SQL statements directly because it minimizes network traffic, thereby enhancing overall application performance. A stored procedure is not 'physically' associated with any other object in the database. It can access and/or modify data in one or more tables. Basically, stored procedures can be thought of as 'programs' that 'live' in the RDBMS.
Triggers are event-driven specialized procedures that are stored in, and executed by, the RDBMS. Each trigger is attached to a single, specified table. Triggers can be thought of as an advanced form of 'rule' or 'constraint' written using procedural logic. A trigger cannot be directly called or executed; it is automatically executed (or 'fired') by the RDBMS as the result of an action - usually a data modification to the associated table. Once a trigger is created, it is always executed when its 'firing' event occurs (update, insert, delete, etc).
User-defined functions (UDFs) provide developers with the ability to extend the SQL language. Once coded, a UDF can be specified wherever a built-in SQL function can be specified. In general, DB2 functions (both built-in and user-defined) can be used anywhere an expression can be used (with some exceptions). Functions are called by specifying the function name and any required operands.
You can think of stored procedures and triggers and user-defined functions in the same way as other database objects such as tables, views, and indexes, because they are controlled by and managed within DB2. These objects are often collectively referred to as server code objects (SCOs), because they are actually program code that is managed by a database server as a database object.
WHY USE SERVER CODE LOGIC?
The predominant reason for using SCOs is to promote code reusability. Instead of replicating code on multiple servers or within multiple application programs, SCOs enable code to reside in a single place - the database server. SCOs can be automatically executed, based on context and activity, or can be called from multiple client programs as required. This is preferable to cannibalizing sections of program code for each new application that must be developed. SCOs enable logic to be invoked from multiple processes instead of being re-coded into each new process every time the code is required.
An additional benefit of SCOs is increased consistency. If every user and every database activity (with the same requirements) is assured of using the SCO instead of multiple replicated code segments, then the organization can be assured that everyone is running the same, consistent code. If each individual user deployed his or her own individual and separate code, no assurance could be given that the same business logic was being used by everyone. In fact, it is almost a certainty that inconsistencies would occur.
Additionally, SCOs are useful for reducing the overall code maintenance effort. Because SCOs exist in a single place (the RDBMS), changes can be made quickly without requiring propagation of the change to multiple workstations.
Finally, SCOs can be coded to support database integrity constraints, implement security requirements, reduce code maintenance efforts, support remote data access, and, as mentioned earlier, enhance performance. Of course, in order to achieve these gains, SCOs need to be effectively managed and administered. Hence the need for a Procedural DBA.
To be concluded next week.