Editorials

Improving Performance With Partitions

We’ve been talking about optimizing a database when it has a lot of data, at least from an indirect standpoint. What starting me on the topic was the use of an INSTAEAD OF trigger for something other than improving performance. INSTEAD OF triggers were designed to allow a DBA to spread data from a single large table across multiple tables through the use of a view. An INSTEAD OF trigger was then used to handle Insert, Update and Deletes to target the appropriate physical table.

That leads into the newer technology replacing Federated Views and Instead of Triggers with the introduction of Partitioned tables. Partitioned tables behave much the same as having rolled your own Federated View, but remove all the heavy lifting. All you have to do is create a partitioning function, and then add tables to the Partitioned table implementing a partition from the function. This is much simpler.

Some creative people found they could increase database performance at a very large scale by creating multiple databases and implementing their own partitioning director, referred to as sharding a database. This sharding technology is outside of the database, allowing many different database engines to be implemented. A data broker is developed which routes requests to one or more separate database instances and then brings the independent results together into a single response for the database client. The client is unaware that more than one database is handling their request.

Microsoft built a similar technology in SQL Azure with their Federated View. Again, this capability uses multiple databases as in the sharding scenario above. SQL Azure provides the data broker engine on your behalf. You may have to do some work combining multiple result sets for you query, but the heavy lifting is done for you.

For massive scale processing, Microsoft released the Database Appliance, which is a combination of hardware and software, enabling sharding in an optimized fashion. It combines both the sharding of database objects and the replication of database objects. In some instances, data is replicated. This is common for lookup tables or dimension tables. Then, tables with large numbers of rows are often sharded, placed on multiple instances of SQL Server, allowing queries to be simultaneously served. Each instance has its own copy of lookup tables, so that data is local to all queries and not brought from a linked database. Everything is local and performance in insanely fast.

Again Microsoft provides the data broker capabilities so that the client connects to the appliance, not a specific database. This hides the organization of the data from the client, and allows modifications without impacting your application connectivity.

So, when you hear about NoSql increasing performance by sharding data, you now know that sharding is not relegated to data persistence in a NoSql data storage engine. However, due to the relational nature of a SQL database, the implications of sharding are more complex.

So there you have a summary of data partitioning methods currently found in relational databases. There are many tools for doing sharding other than write your own, or those you purchase from Microsoft. But they all work in much the same way. You can buy a data broker written specifically for MySql for example. If performance is a problem due to a database with tables having billions or trillions of rows, perhaps you should look into sharding.

Cheers,

Ben