SQL Server

Partitioning relational and multidimensional data for OLAP solution (Part 1)

Partitioning relational and multidimensional data for OLAP solution – (Part – 1)

The design of the relational database and all multidimensional structures will help to determine the performance of your overall solution. First introduced in SQL Server 2005, SQL Server partitioned tables features allow you to partition data in a table across multiple filegroups. Most often, the partitioning is used to logically break the fact tables for easier management and faster performance. Some very large dimension tables can also benefit from the partitioning. An example is the employee with a primary surrogate key dimension defined with the IDENTITY property. The newest employees are always in the same partition.

In this first of the two part article series, we will talk about partitioning relational data.

Reasons to partition relational data

The following are the two common reasons to partition relational data:

Manageability

Manageability can be improved by partitioning relational data. For example, when you are using table partitioning, the new data can be added to new and existing table partitions with minimal impact on other partitions. Partitioning strategies improves and simplifies data archiving and aging by merging, splitting and moving partitions. Moreover, it also simplifies database backup and restore operations especially when read_only filegroups are used.

Performance

Partitioning your relational data has a positive impact on your solution in several ways. For example, data archiving process becomes faster by partitioning relational data. That’s because, all the data to be archived is usually in a single partition. Partitioning also improves query execution time because SQL Server treats individual partition as a separate entity. Moreover, partitioning a large table limits the amount of data SQL Server must load into memory or scan during queries. Besides this, the partition gives privilege to bulk load data outside the table structure. For example, you have load the data into staging partition table, and once the data is loaded you can load the data to main partition table by just switching the partitions.

Steps to partition tables

The following are the steps to partition tables:

Creating Partition File Groups

The first step to partition a table is to create the filegroups that will store the partitions. We will use these filegroups to store partitions data. For more information about how to create filegroups and files within existing database, see Understanding Files and Filegroups.

Partition functions

The next step is to create a partition function. A partition function tells SQL Server the data type of the column to use as the basis for partitioning the data, number of partitions to create, and the boundaries for each partition. Partition functions are reusable because of the fact that they only define partitions. For example, you can use the same partition function to partition many tables or indexes using same ranges. You create a partition function by using the CREATE PARTITION FUNCTION statement. The statement has the following syntax:

The parameters for the function are described in the following table.

Parameter

Description

partition_function_name

Name of the partition function

input_parameter_type

Data type of the column used to partition data. You identify the partitioning column in the CREATE TABLE statement.

boundary_value

List of values that determine the values stored in each partition.

You use the RANGE to determine if you include the boundary value of the partition to the left or right side of the boundary.

You can query the sys.partition_functions catalog view for a list of partition functions. You can query the sys.partition_range_values catalog view to obtain the boundary values.

Partition scheme

The next step is to create partition scheme. Partition scheme tells SQL Server how to distribute the partitions of a partitioned table or index among the physical filegroups. To create a partition scheme, we use CREATE PARTITION SCHEME statement. The statement has the following syntax:

The following table describes the arguments of CREATE PARTITION SCHEME syntax:

Argument

Description

partition_scheme_name

Specifies the name of the partition scheme. The partition scheme name must be unique within the database.

partition_function_name

Specifies the name of the partition function using the partition scheme. The partition scheme name must be unique within the database. Partition function must be created before creating partition scheme.

file_group_name | PRIMARY

In here you specify the list of filegroups that store the partitions. For example, you should list three filegroups if your partition function specifies three partitions. You can use the PRIMARY keyword to use the primary filegroup.

ALL

When specified, all partitions from a partition function are mapped into a same filegroup.

Create the partition tables and indexes, defining the partitioning column

Once you created the partition function and partition scheme, you can begin using them to create partition function and tables. You can do this by specifying partition scheme next to ON clause of CREATE TABLE and CREATE INDEX statement. For example, the following creates the EmployeeDetails table on partition scheme:

Once you created the table, you can then use sys.partitions catalog view to view a list of partitions created within the database. NONPARTITIONED tables have one entry in the view.

Guidelines for Partitioning Relational Data

When you partition your relational data to support a multidimensional data solution, there are several guidelines to follow. These are:

Partition fact tables

Partition fact tables because partitioned fact tables provide better performance, less performance impact during table management and better control over backup and restore.

Use SWITCH, MERGE, and SPLIT operations for archiving aged data

Use the SWITCH, MERGE, and SPLIT operations to support the archival of aged data by using a sliding window technique.

Use aligned partitioned indexes

Use aligned partitioned indexes. In general, only a portion of the data in a fact table is volatile. Therefore, it is more efficient to perform maintenance on only portion of index that changes.

Use same partition functions and schemes for partitioning tables and indexes

Create your partitioned tables and supporting indexes using the same partition functions and schemes. By doing so will place tables and indexes on the same filegroups.

Match your relational partitions with your multidimensional partitions

Consider your multidimensional structure when you design your relational partitions you achieve the best results when your relational partitions match your multidimensional partitions.