ARTICLES

Home  > Articles  >  Next Generation Data Warehousing in SQL Server 2008 (Part 1 of 2)
Next Generation Data Warehousing in SQL Server 2008 (Part 1 of 2)
By: Eric Hanson

A Little About Me And SQL Server 2008

Hello everybody. My name is Eric Hanson and I'm the lead program manager for Query Processing in the SQL Server Engine Team. And today I'm going to talk to you about next generation data warehousing in the SQL Server 2008 release. Before I talk about the new things we've added in SQL Server 2008 let me recap what was in SQL Server 2005 for data warehousing. Again in SQL Server 2005 we have a comprehensive data warehouse and VI platform. We have a data warehouse DMS a SQL Server product. An ETL engine in SQL Server integration services. OHOP with SQL Server analysis services and we also have SQL Server reporting services for reporting.
 
Covering All Components

This is a complete suite of VI products that essentially can meet all the needs of a data warehouse and VI environment. Specifically, within the relational data warehouse component in SQL Server - SQL Server 2005 introduced partitioning, online operations, very good shared memory multiprocessor performance, parallel performance, which we've actually had since SQL Server 7.0 and it's been improved in each release after that. And also a scalability share database feature which is a read only query scale out feature. Probably the most important feature for data warehousing that was introduced was the partitioning feature which allows you to break down very large tables into manageable sized chunks to do things like back up and restore and index rebuilt and so forth.
 

And bulk loading, bulk purging of data, et cetera. In SQL Server 2008, the focus, the main focus of this release was data warehouse scale. Wanted to make sure that we could scale to meet the needs of a largest enterprises for their data warehouse work loads.
 

We've covered all the components of the system. Not just the relational database management system but also our ETL system, our OHOP system, and our reporting system. In order to really verify that everything worked well together at high scale, we did some large-scale integration testing up to a 25 billion row fact table and associated dimension tables and dimensionally modeled data warehouse on a 32 core HPC superdome.
 

The goals we had were to make SQL Server scale comfortably to handle the large enterprise data requirements. Now we wanted to meet the scalability needs of all but the very largest data warehouse customers and eliminate weak links and scalability across all the different tiers of your data warehouse environment. I'm going to focus mainly today on enhancements we've made to the SQL Server engine for data warehouse scalability in SQL Server 2008. Starting with faster query then moving on to faster and easier ETL or ETL is extract transform load and also touching on faster and lower cost system management. With respect to query we have improved the system in the following ways.
 
Just A Couple Things That Are New To SQL Server 2008

We've introduced data compression which speeds up core processing performance significantly - I'll talk more about that later. A resource governor feature. Partition table parallelism so we have better parallelism when you're operating on partition tables in your queries. A star join query processing enhancement so if your queries contain star join constructs, those queries were typically execute significantly faster. Grouping sets which is a new program ability feature in SQL Server 2008 that lets you express certain types of aggregation queries more naturally and compactly and in way in which they run faster. And also a partition lined indexed views feature which allows you to more easily manage summary aggregates that are associated with partition tables.
 

And with respect to ETL we've done other work for ETL in integration services. I'm a mainly talk about three features are related to ETL that are part of the SQL Server engine. One is a merge statement - a new T-SQL statement that lets you update and delete a row in a target table in a single statement. Very popular for loading data warehouses and modifying dimension tables in a data warehouse. Change data capture which is a feature that lets you capture changes to a ________ database and help migrate that into a data warehouse. And also a minimally logged insert feature and finally we'll talk about backup compression and the capabilities we have there.
 

You Did What With Data Compression?

Let me start first with data compression. Key goal we had to introducing a data compression feature in SQL Server 2008 was to shrink the size of the fact tables in your data warehouse. The fact tables are typically by far the largest tables - very large collections of event data. Very structured regular data. Typically there is a lot of redundancy, so they're very good candidates to compress so a secondary goal of compression was to speed up query processing performance as a by product of shrinking the fact tables and we were able to do this by reducing the amount of IO we have to do to process a query because there are fewer pages on the disk to read. And also we increased the buffer hit rates, again that's going to reduce IO.
 

We've observed for actual customer data for a factor of two to seven compression, so with more typical is about a factor of three maybe slightly under a factor of three like two and three quarters compression. This works with other features. It works in a way in which is orthogonal to the other feature. You can use it with just about any other feature of SQL Server. Works for both data and indexes and it's based on several different techniques. The most important one is page level dictionarian encoding. And then prefix coding and variable length in coding of fixed length fields like integers and dates. And also just avoiding storage of common values like zeros and nulls.
 

First let’s talk about variable with encoding. It's very straightforward concept. If you have a fourbite integer that contains the number one it looks like that. It's got four different pieces - four different bites. The first three are zero the last one is one. Now when compression is enabled we'll store that in a format where we just store the non-zero trailing - the trailing non-zero bytes in a length code that says how many total bites there are. Another part of compression is page level dictionary. The idea here is that there is a small symbol table or lookup table at the beginning of each page that contains the commonly used values on that page.
 

And then down in the rows themselves instead of actually repeating the values and storing the full contents of the values, we can store a reference - basically an array offset into that lookup table at the beginning of the page. Another part of compression is the form of prefix coding. The basic idea is for each column, we'll pick out a long common prefix, the longest common prefix we can find among all values for that page within that column. We'll store that on a special header record on the page and then, for example, if you have a whole bunch of URLs on a column on a page and they're the same except for some trailing characters, you store the common prefix ones and only store the trailing character as you go down the page.
 

In combination these three techniques have allowed us to achieve two to seven times compression for real customer data and we checked this with a large number of different customer databases and the typical compression ratio is about three. Moving on from data compression, we're going to talk about backup compression. Data compression actually shrinks the data that's in the database. Each page is shrunk, so if you had say 50 records per page before you might now have 150 records per page now. And so the number of records that could be cashed in the same amount of memory in the main memory buffer pool is going to increase, should reduce your IO to read the data the first time and you'll make better use of your available memory and have to do fewer fetches as time goes on.

Speaking Of Compression...What About Backup Compression?

Backup compression is something that you just do during backup and it shrinks the size of your backups and it speeds up your backups. It’s independent, backup compression is independent of data compression. Here's some statistics that show how backup compression performs. The size of backups is typically about one-fifth of the original size when you compress the backups. The time it takes to do the backups also is cut to be about a little more than a third of the original time. And restore time also improves as you can see by this chart here. The reason the backup time goes down is that the bulk of the time to do the back up is actually the time to write the backup out to the backup IO devices. And because the backup is so much smaller it takes that much less time to write it. Restore time also faster because again the bulk of the time is the time to read the data, so reading the backup from the physical media that contains the backup is going to take less time because it just occupies that much less space.

A Little About Partition Table Parallelism

Moving on now to some query processing performance improvements that are important in SQL Server 2008. First I want to tell you about partition table parallelism. In the SQL Server 2005 release, we noticed that we had some problems with parallel core reprocessing performance on partition tables. In some situations when a query is touching more than one partition in SQL Server 2005, you only get one thread per partition.
 

That limits parallelism on a large parallel machine. If you have an eight-core machine and the query only touches two partitions, you're only going to get two threads or two of your cores working on that query at the same time so you're only going to be utilizing a 25 percent of your hardware effectively. We wanted to make sure they could utilize all your hardware effectively. In SQL Server 2008 we've done is to allow multiple threads to work on each partition instead of just one. for example now if a query touched two partitions like P3 and P4 there on the example, four threads on an acorn machine typically - if four threads - excuse me, if eight total threads were allocated to do this scan operation, four of them would be given to partition three and four to partition four to balance the work.
 

This can result in very large improvements in query performance. Here's some results from the partition table parallelism improvement on a 64 way machine. Here's a very simple query that says account star from the fact table where the date is between two specific values, two constant values. This query was designed to just touch two partitions. This query speeded up by a factor of 16 on a 64 way machine based on this improvement that we did. And actually I'm going to - I have a video demo I want to show you. First I'm going to show you how this looked on this 64 way machine on SQL Server 2005. Okay, the query is just about to start down here.
 

Right, if you look at the 64 little boxes up at the top, you'll see that only one of them has a little graph showing that only one of the - one of the cores is busy. Okay, so it finished. It took 16 seconds to run that query. Now what I'm going to do is I'm going to run the same query on the same machine on SQL Server 2008 in an internal build that we were working with a few months ago and show you how that looks, how fast that runs, and how work gets allocated to different CPUs - different processor cores. Okay, the core is about to start. Okay, it's finished. It's done already and if you look at the little boxes up above it may be a little difficult to see, but there's a little tick mark in every single one of the boxes up above. There are 64 boxes, one for each core that indicates that every single one of the cores was doing useful work to process that query.


Bettering Star Join Queries And What This Means To You

All right, that concludes the discussion of the partition table parallelism improvement. Now let me talk about another improvement that we've done to make star join queries execute significantly faster. We've actually had some star join optimizations in SQL Server for a couple of releases now. We've made it significantly better in this release. If you look at the diagram there on the slide, you'll see how we can recognize the star pattern in the join graph for a query. We've improved the way that we can recognize that pattern more - made it possible for us to recognize that pattern in more situations and also enhanced the different kinds of plans we can generate for that star query pattern.
 

We also added what we called a multiple bitmap filter core execution strategy for star joins. I'm not going to talk in too much detail about exactly how that star join core execution works, but I will give you a summary of some of the results that we've seen. We've seen for several internal star join heavy query workloads a performance improvement of about 15 to 30 percent across the entire query workload as a result of this one improvement to the product. It’s a very significant improvement. This is on the same data, same hardware. The only change. And we've also seen that some queries also produce spectacular performance improvements because not only do they get better raw query processing horsepower, but they also get a better query plan. The query optimizer was also changed to some extent as part of this improvement.
 
Start Join Queries In The Real World

We have an interesting story where we had a partner called Galleria come in to our customer lab and they had a star join query work load that with that one particular query which took most of the time and we had some expert people in our customer advisory team go in and for SQL Server 2005 Service Pack 2, they tuned that query by hand by rewriting the query and adding some query hints and they got it to - it use to take something like 30 minutes. They got it to run in about just something like 35 seconds and with this new improvement star join in SQL Server 2008. We just attached the same database to SQL Server 2008.
 

The query ran out of the box without any hints or manual tuning in about 25 seconds. It actually ran faster than the tuned version on SQL Server 2005. Star join can help you get better query plans in some situations and even when you get more less the same plan you got before it's going to be significantly faster in many cases just because of more efficient core execution. The next improvement I want to talk about is what we call the few outer rows improvement. There was situations in SQL Server 2005 and earlier where we wouldn't use as many threads as we wanted in a nested loop joining situation. Some of you may already know that in SQL Server we have nested loop joining - for each thread on the outer side of a nested loop join, the inner side of the nested loop join runs in serial for that thread.
 

The inner side of a nested loop join is not parallelized in SQL Server. We are seeing is that the nested loop join strategy that we used often gives us an entire page of rows from the outer side of the nested join to a single thread. And in a typical data warehouse environment you might for example be doing a range query where you're just trying to pick out seven days worth of data by filtering on your date dimension. it might be that all seven of those rows for all seven of those different days from date dimension would be given to a single thread and so that would single thread this query. In SQL Server 2008, we've introduced what we call a new exchange operator on the outer side of nested loop join and that will make sure that each row on the outer side is given to a thread.

 
In this case you might see a seven fold speed up of this query compared with SQL Server 2005 because it's touching seven days worth of data so it'll be a nested join typically with seven rows on the outer side. And this improvement on, we've got an internal data warehouse scale benchmark that we run periodically on machines on the 16 core and 32 core machines. We've seen this one improvement speed up performance by about 30 percent for the entire workload. This is a very significant improvement. to be frank, we don't want to pound our chest and brag about this improvement too much because well it sounds almost like a bug fix; however, we have done a lot of internal scale testing and I want to point out the benefit of that testing.
 

The benefit of that testing is we learned how important this issue was. That really identified this issue for us and helped us go invest the time to fix it. That you can benefit from it. Again I'd like to emphasize the benefit of that internal scale testing that we've done and to helping make the product more efficient.


----------- 
This article was taken from the session presentation at the SSWUG.ORG virtual conference. Please be sure to see the other parts of this article, if any, linked at the top of this article. For additional information about the SSWUG.ORG virtual conference and expos, please see http://www.sswug.org/uvc