
Parallel Wrapup

SelecTViews
With Stephen Wynkoop
Today you’re in for a treat as Stephen interviews Kalen Delaney, author of SQL Server 2008 Internals. Of course there is a lot more… Watch the Show

Parallel Wrapup
Our readers have submitted a number of good insights and suggestions for parallel processing. Today we’ll wrap up this particular topic with those comments.

Michael:
Since August 25, 2011, a generic solution approach for parallel processing of data modifications on the SQL Server platform has also been available as a free download at http://sqlparallelboost.codeplex.com.

In addition to the purely T-SQL-based sample solution with all source code, the CodePlex site provides documents covering the conceptual details.


"Compared to the single-thread approach of SQL Server itself, SQL Parallel Boost facilitates the parallel execution of any data modification operations (UPDATE, INSERT, DELETE) – making best use of all available CPU resources."

David:
What exactly is difficult about parallel programming, besides the historically convoluted APIs for C/C++?

Parallel computing is multiple processors executing multiple instruction streams over memory that may or may not be shared between them. Reasoning about one particular stream of instructions on one processor is not hard. It’s also not difficult to think about two computers in a room, one running raytracing simulations and the other serving up a website, because they’re doing two completely different things and can’t share memory.

When parallel code has to share memory, that’s where things get difficult, because from the perspective of any one instruction stream, the others are operating on the data in no predictable order.
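Here is a minimal sketch of what that unpredictability does in practice, assuming a Node.js version that ships the worker_threads module (the file name and iteration counts are arbitrary): two workers increment one shared counter without synchronization, and updates get lost.

// race.js - minimal sketch of a lost-update race on shared memory.
const { Worker, isMainThread, workerData } = require('worker_threads');

if (isMainThread) {
  const shared = new SharedArrayBuffer(4);        // one 32-bit counter
  const counter = new Int32Array(shared);
  const workers = [
    new Worker(__filename, { workerData: shared }),
    new Worker(__filename, { workerData: shared }),
  ];
  let finished = 0;
  workers.forEach(function (w) {
    w.on('exit', function () {
      if (++finished === workers.length) {
        // Expected 2000000, but the unsynchronized read-modify-write
        // below interleaves unpredictably, so the total is usually lower.
        console.log('counter =', counter[0]);
      }
    });
  });
} else {
  const counter = new Int32Array(workerData);
  for (let i = 0; i < 1000000; i++) {
    counter[0] = counter[0] + 1;        // not atomic: updates can be lost
    // Atomics.add(counter, 0, 1);      // the synchronized alternative
  }
}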

The obvious answer to the problem of parallel programming is to share as little memory as possible. Memory density is still scaling according to Moore’s Law while processing speed is not, and memory, being so highly regular, turns out to be the easiest thing for new semiconductor process technologies to fabricate. (A quick Google search tells me the first 28nm memories were released roughly a year before the first 28nm CPUs and GPUs: 2010 versus 2011.) So why, as a programmer, make your life difficult by working on shared memory that doesn’t actually need to be shared?

A couple of useful examples, first from a high level and then from a low level.

Node.js has recently popularized the event loop for server-side processing (Python’s Twisted came earlier, but Node.js works only with the event loop, unlike Python, so it’s a purer example). An event loop is the epitome of single threading: events queue up and the code handles exactly one at a time. So how is it related to parallel programming? As in parallel programming, you have no idea what will execute next, just at the granularity of individual events, so you structure your program to work on an isolated piece of memory for each event.

If you spawn multiple copies of your Node.js web application, one for each core (except the “last” core), and spawn a load balancer that randomly chooses which instance to pass each request to, you get very cheap, nearly linear horizontal scaling: your application was easily parallelizable because you “know” that one request has nothing to do with another.
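A minimal sketch of that shape using Node’s built-in cluster module (the port and response text are placeholders; cluster also happens to distribute incoming connections for you, so the separate load balancer is optional in this toy version):

// app.js - one worker process per core, all sharing one listening port.
const cluster = require('cluster');
const http = require('http');
const os = require('os');

if (cluster.isMaster) {                      // isPrimary in newer Node versions
  const cores = os.cpus().length;
  for (let i = 0; i < cores - 1; i++) {      // leave the "last" core alone
    cluster.fork();
  }
  cluster.on('exit', function (worker) {
    console.log('worker ' + worker.process.pid + ' died, forking a new one');
    cluster.fork();
  });
} else {
  // Each worker is an ordinary single-threaded event loop; requests are
  // independent, so nothing here touches memory shared with other workers.
  http.createServer(function (req, res) {
    res.end('handled by pid ' + process.pid + '\n');
  }).listen(3000);
}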

For this to work for a non-trivial web application, though, all data that must persist between requests has to go into a database or some other store shared by every instance of the application. We already do that for most of our data, such as the user information kept as part of an account. But you must also persist things that are transient yet outlive a single request, such as a user’s session. (This is a justifiable use of a key-value NoSQL store such as Redis, because a session is a simple lookup table with no joining, and those stores are optimized for exactly that usage.)
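A sketch of what externalizing that transient state could look like with the node ‘redis’ client (the exact client API differs between versions, and the key naming and session shape here are made up):

// session-store.js - keep per-user session state out of process memory
// so any application instance can serve the next request.
const { createClient } = require('redis');    // node-redis v4 style API

const client = createClient();                // defaults to localhost:6379

async function saveSession(sessionId, session) {
  // Plain key/value set: no joins, just a lookup table, as described above.
  await client.set('session:' + sessionId, JSON.stringify(session));
}

async function loadSession(sessionId) {
  const raw = await client.get('session:' + sessionId);
  return raw ? JSON.parse(raw) : null;
}

async function demo() {
  await client.connect();
  await saveSession('abc123', { userId: 42, cart: ['book'] });
  console.log(await loadSession('abc123'));   // works from any instance
  await client.quit();
}

demo();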

So the “hard” shared-memory problem is pushed off to a database server (or servers, if you mix SQL and NoSQL for the different tasks each is optimized for), while the “easy” independent-memory parallel programming is what the developer writes. When it’s hard to find a laptop with less than 4GB of RAM nowadays, and with Moore’s Law diverging between CPUs and memory, I don’t see much point in the developers of a single-task server spending hours and hours of billable time reducing the memory consumption of the parallel components when an extra 1GB of 1333MHz RAM costs just $11. Unless you’re a contractor, I suppose…

Now, from a low-level perspective, I think there are essentially three types of for loops: 1) loops where each run of the loop has nothing to do with the previous run, 2) loops where each run is directly dependent on the previous run, and 3) loops that are somewhere in between.

Type 1:
for(var i = 0; i < arr.length; i++) {
  arr[i] = encrypt(arr[i]);
}

This is a single-threaded for loop that encrypts each element of the array, but suppose the encryption takes a while; it would be nice if we could parallelize it. I believe the recent craze over functional programming has some merit here. Rather than requiring the programmer to worry about the intricacies of Win32 or POSIX threading, let the language abstract it behind a standard library function or method, such as Array.map:

arr = arr.map(function(val) {
  return encrypt(val);
});


map takes a lambda function that it executes on every array element. The language implementation can make this map as simple (reducing it to a bare for loop, as above) or as sophisticated (Apple’s Grand Central Dispatch, which automatically load-balances parallel work across the processor cores and across all running parallel applications) as desired. It handles the thread coordination and returns the new array once all threads have finished, and there is no problem with them executing in parallel because the elements don’t depend on each other.
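In Node.js specifically, one way to get a similar effect without managing threads yourself is to lean on library calls the runtime already offloads to its internal thread pool and gather the results with Promise.all. Here is a minimal sketch; the pbkdf2 key derivation merely stands in for a slow encrypt(), and the salt and inputs are made up:

// parallel-map.js - map a slow transform over an array; the actual work
// runs on libuv's thread pool, so elements are processed concurrently.
const crypto = require('crypto');
const { promisify } = require('util');

const pbkdf2 = promisify(crypto.pbkdf2);

// Stand-in for encrypt() above: an intentionally expensive derivation.
function encrypt(val) {
  return pbkdf2(String(val), 'example-salt', 100000, 32, 'sha256');
}

async function main() {
  let arr = ['alpha', 'beta', 'gamma', 'delta'];
  // map() kicks off every element's work; Promise.all waits for all of it.
  arr = await Promise.all(arr.map(encrypt));
  console.log(arr.map(function (buf) { return buf.toString('hex'); }));
}

main();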

Type 3:
var area = 0;
var arr = [0, 0.2, 0.4, 0.6, 0.8, 1, 1.2, 1.4, 1.6, 1.8, 2];
for(var i = 1; i < arr.length; i++) {
  area += Math.sin(arr[i])*(arr[i]-arr[i-1]);
}

Here, each iteration of the loop depends on the previous one (through the running total), but nothing alters the original array in any way. If we introduce a reduce method from functional programming, we can write:

var arr = [0, 0.2, 0.4, 0.6, 0.8, 1, 1.2, 1.4, 1.6, 1.8, 2];
var area = arr.map(function(val, i) {
  if(i == 0) { return 0; }
  return Math.sin(val)*(val-arr[i-1]);
}).reduce(function(val1, val2) {
  return val1 + val2;
});

Both the map and reduce steps can run in parallel because they don’t alter the original array in any way. In this map-reduce pattern, though, it would be more efficient to have a single mapReduce method that takes both functions and starts the reduction phase as soon as at least two results are available (a reduced value counting as a result itself), whereas the code above waits for the entire new array to be constructed by map.
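A single-threaded sketch of what that combined interface could look like (just to show the shape: each mapped value is folded into the accumulator as soon as it is produced, and the intermediate array is never built):

// A hypothetical mapReduce helper: same result as map().reduce(), but it
// never materializes the mapped array, so a parallel implementation could
// begin reducing as soon as any two mapped results exist.
function mapReduce(arr, mapFn, reduceFn, initial) {
  var acc = initial;
  for (var i = 0; i < arr.length; i++) {
    acc = reduceFn(acc, mapFn(arr[i], i));
  }
  return acc;
}

var arr = [0, 0.2, 0.4, 0.6, 0.8, 1, 1.2, 1.4, 1.6, 1.8, 2];
var area = mapReduce(
  arr,
  function (val, i) { return i == 0 ? 0 : Math.sin(val) * (val - arr[i - 1]); },
  function (a, b) { return a + b; },
  0
);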


Of course, anyone who remembers basic calculus and recognized that this loop is just a Riemann sum approximating the integral of sin(x) from 0 to 2 would know to reduce the entire algorithm to just:

var area = Math.cos(0) - Math.cos(2);

Doing stupid calculations faster and in parallel is still doing stupid calculations. 😉

Similarly, several sorting algorithms can be parallelized, mergesort being one of the easiest to do conceptually.
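A plain recursive mergesort shows why; this is only a conceptual, single-threaded sketch, but the two recursive calls touch disjoint halves of the data, so each half could be handed to its own core or worker:

function mergeSort(arr) {
  if (arr.length <= 1) { return arr; }
  var mid = Math.floor(arr.length / 2);
  // These two calls share no data, so they are the natural fork point
  // for sorting each half on a separate core/worker.
  var left = mergeSort(arr.slice(0, mid));
  var right = mergeSort(arr.slice(mid));
  return merge(left, right);
}

function merge(left, right) {
  var out = [];
  var i = 0, j = 0;
  while (i < left.length && j < right.length) {
    out.push(left[i] <= right[j] ? left[i++] : right[j++]);
  }
  return out.concat(left.slice(i)).concat(right.slice(j));
}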

Again, for most developers, I think this is where the majority of the gains from parallelizing can be made; more advanced techniques need a justification (business needs, curiosity on personal projects, etc.) for the required effort. Even GPGPU work will largely fall along these lines, only with a massively larger amount of data to process, where the delay in moving data from CPU memory to GPU memory is far less than the total processing time would be on the CPU.

If you have thoughts you would like to share on this or any other topic, feel free to drop me an Email at btaylor@sswug.org.

Cheers,

Ben


Featured Article(s)
Understanding the Client Perspective on Defects and Fixes
Laura Lee Rose, business coach and corporate exit strategist, discusses the different responses that testers, developers, and sales associates give when a defect is reported. She explains how iterative development principles can keep everyone moving in the same direction despite their differences of opinion.