Search SSWUG.ORG:

EDITORIALS

Editorial Rss
12345678910...
TitleDate

Vectors vs SQL




Language features like those in R, which is designed to work with vectors, have often be performed using traditional SQL syntax. While SQL Syntax is not designed for statistical analysis the set nature of the language does lend it to solving some statistical problems. Perhaps this might even be a more efficient solution if large amounts of data must be extracted for vector operations to be performed.

Here are some examples

 
Syntax Comparison of R to TSQL
function R TSQL
Sum of a sequential list of numbers y <- Sum(1:5)
DECLARE @Result INT = 0

SELECT @Result += Number
from Talley
where Number BETWEEN 1 and 5
Get sum of sequence of numbers squared y <- Sum(C(1,2,3,4,5)^ 2)
DECLARE @Result INT = 0

SELECT @Result += Number * 2
FROM Talley
where Number BETWEEN 1 and 5)
 
* The SQL sytax takes advantage of a Tally table containing a sequence of numbers in a single column called Number
 
These two examples are fairly simple. When you start getting into functions for which aggregates are not already defined such as mean, linear regressions, etc. you’re probably better off extracting data and calculating your statistics elsewhere.

You do have the option of developing your own statistical functions using TSQL to calculate values such as mean or median, not already included in the standard SQL Server functions. You can also write your own CLR functions as well. As in anything dealing with SQL Server, one determining factor is often, “what places the least load on your database server.” Is it better to extract the data and calculate elsewhere, or less load to calculate close to the data?

Big BI programs have integrated tools such as R into their stack. Oracle and IBM have integrated R in their BI offerings. R is gaining in popularity and being integrated into most BI implementations from both commercial and open source platforms..

Are you using vector optimized programming languages like R? Or do you prefer to use tools such as Excel or Power Pivot for developing your statistical models? Share your experience here or by Email to btaylor@sswug.org. 

Cheers,

Ben
 
1/22/2015
12345678910...