JavaScript Patterns
JavaScript Patterns
What, how can there be patterns in a scripting language? That was what I asked myself when I came across the book titled “JavaScript Patterns.” So, I had to look into it. Here are a few bullets from the back cover:
- Create objects that go beyond the basic patterns of using object literals and constructor functions
- Learn the options available for code reuse and inheritance in JavaScript
- Study sample JavaScript approaches to common design patterns such as Singleton, factory, Decorator, and more
Ok, I have to admit I’m more impressed with JavaScript than previously. This book contains a lot of great examples of practical code to get you going using Patterns in JavaScript. Take your JavaScript code to places you never dreamed of.
Private Cloud Question
I received the following question from Paulo. Paulo Writes:
I need to know how to create a DW in a private cloud using Relational databases like SQL server or Oracle.
I want to have the data distributed on multiple database instances of SQL Server sharded on dimensions allowing parallel queries to run, and have a cloud process aggregate or join the final results from each database.
Editor:
Paulo, you have a few options available to you.
- Oracle and Microsoft already have appliance systems made to do this kind of analysis. Microsoft released theirs with SQL Server 2008 R2 as a parallel data warehouse. Oracle has their offering based on their Sun servers, oracle db, and data warehousing engines.
The key thing these appliances offer is automated sharding, distribution of your queries, and consolidation of the results all working seamlessly to already existing BI client utilities. The client tools are not aware of the difference between this system or a single server platform.
These are both expensive solutions made for massive scale work. But the infrastructure works right out of the box.
- There are some third party sharding products that work with different databases. The ones I see the most are for MySQL. These tools expose SQL Queries in a natural form and route the work to the appropriate instance. Steve just did a SSWUGtv interview for this kind of software as a service.
- Another option would be for you to roll your own sharding technology, routing queries to multiple databases, and consolidating the results yourself. This could be done as a web service providing a unified interface to end users.
- SQL Azure could also be used with Federations. SQL Azure automatically shards databases based on your sharding function and routes requests appropriately. However, SQL Azure does not perform any post processing. You must write your own aggregations if results are returned from multiple instances.
- I’m sure there are many more technologies out there of which I am not aware. This is a common problem many have worked hard to solve.
If you are interested in further research I suggest a google search with terms like Parallel Data Warehouse. Before you go to google, be sure you have watched the last two SSWUGtv episodes, Cloud implementations, technologies, including software (specifically databases) and should you be using PowerPivot? Dan Clark talks about his thoughts. Is it the tool for your self-service BI?
If you'd like to share your thoughts drop an email to me at btaylor@sswug.org.
Cheers,
Ben
Featured Article(s)
Troubleshooting problems with SQL Server 2008 R2 Integration Services
In this article, Alexander Chigrik explains some problems that you can have when you work with SQL Server 2008 R2 Integration Services.
Featured White Paper(s)
All-At-Once Operations
Written by Itzik Ben-Gan with SolidQ
SQL supports a concept called all-at-onc... (read more)
|
5/16/2012
|
CLR Capability in SQL
SSWUGtv
With Stephen Wynkoop
In todays issue of SSWUGtv Steve interviews Dan Clark as he shares his thoughts bout self-service BI. Is PowerPivt the tool you should be using? Find the answers here on this edition of SSWUGtv. Watch the Show
CLR Capability in SQL
We have been discussing the value of the CLR in SQL Server for the last couple of editorials. Today we have a response to my thought that the CLR can be an asset for distributed processing taking advantage of the sharding technique built into SQL Azure federations.
Veska Writes:
I both agree and disagree with your comments on CLR.
- I do not think it is the job of code to navigate the database for the information it needs. SQL can do this beautifully. Most commercial programming consists of screens + database inputs and outputs if you keep the two separate, i.e. the screens do not need to know the structure of the database and the database just receives the information collected by the screens and returns any required information doing all the necessary navigation of tables. This gives you full freedom to design, tune and keep the database as you need it for either commercial of performance reasons.
- I have not found anything that you cannot code in SQL , except for someone deciding to store information in a blob, that only a piece of code can decode.
Thanks Veska. I think you pretty well clarify why I am ambivalent to the use of CLR capabilities in SQL Server. There are many situations where if TSQL is adequate, then use it. Even when TSQL is not adequate, I am not always convinced that the database engine is the place to put procedural code. One key factor for me is determining what places the least load on the database engine. If it is less work to do something in the database than to return data to be processed elsewhere, then maybe you should consider doing the work in the database engine.
Lately, as I have been reviewing Big Data techniques, I am disappointed that the engines surrounding BIG Data always assume it must be un-structured, and push stuff into storage that may not be as optimized as a relational database. Try and find a map reduce capability built into a sharded relational database structure. I’m thinking that is what SQL Azure federations could be if it included CLR support.
Who knows? Perhaps that is just too academic. Well, tomorrow we move on to new territory with patterns in JavaScript. Drop me a note with your comments at btaylor@sswug.org.
Cheers,
Ben
Featured White Paper(s)
Encryption & Key Management for Microsoft SQL Server 2008
Written by Townsend Security
Simplify encryption and key management on ... (read more)
Featured Script
Create_Add_Proc
This procedure will look to the systems tables and create a default add procedure for a table... (read more)
|
5/15/2012
|
Should Programs be separated from Data?
SSWUGtv
with Stephen Wynkoop
Can your MySQL database scale? If not, what are your options? Do you need to purchase a bigger machine? Do you need to purchase a sharding wrapper, or write your own data broker allowing your databases to shard? Instead, you now have the option to take advantage of the Cloud Database As a Service from Xeround. Watch this interview with Xeround.
Watch the Show
Should Programs be separated from Data?
I chose this topic today as a follow on to the question discussed last week about the value of the CLR in SQL Server; specifically, why CLR support is excluded from SQL Azure. A lot of factors contribute to the answer to this question. David writes in with a well thought out answer to this question.
The most efficient technique to process data on any scale is to keep the volume of data to be processed from growing. Google has proven it is more efficient to pass algorithms around to smaller data sets, and bring the results back together than it is to gather data into parallel processes pulling data from remote data stores.
SQL Azure, as currently implemented, has the ability to scale data through federations as a sort of sharding. Wow! What if it also supported the CLR in these federations? Perhaps you would see more Map/Reduce types of processes, no longer restricted to un-structured data.
David Writes:
Since you're asking for opinion, here goes. :)
There are a two fundamental units of measure dealing with a unit of data: it's size and it's speed of retrieval -- bytes and latency.
But what *it* is also has two definitions: the final product of the piece of data, and all of the inputs necessary to produce that final product.
Compression algorithms take the final piece of data, and extract only those necessary pieces of data required to reconstruct the original piece of data according to a set of rules, working under the assumption that the latency of downloading those smaller pieces and the latency in the CPU in reconstituting them is less than the latency for downloading the original form.
Databases work under a similar principle that it makes more sense to have intelligence where the data is stored so that only the pieces of the database necessary to construct a particular output are transferred. In this sense, the application reconstructs all possible states of the application output when required, rather than requiring the pre-calculation of all possible inputs and outputs (probably an infinite set, anyways) and the storage of all of these outputs.
So, traditionally, the required calculations have been placed in the application layer, while only the retrieval logic (WHERE clauses) resides in the database -- and therefore the development of SQL as (originally, at least) a non-Turing-complete declarative language, so you don't have to deal with the halting problem (and therefore accidentally providing the SQL server a command that can *never* finish).
But this is simply a set of trade-offs, and in modern computer architecture (especially when dealing with clusters/clouds of computers) there are multiple "stops" for the data, each with their own storage maximums and latencies -- generally the "deeper" you go, the lower the latency and smaller the storage space (L1 cache, L2 cache, RAM, Disk, Remote Networked Storage). This also means the further away your data is, the slower your processing goes (assuming a naive algorithm that just asks for the next piece of data when it absolutely needs it and doesn't cache [which is bringing the data closer]). That's why SQL has a WHERE clause -- give the database a simple definition of what's not desirable in the result set so you don't have to spend time filtering it out on your side, paying the higher latency penalty.
The reason why SQL is a declarative language rather than an imperative or functional language, beyond the halting problem, is because the majority of your data retrieval problems can be solved with the declarative filters and they can be optimized in the general sense for exactly the kind of storage system you're working with (I didn't bring up things like hard disks which have poor "random" access [getting the first byte] but high "sequential" access [reading the bytes that follow that first byte] that complicate algorithms and storage patterns that most programmers don't want to think about).
But there is that small number of queries that don't really fit the mold (and may not be able to, if a compact index cannot be defined), and as the application gets further and further abstracted from its database across networks, and that database gets subdivided into slower and slower channels (a cluster that has to coordinate with each other on retrieval adds more hierarchy and new chokepoints -- done only because the underlying computing architecture is hitting a brick wall in raw-performance), the more these queries start to matter, and the more queries fall under this umbrella as chokepoints in the database architecture make certain kinds of joins or filters or sorts expensive as they have to cross over slower communication channels.
The basic idea behind Map-Reduce style database queries is this: you need to perform a calculation using a very large amount of data, but the specific calculations involving any one piece of data are small, and the results of these individual calculations can be rolled into the final result in a simple manner, so you can gain not only by reducing the amount of data transferred, but also reducing the calculation latency by using all of those database server CPUs in parallel.
This seems very restricted, but when you think about it, all an Index is is a pre-computed Map of which columns of data match a particular value, and your WHERE clause is like a Reduce, reading that Index and selecting only those that meet the range or equality, etc specified. Some of these filters can be easily applied in parallel, and belong at the base-server level.
Some of them cannot be easily indexed across a cloud of database servers, and they should be applied *after* the other queries. These can still be done by the database server easily within the declarative SQL syntax.
But other filters still cannot be indexed at all, and need Turing-complete calculation. These have traditionally resided within the application, but where they belong is governed simply by the following equation:
tDelta = tDbFullDatasetTransfer + tAppFiltering - tDbFiltering - tDbPartialDatasetTransfer
The time to transfer the full dataset and the time for the application to filter it minus the time for the database to filter it and transfer the reduced set of data. If that delta is positive, then you gain by executing Turing-complete code on the database-side, negative then you lose. It doesn't matter if you're talking about compressing/decompressing data, encrypting/decrypting, filtering out unwanted data, or sorting the data according to a custom equation, they're all data manipulations that apply.
Being able to run .NET code lets you make that choice, not being able to prevents you from making that choice, so I'm of the opinion that not allowing such code on Azure is *a particularly bad idea for the very concept of SQL Azure*. If they're worried about a user abusing shared resources, the Windows kernel should be able to ratelimit users based on a rolling average of CPU and I/O load (to allow bursts but not allow constant hogging of resources) since each instance *should* be run in a separate (possibly temporarily instantiated) user account so security vulnerabilities are less likely to let data leak.
To *not* allow this means that you have to transfer more data than necessary to/from your cloud, costing you more money and making the cloud-based offering less attractive.
Now, if you'll excuse me, I'm going to ponder over a possible query syntax that can be turing complete and can also determine where in your "stack" the actual filtering code should be run so you don't have to worry about it from either the application or database side..
Send an Email to btaylor@sswug.org if you have other thoughts on this issue.
Cheers,
Ben
Featured White Paper(s)
Encryption & Key Management for Microsoft SQL Server 2008
Written by Townsend Security
Simplify encryption and key management on ... (read more)
|
5/14/2012
|
Final CLR Response
Final CLR Response
Today one of our readers digs into a topic I connect with every once in a while regarding the role of your database server. The key question is, “what kinds of roles do you want your database server performing?” As I work with larger and larger databases, my tendency is to put the least amount of extraneous work on the database server as possible.
That being said, I have had to do things I would not prefer simply because some companies have no other place to do it. For example, on one system I used SQL CLR function to compress and decompress blob data simply because there was no other place available to do the work, and the database was the only centralized place where compression could be implemented. Ok, it was a bad solution for a bad architecture…but it was the correct decision for that specific situation.
Let’s pretend it is a perfect world, and you are designing your systems to scale out, not scale up. That being said, what limits would you prefer to place on your database server, and even more specifically, your database engine?
Dirk Writes:
Personally, I feel that Microsoft should strongly consider adding this feature to SQL Azure.
Even if their system is not an open source system, this backdoor has offered us, IT-professionals, a chance to enhance the DB-servers capabilities considerably. In my company, most of our add-ons (like functionality for Regular expressions, string-manipulation, XML/XSL transformations, HTML-decoding, …) in one central DB to be used by all the other applications as they see fit and it has saved us quite some workarounds and ensured that my team of data architects can easily intervene when clean ups of data are needed, directly from SSMS or SSIS packages.
Now, concerning those who have warned us not to use it, the eternal question remains:
Should you use your DB-management system as a dumpster with some indexes and keep out all intelligence in the application-code?
Or should you make use of the features it offers and for which you (or your customers) have paid anyway?
For a software vendor, it makes sense to keep the data and the logic separate to ensure portability, but I feel that a lot of companies are not prone to easily throw away their infra-structure and to "just" change to another system.
Sure, this is a somewhat controversial topic…at least people get heated up about it, or have very strong opinions. So, why not share your opinion, strong or not, by sending you thoughts to btaylor@sswug.org.
Cheers,
Ben
Featured Article(s)
Installing SQL Server Express 2012 and managing the LocalDB
This article discusses downloading and installing the 64 bit version of the SQL Server Express 2012 on a (x64 bit), Windows 7 Home Premium on a Toshiba Satellite P775 and administering LocalDB using Sqllocaldb utility.
Featured White Paper(s)
SharePoint 2010 Enables the Enterprise
Written by KnowledgeLake
read more)
|
5/11/2012
|
What to do with SQL CLR
There are a lot of things much easier, or more efficient, when done in SQL Server using the CLR.
Here are some of the ones I have done.
Most of these were done before SQL Server 2008. Some are redundant with features available in SQL Server 2008, but only ship with the Enterprise or higher version. Some could have been done in TSQL; others were done in the CLR for performance reasons, primarily string based work.
- Calculate Erlang. An Erlang is a measurement for the number of phone lines a company should have statistically to reduce the probability of an incoming call receiving a busy signal
- Median aggregate
- Convert character data to a comma separated list
- Convert comma separated list in a string into a table
- Compress/Decompress data
- Encrypt/Decrypt data
- Format Dates Funtion
- User Defined Types
- Date (no time segment)
- Small Date (No time segment)
As you can see I don’t use the CLR a lot, since this is my comprehensive list since CLR code was released to SQL Server in 2005.
How about you? What CLR routines have you created that are not specific to your data alone, working generically?
Why not drop me a note with useful ideas for the CLR in SQL Server.
Cheers,
Ben
SSWUGtv
With Stephen Wynkoop
Did you catch this episode yesterday as Steve takes on the difficult task of cleaning data. No? Then now you can watch as Steve interviews key players in the data cleansing world.
Watch the Show
Featured White Paper(s)
All-At-Once Operations
Written by Itzik Ben-Gan with SolidQ
SQL supports a concept called all-at-onc... (read more)
Featured Script
dba3_DeleteTopByIncrements_demo
Requires TOP (variable) support of 2k5 & later - demo script... (read more)
|
5/10/2012
|
|