
Joining Data from Disparate Sources

Moving away from relational databases as a data storage technique brings its own set of problems to solve. For example, what do you do when you have two collections of data with a common attribute (like a foreign key in a relational database) and you need to join the two sets on it, producing a new set of data with attributes from each? How is this accomplished?

In .NET this is a simple problem, easily solved with a LINQ query, which has relational-like capabilities for joining the two sets. In some other set-based languages this is also not a difficult problem. But when you are working with tools like Java or JavaScript, what do you do?

You can roll your own join code for the two sets using loops. This takes a while to code, and it is pretty easy to get it wrong without knowing it. It is also difficult to get optimal performance.
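To make the hand-rolled option concrete, here is a minimal sketch of a hash join, written in Python for brevity; the same build-then-probe structure carries over to Java or JavaScript. The function name `hash_join` and the sample `orders` and `customers` data are hypothetical, invented for illustration. It does an inner join only, and it assumes every row carries the join key.

```python
from collections import defaultdict


def hash_join(left, right, left_key, right_key):
    """Inner-join two lists of dicts where left[left_key] == right[right_key].

    Hypothetical helper for illustration; assumes every row has its join key.
    """
    # Build phase: index the right-hand rows by their join key.
    index = defaultdict(list)
    for row in right:
        index[row[right_key]].append(row)

    # Probe phase: look up each left-hand row's key in the index.
    joined = []
    for row in left:
        for match in index.get(row[left_key], []):
            # On attribute-name collisions the left-hand value wins.
            joined.append({**match, **row})
    return joined


# Hypothetical sample data: order 13 has no matching customer,
# and customer 3 has no orders, so neither appears in the result.
customers = [
    {"customer_id": 1, "name": "Ada"},
    {"customer_id": 2, "name": "Grace"},
    {"customer_id": 3, "name": "Edsger"},
]
orders = [
    {"order_id": 10, "customer_id": 1, "total": 25.0},
    {"order_id": 11, "customer_id": 1, "total": 40.0},
    {"order_id": 12, "customer_id": 2, "total": 15.0},
    {"order_id": 13, "customer_id": 9, "total": 5.0},
]

result = hash_join(orders, customers, "customer_id", "customer_id")
for row in result:
    print(row["order_id"], row["name"], row["total"])
```

A nested loop over both lists gives the same answer but compares every order with every customer. Indexing one side first does roughly one pass over each list, which is where hand-written joins most often lose performance. Outer joins, missing keys, and duplicate attribute names are the usual places to get it wrong without noticing.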

Now that we have so many data engines available to us, it seems like there would be a growing need to do set-like activities without set-like tools, especially when data comes from two completely different kinds of data storage. For example, you may find yourself joining a MongoDB collection with a result set from a SQL engine. Perhaps you need to join data from an Oracle query and a SQL Server query. More and more, this kind of join can no longer be handled with techniques such as a linked server in SQL Server, which exposes a foreign collection as a T-SQL-accessible resource.
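As a rough illustration of the cross-engine case, the sketch below joins document-style records with rows from a SQL query in application code. Everything in it is a stand-in: the `customer_docs` list is hypothetical data shaped like documents a document store might return, and an in-memory SQLite database (from Python's standard library) plays the part of the SQL engine. No real MongoDB, Oracle, or SQL Server connection is involved.

```python
import sqlite3

# Stand-in for documents fetched from a document store such as MongoDB.
# Hypothetical data; a real driver would hand back similar dict-like records.
customer_docs = [
    {"_id": 1, "name": "Ada", "tags": ["vip"]},
    {"_id": 2, "name": "Grace", "tags": []},
]

# Stand-in for a SQL engine: an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # lets each row be read by column name
conn.execute(
    "CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, total REAL)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(10, 1, 25.0), (11, 1, 40.0), (12, 2, 15.0), (13, 9, 5.0)],
)
order_rows = [
    dict(row)
    for row in conn.execute(
        "SELECT order_id, customer_id, total FROM orders ORDER BY order_id"
    )
]
conn.close()

# Join in application code: index the documents by key, then probe with each
# SQL row. Orders whose customer_id has no document are dropped (inner join).
docs_by_id = {doc["_id"]: doc for doc in customer_docs}
report = [
    {
        "order_id": order["order_id"],
        "total": order["total"],
        "customer": docs_by_id[order["customer_id"]]["name"],
    }
    for order in order_rows
    if order["customer_id"] in docs_by_id
]

for line in report:
    print(line)
```

Both sides are pulled fully into memory here, which is fine for small sets but is exactly the kind of decision a database engine would otherwise make for you.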

Is this a real-world problem you are facing today? Do you anticipate an increasing need to solve this problem? What tools do you use to join disparate collections? Share your thoughts here or drop an email to