Data science includes all sorts of new opportunities

In reading through a post on eWeek about why companies aren’t yet adopting machine learning with their data systems in large numbers, it’s clear that data responsibilities continue to morph as companies figure out what to do with all of the information (actually, data) now that it’s being captured and retained.

One thing that has come up in many different articles, posts and discussions is what a project milestone it is to finally define what the data needs to start looking like in order for it to become information. In other words, this is the work of the data chef: those who take the raw data and do the prep work on it to make it consumable for a particular use.  This is a big deal, and it’s extraordinarily complex to make happen both at scale and with the velocity that companies need.

Beyond that, the new data management regulations mean that working with your information stores and managing them also has compliance implications for companies of nearly any size.

In working on several different projects to get analysis and reporting set up for customers, we’ve seen a whole lot of wheel-spinning when it comes to making choices about data use.  Many times it seems that the compliance requirements get tangled up with the functional requirements and make them harder to deliver on.  These seeming barriers to implementation are on a lot of minds.

We’ve been seeing some good results, though, from splitting the data-out requirements from the compliance requirements during the discovery and initial design phases.  Paying attention first to what you need from the systems, then circling back to make sure you can deliver that, rather than designing around compliance from the start, seems to help alleviate the paralysis.

This doesn’t mean you don’t address issues that you know will arise.  Rather, it means defining what you need, then figuring out how to get there within the guideposts of the project.  We’ve seen cases where adding reporting and query “layers” of information, in which sensitive data is summarized in ways that are still productive rather than exposed in detail, has helped.  In others, we’ve had to re-think the information gathered and pull additional or different information to support our direction.  This might mean getting slightly less specific information (reporting on Zip Codes can be as effective as street addresses if you’re looking for trends across an area or a set of cities), or changing how you store it (perhaps information needs to be masked).
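To make the idea of a summarized reporting “layer” a bit more concrete, here is a minimal sketch in Python. It is only an illustration under assumptions: the record layout, the field names (`street`, `zip`, `email`) and the helper names `summarize_by_zip` and `mask_email` are all made up for the example and don’t come from any particular project or product.

```python
from collections import Counter

# Hypothetical customer records; the fields are illustrative only.
records = [
    {"street": "12 Oak St", "zip": "85701", "email": "ajones@example.com"},
    {"street": "98 Elm Ave", "zip": "85701", "email": "bsmith@example.com"},
    {"street": "5 Pine Rd", "zip": "85704", "email": "clee@example.com"},
]


def summarize_by_zip(rows):
    """Count records per Zip Code, leaving street-level detail behind."""
    return Counter(row["zip"] for row in rows)


def mask_email(email):
    """Keep the first character and the domain; mask the rest of the local part."""
    local, _, domain = email.partition("@")
    return local[:1] + "*" * (len(local) - 1) + "@" + domain


# The reporting layer sees only the summary and masked values,
# never the street addresses or full email addresses.
zip_summary = summarize_by_zip(records)
masked_emails = [mask_email(row["email"]) for row in records]
```

The point of the sketch is the separation: the detailed rows stay in the source system, while the reporting side works from `zip_summary` and `masked_emails`. In a real database you would more likely do this with views or the platform’s own masking features, which this example doesn’t attempt to reproduce.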

I think the roles that are coming into better focus have a lot to do with choosing the database management tools and platforms that best support your data requirements.  That may well be SQL Server on Linux, on Windows, or a mix of SQL Server and a NoSQL alternative to process things in a particular way.  These roles (defining the best approaches, data prep, and data delivery at this interim level) are key, and they may prove to be just the ticket for people who really love working with low-level database capabilities.