More on Data Management and Protection

I got some good feedback from readers about schema changes and the requirements that come with data protection.  It’s true that there are many things we’ll have to figure out, from what types of tools and interfaces to offer the owners of the information to auditing and proving that you’re managing it all.

One interesting point raised yesterday was what happens if a customer opts out, then opts back in.  What are their expectations going to be in terms of history, possible previous orders, etc.?  I think there aren’t many choices here.  If you make prior history suddenly “come back alive,” then to me at least it’s clear it was never removed, and I would think you’d be asking for all sorts of trouble.  A lot will depend on your own application specifics, since there is a legitimate need to keep some information if you’re capturing orders and such.  In cases where you need to capture that information, I think a science will need to evolve around how to scrub data without making it useless.

It’s going to be very challenging, no doubt.

You may recall that, several years back, someone was able to use Google search history to determine where a person lived, along with many other attributes of their life.  Given that we need to protect identity, I think we’ll have to get creative.  A few ideas:

Timed-archive offline – this would automatically move information offline after a given period of time.  It might be order information or other details; the point is to get it off the main systems, perhaps summarizing it in the database so it can still be reported on, just not at the detail level.

Time-masking – this would entail masking bits of data after a period of time.  If you have a 90-day warranty, for example, you could mask off purchaser information after, say, 120 days.  Again, you may want to still offer reporting and other options, but by outright masking and removing personal information, you protect that information.
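To make the time-masking idea concrete, here is a minimal sketch in Python.  The Order fields, the mask_if_expired name, and the 120-day cutoff are all assumptions for illustration (the cutoff just mirrors the warranty example); a real system would apply the same rule to whatever columns count as personal.

```python
from dataclasses import dataclass, replace
from datetime import date, timedelta
from typing import Optional

# Assumed policy: a 90-day warranty plus a 30-day buffer.
MASK_AFTER = timedelta(days=120)


@dataclass(frozen=True)
class Order:
    # Hypothetical order record; field names are illustrative only.
    order_id: int
    purchaser_name: Optional[str]
    purchaser_email: Optional[str]
    zip_code: str
    total: float
    ordered_on: date


def mask_if_expired(order: Order, today: date) -> Order:
    """Return a copy with purchaser fields blanked once the order is older than MASK_AFTER."""
    if today - order.ordered_on <= MASK_AFTER:
        return order  # still inside the window, leave it alone
    # Reporting fields (zip_code, total, ordered_on) survive; identity does not.
    return replace(order, purchaser_name=None, purchaser_email=None)
```

Calling mask_if_expired on every order during a nightly job and writing the result back would be one way to apply it; the masked values have to overwrite the stored ones, otherwise nothing has really been protected.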

Automated information deletion – like the name suggests, outright delete customer details after a period of time.  This makes me nervous, but I think this combines well with the other ideas.  Summarize it to a reporting-oriented database or table or data warehouse, then get rid of it.  Sort of a “shredder” for customer information.  You could still do things like storing zip codes and such for reporting.
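Here is a rough sketch of the “summarize, then shred” step, using Python’s built-in sqlite3 module.  The orders and order_summary tables, their columns, and the function names are hypothetical, and dates are assumed to be stored as ISO strings (YYYY-MM-DD) so they compare correctly as text.

```python
import sqlite3


def create_schema(conn: sqlite3.Connection) -> None:
    # Hypothetical tables: detailed orders and a reporting-only summary.
    conn.executescript(
        """
        CREATE TABLE orders (
            id INTEGER PRIMARY KEY,
            customer_name TEXT,
            email TEXT,
            zip_code TEXT,
            total REAL,
            ordered_on TEXT
        );
        CREATE TABLE order_summary (
            month TEXT,
            zip_code TEXT,
            order_count INTEGER,
            revenue REAL
        );
        """
    )


def summarize_and_shred(conn: sqlite3.Connection, cutoff: str) -> int:
    """Roll orders older than cutoff into order_summary, then delete them.

    Returns the number of detail rows deleted.
    """
    with conn:  # one transaction: both steps commit, or neither does
        conn.execute(
            """
            INSERT INTO order_summary (month, zip_code, order_count, revenue)
            SELECT substr(ordered_on, 1, 7), zip_code, COUNT(*), SUM(total)
            FROM orders
            WHERE ordered_on < ?
            GROUP BY substr(ordered_on, 1, 7), zip_code
            """,
            (cutoff,),
        )
        cursor = conn.execute("DELETE FROM orders WHERE ordered_on < ?", (cutoff,))
        return cursor.rowcount
```

Doing both steps in one transaction matters here: you never want the detail rows deleted without the summary written, or the summary written twice because the delete failed.  If a later run picks up more orders from a month that was already summarized, this sketch adds a second summary row for that month, so reports should SUM over the summary table rather than assume one row per month and zip code.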

I think these are a start, but interestingly, they will all also lead to a need to make sure customers are aware of your policies.  Not because you want to show how great you are with their information, but for transparency about the fact that you’re not going to keep it around.  Many times customers simply assume you’ll have their information at hand when they return.  I suspect that won’t be possible going forward after some predetermined period of time.  The risk of exposure will simply be too great.

Many cloud providers have tiered storage and mechanisms to move information from one tier to the next as information ages.  I think taking the same approach and incrementally scrubbing information is going to be the name of the game.  It’ll limit risk, limit liability, and protect information.  We’ll have to be careful to educate our end-users about the changes in information availability though, otherwise the backlash would quickly slip into “can’t win” territory.
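As a sketch of what incremental scrubbing might look like, the Python below borrows the tiering idea: each tier names an age threshold and the fields that get dropped once a record reaches it.  The thresholds, field names, and the scrub function are made up for illustration; the real values would come from your own retention policy.

```python
from datetime import date

# Hypothetical tiers: (minimum age in days, fields to drop at that age).
TIERS = [
    (120, {"email", "phone"}),
    (365, {"name", "street_address"}),
    (730, {"order_items"}),
]


def scrub(record: dict, created_on: date, today: date) -> dict:
    """Return a copy of record without the fields whose tier age has been reached."""
    age_days = (today - created_on).days
    to_drop = set()
    for min_age, fields in TIERS:
        if age_days >= min_age:
            to_drop |= fields  # tiers are cumulative: older records lose more
    return {key: value for key, value in record.items() if key not in to_drop}
```

Keeping the policy in one small table like TIERS also helps with the transparency problem, since the same list can drive both the scrubbing job and the retention notice you show customers.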