IoT and Data Retention

With makers of home automation devices being pressured to release data that’s collected passively from those devices, it’s clear that data platform folks have their work cut out for them.

Just recently, Amazon agreed to hand over information from its Echo service that may be helpful in a murder investigation. (ref: Gizmodo) I think this is pretty significant. I won’t pretend to be an insider on the case or to know the specifics.

I’m also not looking to settle whether facts that could help or hurt a case should be shared when they were learned from an automation device. (See, I get to ask questions, but can avoid the hard ones.)

This did, however, get me thinking about all of us as data platform folks. What I’m wondering about is where compliance comes into play.

Specifically, privacy laws. Data retention. Subpoena-able things.

I’m not saying at all that we’re in the business of hiding information. Quite the opposite: we ARE in the business of making sure that information can be retrieved when the powers that be deem it necessary.

It’s not just about storage capacity or query strength. I think we have to be careful to support and plan for some sort of structure around the information gathered. I know with email, there are laws in place. With medical information, there are laws in place. But what about the comings and goings of your home? Let’s face it, if you have a Nest thermostat and various other automation pieces, along with car automation (many cars know who is driving by their key fob, for example), there actually is a good bit of information that could be smooshed up (technical term) and used as evidence or as an alibi.

As data professionals, I think it’s going to be important to help drive these discussions. If you’re not retaining it, it can’t be compelled. But if you’re not retaining it indefinitely, how do you determine when it’s purged? Are summary records maintained? What information is saved, accessible, archived, deleted?
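To make those questions a little more concrete, here is a minimal sketch in Python of what an explicit retention policy might look like. Everything in it is an assumption for illustration: the category names, the time windows, the `legal_hold` flag, and the `disposition` function are all hypothetical, and none of it reflects any real vendor’s policy or any legal requirement.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from enum import Enum


class Disposition(Enum):
    KEEP = "keep"        # still live and queryable
    ARCHIVE = "archive"  # moved to cold storage, still retrievable
    PURGE = "purge"      # raw record is deleted


@dataclass(frozen=True)
class RetentionRule:
    archive_after: timedelta  # age at which raw data leaves live storage
    purge_after: timedelta    # age at which raw data is deleted
    keep_summary: bool        # whether a summary record outlives the raw data


# Hypothetical categories and windows, chosen only for illustration.
RULES = {
    "thermostat_reading": RetentionRule(timedelta(days=30), timedelta(days=365), True),
    "voice_clip": RetentionRule(timedelta(days=7), timedelta(days=90), False),
}


def disposition(category, collected_at, now, legal_hold=False):
    """Decide what should happen to one record of the given category."""
    # Assumption: a record under a legal hold is never archived or purged.
    if legal_hold:
        return Disposition.KEEP
    rule = RULES[category]
    age = now - collected_at
    if age >= rule.purge_after:
        return Disposition.PURGE
    if age >= rule.archive_after:
        return Disposition.ARCHIVE
    return Disposition.KEEP
```

The point isn’t the code itself. It’s that each of the questions above (when is it purged, is a summary kept, what is archived) ends up as a deliberate, reviewable setting rather than something that happens by accident.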

While these are age-old questions for data people, the specifics shift quite a bit under our collective feet. Some information may actually be more useful in its raw format.

Think about cryogenic freezing of bodies: the hope is that, while the data (the body) isn’t fixable today, it will be in the future. That may be a stretch as an analogy, but the fact remains: I think it’s very important for us to raise our hands in meetings and be critical of plans that would manage information flows and storage by accident rather than by design.

  • “Passive data collection” is a misnomer. Data is always actively collected, and as data professionals, we are the ones who actively make sure of that. It is the data source, i.e. the user, who is sometimes passively sharing data. If the correct terminology were adhered to more strongly, perhaps more data sources (users) would choose to be more protective of their data.