GDPR: Don’t forget the AI

We’ve been working down through different systems and infrastructure, addressing the different components of GDPR and making sure those systems support what’s needed.  While GDPR is comprehensive in its stated goals, it’s also quite broad and loose in defining how to get there from here.  This has led to some interesting meetings and review sessions while people work to figure out how best to address PII (personally identifiable information).

In one case, all accounts for a customer are being anonymized outside the core systems.  In other words, if you’re pulling reports or sharing information between systems, that data is anonymized first, removing all possible bits of identifying information.  In one implementation, everything is replaced with a random ID and random password.  This works because the reporting around the accounts is all about summary information, not detailed data.  So, showing that “X operation” happened “Y times” is all they need, not that John Smith did the thing.
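As a rough illustration of that idea, here is a minimal Python sketch.  The field names (`account_id`, `name`, `email`, `password`, `operation`) and both function names are made up for the example, not taken from any real system, and the sketch simply drops the password from the export rather than replacing it.  It swaps each account for a random ID and shows that the summary counts come out the same either way.

```python
import secrets
from collections import Counter

# Hypothetical PII field names; a real schema will differ.
PII_FIELDS = ("name", "email", "password")


def anonymize(records):
    """Return copies of the records with PII dropped and account IDs randomized."""
    alias = {}  # real account_id -> random ID, held in memory for this export only
    cleaned = []
    for rec in records:
        real_id = rec["account_id"]
        if real_id not in alias:
            alias[real_id] = secrets.token_hex(8)
        # Keep only the non-personal fields, then attach the random ID.
        row = {k: v for k, v in rec.items()
               if k not in PII_FIELDS and k != "account_id"}
        row["account_id"] = alias[real_id]
        cleaned.append(row)
    return cleaned


def operation_counts(records):
    """Summary report: how many times each operation happened."""
    return Counter(rec["operation"] for rec in records)
```

Because the mapping from real ID to random ID is thrown away after the export, the outside copy can still answer “how many times did X happen” without being able to say who did it.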

However, one of the things we’re going back over now is the set of systems on both sides of the data processing.  On the inbound side, that means the IoT devices creating the data, the systems collecting the information, the data entry systems, and all of the other front-end systems.  Then we’re doing the same on the outbound side, working hard to make sure any data that isn’t “clean” is known, managed, protected and controlled.

Another area where we’ve been seeing people get caught out is their AI work.  It’s pretty common practice, particularly during development and in these relatively early days of working with AI in many companies, to move the information that drives these systems to an outside database.  This means another copy of the data, another access point, a larger surface area to protect… all of that.

GDPR considerations also come into play with data management.  You need to be able to cascade the removal (or scrubbing) of an account across all of these outside data stores, and your AI stores have to be included in that.  It’s meant more than a few very complicated changes.  Anonymizing the data has helped somewhat in several cases.  Continuing the machine learning work without some of the details of the data has been challenging, but doable.  Consider anonymizing only the bits that are critical and personal; other elements can remain in some cases.
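To make the cascade idea concrete, here is a small Python sketch.  Everything in it is an assumption for illustration: the stores are plain in-memory lists standing in for real databases, and `erase_account`, `scrub_fields` and the field names are invented for the example.  The point is only that the AI/training store sits in the same registry as everything else, so it can’t be skipped.

```python
def erase_account(stores, account_id, scrub_fields=None):
    """Cascade a removal (or scrub) of one account across every registered store.

    stores       -- dict of store name -> list of row dicts (stand-ins for real databases)
    scrub_fields -- None to delete matching rows outright; otherwise an iterable
                    of personal fields to strip while keeping the rest of the row
    Returns a dict of store name -> number of rows affected.
    """
    report = {}
    for name, rows in stores.items():
        hits = [r for r in rows if r.get("account_id") == account_id]
        if scrub_fields is None:
            # Full removal: keep only the rows that belong to other accounts.
            rows[:] = [r for r in rows if r.get("account_id") != account_id]
        else:
            # Partial scrub: drop the personal fields and break the link to the account.
            for r in hits:
                for field in scrub_fields:
                    r.pop(field, None)
                r["account_id"] = None
        report[name] = len(hits)
    return report
```

Calling `erase_account(stores, 42)` deletes the account everywhere, while `erase_account(stores, 42, scrub_fields=("name", "email"))` keeps the non-personal columns so the machine learning side still has something to work with.  The returned counts give you a record of which stores were touched.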

It’s important to look at the full lifespan of your data.
