December 7th, 2017

No, you shouldn’t keep all that data forever

Big Data, Data Analytics, Data Storage and Management, others, Programming, by admin.

The modern ethos is that all data is valuable, that it should be stored forever, and that machine learning will one day magically extract value from it. You've probably seen EMC's prediction that there will be 44 zettabytes of data by 2020. Remember how everyone had Fitbits and Jawbone Ups for about a minute? Now Jawbone is out of business. Have you considered that "all data is valuable" might be the corporate equivalent of that fad? Maybe we shouldn't take a data storage company's word for it that we should store everything and never delete anything.

Back in the early days of the web, it was said that the main reasons people went there were porn, jobs, and cat pictures. If we downloaded all of those cat pictures and ran a machine learning algorithm on them, we could perhaps determine the most popular colors of cats, the most popular breeds of cats, and the fact that people really like their cats. But we don't need to do this, because we already know these things. Type any of them into Google and you'll find the answer. Also, with all due respect to cat owners, this isn't terribly important data.

Your company has a lot of proverbial cat pictures. It doesn't matter what your policies and procedures for inventory retention were in 1999. Any legal issues you had reason to store records for back then have passed the statute of limitations. There isn't anything conceivable you could glean from that old data that couldn't be gleaned from any of the more recent revisions.

Machine learning or AI isn't going to tell you anything interesting about your 1999 policies and procedures for inventory retention. That data might even qualify as "dark data," because your search tool probably boosts everything else above it, so unless someone queries for "inventory retention procedure for 1999," it isn't going to come up.
