Stop Apologizing for the Size of Your Data!
This morning I was pleasantly surprised to hear that I had been included in the list of the 100 Most Influential in Big Data by Big Data Republic. It was an honor to be part of such an interesting group. Several respected analytics colleagues were also on the list including Gregory Piatetsky, Vincent Granville and Seth Grimes whom I’ve known for quite some time, as well as Carla Gentry and Kirk Borne whom I’ve gotten to know more recently.
With all the talk about Big Data these days, and all the media mentions and glorification around it, you might start wondering, even if you do work with big data, “Hey, is this really me and my data they’re talking about? It doesn’t sound like it,” and “Maybe I need to be working with even bigger data sets?!” In fact, partly because of this feeling, just a few weeks ago I had the urge to “pinch/remind myself” of what Big Data really is by defining it here on the blog.
So now I’m wondering how unique I am. How many others working with Big Data have felt the same way: that the way it’s often referenced and revered makes the term feel, well, a bit foreign or aspirational rather than real? And if some of us who regularly work with Big Data feel this way, how do analysts working with more traditional data sets feel about the term?
You’re probably not alone
Until rather recently I didn’t think of myself as a Big Data professional at all. Back in 2005 when I started Anderson Analytics in order to leverage text analytics in market research, our average data set was usually well below 100,000 records.
Today, while our average client using OdinText probably does have about 100,000 records, and we have a few with over a million, the fact is that we have many more clients with smaller data sets. Several things have led me to realize that ‘Big Data’ usually isn’t nearly as big as we seem to think it is (which is the reason I really prefer the term ‘Mid Data’).
Really big Big Data is still a rarity
After speaking to managers at companies interested in text mining five days a week for several years, I’ve realized that there are actually relatively few companies out there with those really large data sets. Also, though many Fortune 1000 companies do have these larger data sets somewhere, most of their research directors and analysts actually work with much smaller data sets. The analytical picture is in fact a lot different from what Hollywood spy-fi movies, writers at Business Week or AdAge, and a slew of “enterprise software” vendors would have us believe.
This was confirmed yet again last month at the American Marketing Association’s first conference on Big Data. I had the chance to answer questions and speak to quite a large number of the other speakers as well as attendees. Many came up to me, almost apologetic at first, saying something like “well this isn’t really a big data question, because our data is a bit on the small side…”.
There’s no doubt that data is increasing in size, and I also believe it’s important to include text/unstructured data in your analysis, which further increases data size and complexity. However, don’t worry if your data doesn’t cover more than a million customers. You’re definitely not an outlier.
While it’s true that some analytics, including text analytics, don’t really make sense until you reach a certain minimum threshold, the worst thing you can do is to combine various smaller or ‘Mid Data’ sources believing that multiple data sets will somehow increase the value of your data exponentially. Before even considering that, always make sure there is value in the independent sources first, or you are likely to be disappointed.
Most importantly, whether it’s Small, Big or Mid Data, it’s not the size of your data that counts. It’s the quality of that data, your analytical prowess and your hard work that count!
[Full Disclosure: Tom H. C. Anderson is Managing Partner of Anderson Analytics, which develops and sells the patent-pending data mining and text analytics software platform OdinText]